Monday, February 6, 2012

Project Voldemort caveats


Russian post is here

We use Project Voldemort in our project. If you have also run into disk space problems, as we did:
1. Check the current utilization of the log files:

# java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbSpace -h /usr/local/voldemort/data/bdb -u

  File    Size (KB)  % Used
--------  ---------  ------
00000000      61439      78
00000001      61439      75
00000002      61439      73
00000003      61439      74
...
000013f6      61415       1
000013fd      61392       2
000013fe      61411       3
00001400      61432       2
00001401      61439       1
...
0000186e      61413     100
0000186f      61376     100
00001870      16875      95
  TOTALS  112583251       7
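
On a store with thousands of log files this report is long, so a small filter can help. The following is only a sketch that assumes the three-column layout shown above; the function name `low_util_files` and its threshold argument are made up for illustration and are not part of BDB JE or Voldemort.

```shell
# Hypothetical helper: reads `DbSpace -u` output on stdin and prints the
# log files whose "% Used" is below a threshold (default 50), followed by
# the overall utilization from the TOTALS row.
low_util_files() {
    awk -v limit="${1:-50}" '
        # Data rows: hex file number, size in KB, percent used.
        $1 ~ /^[0-9a-f]+$/ && NF == 3 && ($3 + 0) < (limit + 0) { print $1, $3 "%" }
        # Summary row printed by DbSpace at the end of the report.
        $1 == "TOTALS" { print "total utilization: " $3 "%" }
    '
}

# Usage, with the same paths as in the command above:
# java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbSpace -h /usr/local/voldemort/data/bdb -u | low_util_files 50
```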


2. If the TOTALS utilization is far below the default minimum (50%), you are probably short on cache. Calculate the cache size needed for your number of records (using your own average key and data sizes, of course):

# java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbCacheSize -records 1000000 -key 100 -data 300 
Inputs: records=1000000 keySize=100 dataSize=300 nodeMax=128 density=80% overhead=10%
    Cache Size      Btree Size  Description
--------------  --------------  -----------
   177,752,177     159,976,960  Minimum, internal nodes only
   208,665,600     187,799,040  Maximum, internal nodes only
   586,641,066     527,976,960  Minimum, internal nodes and leaf nodes
   617,554,488     555,799,040  Maximum, internal nodes and leaf nodes
Btree levels: 3
Then adjust bdb.cache.size accordingly (to at least the "Maximum, internal nodes only" value).
After that, the space should be reclaimed. This can take from a couple of hours to several days, depending on the DB size.
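
The cache figure from step 2 can be pulled out of the DbCacheSize output with a filter like the one below. This is a sketch under the assumption that each description sits on its own line with the cache size in the first column, as the tool prints it; the function name `max_internal_cache` is hypothetical.

```shell
# Hypothetical helper: reads DbCacheSize output on stdin and prints the
# "Maximum, internal nodes only" cache size in bytes and in whole MB.
max_internal_cache() {
    awk '/Maximum, internal nodes only/ {
        bytes = $1
        gsub(/,/, "", bytes)                     # strip thousands separators
        mb = int((bytes + 1048575) / 1048576)    # round up to whole MB
        print bytes " bytes = " mb " MB (rounded up)"
    }'
}

# Usage, with the same inputs as in step 2:
# java -jar /usr/local/voldemort/lib/je-4.0.92.jar DbCacheSize -records 1000000 -key 100 -data 300 | max_internal_cache
```

Which unit suffixes the cache setting accepts is not covered here, so check your Voldemort version before pasting the value into the configuration.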
The official and unofficial BDB JE documentation is very helpful.

Moral of the story: even the underlying technology must be thoroughly investigated. :)