Boltdb takes up 3.5x the space compared to Pebble or Leveldb #863
The default FillPercent is 50%: when a page is split, the resulting pages are filled to only about half of the page size. Increasing this field (e.g. to 0.9 or 1.0) should improve disk utilization; in other words, it should decrease the db file size. But it may hurt write performance, because a page may then be split on almost any single K/V insertion. If you have very few write operations, it makes sense to set a large FillPercent (e.g. 1.0). Also try compacting the db file; the command is
These performance results match my understanding. Note that bbolt maps the whole db file into memory. When the db file is far bigger than the physical memory, you may encounter frequent page faults. This is one of the areas we may consider improving.
Evenly distributing the data across different buckets may improve performance a little, because it reduces the number of B+tree levels a read has to traverse. But if you have a small db file (e.g. 20GiB), there is no difference.
Setting a larger page size (e.g. 32KB or 64KB) may also improve performance for very large databases (e.g. > 100GB); refer to #401 (comment). Please share feedback if you have any new performance data, thanks.
Bolt DB disk usage decreased from 800GB to 440GB after compaction.
Setting a larger FillPercent (e.g. 0.9 or 1.0) should also improve disk utilization (and accordingly reduce the file size); see Line 43 in 65fcfd2.
In my use case, my system's data is stored in the SST files of Pebble and LevelDB, and I plan to replace them with bbolt to improve read performance. My tool read all the data from the Pebble DB and then wrote the same data to Bolt (all into a single default bucket). The total amount of data written was approximately 323GB. After the write completed, I compared the disk usage of Pebble and Bolt DB: Pebble used 228GB, while Bolt used 800GB. This level of space amplification seems unacceptable.
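For reference, the ratios behind the issue title, computed from the figures reported in this thread (323GB written, 228GB used by Pebble, 800GB used by Bolt, and 440GB after compaction):

```math
\frac{800\,\text{GB}}{228\,\text{GB}} \approx 3.51,\qquad
\frac{800\,\text{GB}}{323\,\text{GB}} \approx 2.48,\qquad
\frac{440\,\text{GB}}{228\,\text{GB}} \approx 1.93
```

So the 3.5x in the title is measured against Pebble's on-disk size; against the logical data written, the uncompacted Bolt file was about 2.5x, and after compaction it was roughly 1.9x the size of the Pebble DB.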
Additionally, in my scenario Bolt was expected to have better read performance than Pebble DB. When the data volume was under 10GB, Bolt did perform better in read tests; under 5GB, Bolt's read latency was below 100 microseconds, 50% better than Pebble. However, when the Bolt DB reached 800GB, read latency increased to over 2 milliseconds (while Pebble remained under 1 millisecond). This dramatic performance drop seems strange. Could it be related to storing all the data in a single bucket? Could you provide some suggestions?