I heard of the excellent McRegion mod, which used a custom save format to speed up chunk loading. I thought it would be interesting to try my hand at a similar modification, but using one of the real, full-fledged embedded key-value databases I've heard about. After a bit of Wikipedia-ing, I settled on Tokyo Cabinet, as it had Java bindings and seemed speedy and modern.
I'm not yet done making the changes, but I did take a closer look at the size distribution of chunks in Minecraft. To do so, I used the Unix command du -Sba /path/to/world/ to get a listing of the size, in bytes, of each file and folder in the world directory. It looks something like this:
where all of the lines ending with ".dat" are the data files and entries like ./1c/1m are directories. A big giant list of all of the files and directories isn't terribly useful, which leads me to my next step...
All Hail Emacs!
To make things really useful, I pasted all of it into Emacs, removed all the directories from the listing with M-x keep-lines \.dat$, removed the filenames with the cua-rect family of rectangle-editing commands, wrapped it in an Org-mode table, and used Org-mode's "Babel" feature to generate and embed a gnuplot-generated graph in the file. Out of all of that, figuring out how to use gnuplot was the hardest part; getting it to display a histogram the way I wanted turned out to require a bit of Googling.
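For anyone who would rather skip the Emacs step, the same filtering and grouping can be sketched in a few lines of Java. This is not what produced the graph; it is only a rough equivalent, and the class and method names (ChunkSizeHistogram, datFileSizes, histogram) are made up for illustration. Like the keep-lines step, it keeps every file ending in ".dat", so a non-chunk file such as level.dat would be counted too.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: names are illustrative, not part of any mod or library.
final class ChunkSizeHistogram {
    // Bucket width in bytes, matching the 512-byte grouping used for the graph.
    static final int BUCKET_BYTES = 512;

    // Collect the size in bytes of every regular file ending in ".dat" under root.
    static List<Long> datFileSizes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".dat"))
                .map(p -> p.toFile().length())
                .collect(Collectors.toList());
        }
    }

    // Map each bucket's lower bound (0, 512, 1024, ...) to the number of files in it.
    static Map<Long, Integer> histogram(List<Long> sizes, int bucketBytes) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long size : sizes) {
            long lowerBound = (size / bucketBytes) * bucketBytes;
            counts.merge(lowerBound, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // The world directory is passed as the first argument; defaults to the current directory.
        Path world = Paths.get(args.length > 0 ? args[0] : ".");
        histogram(datFileSizes(world), BUCKET_BYTES)
            .forEach((lowerBound, count) -> System.out.println(lowerBound + "\t" + count));
    }
}
```

The output is one tab-separated line per bucket (lower bound, then file count), which should be easy to feed to gnuplot or paste into a table.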
But you're not interested in all that. You want to see pretty pictures! Well, without further ado, here is the graph this all produced (grouped into intervals of 512 bytes):
So, as you can see, the vast majority of chunks are under 4K, and most are around 2-3K. I did this all to figure out a good starting size to allocate for the byte buffer which is to hold the chunk data before it's written out. Java's ByteArrayOutputStream takes an optional constructor argument to set the initial size of the backing array. It will grow automatically, but it would be nice to avoid a bunch of memory copies.
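As a minimal sketch of what that presizing might look like: the 4096-byte starting capacity below is an assumption drawn from the "under 4K" observation, not a value taken from the finished mod, and ChunkBuffer and bufferChunk are hypothetical names. For comparison, the no-argument ByteArrayOutputStream constructor starts with a 32-byte buffer, so a typical 2-3K chunk would force several rounds of growing and copying.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: names and the capacity value are illustrative assumptions.
final class ChunkBuffer {
    // Assumed starting capacity: 4 KB, since most chunks in the sample were under 4K.
    static final int INITIAL_CAPACITY = 4096;

    // Write a chunk's bytes into a presized in-memory buffer and return the result.
    static byte[] bufferChunk(byte[] chunkData) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(INITIAL_CAPACITY);
        // Chunks larger than the initial capacity still work; the stream grows as needed.
        out.write(chunkData, 0, chunkData.length);
        return out.toByteArray();
    }
}
```

Chunks that fit in the initial capacity never trigger a resize, while the occasional oversized chunk just pays the usual growth cost.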
More on my modification as I near completion.
P.S. Here is the Org-mode file used to generate that graph. It seems to have slightly broken GitHub, though...
EDIT: Fixed the label on the X-axis. If you read it as it was written (as opposed to how I imagined it was written in my sleep-deprived head), you might have thought that the files were 3-4 MB rather than KB. Thanks to Stiltskin on Reddit for pointing this out!