Friday, February 11, 2011

MineCabinet - a database-backed chunk manager for Minecraft

As I alluded to in my last post, I have been working on modding Minecraft to use the excellent Tokyo Cabinet database to store the chunk data. I have finally gotten it to the point where it is ready to be released to the masses.

Here is a link to the zip file. It contains installation instructions in the README.txt. All of the basic installation information is contained there, so if you're really interested in how it works, hit that download link! :)

Basically, I replaced Minecraft's current ChunkLoader implementation with a new one based on Tokyo Cabinet. Rather than saving each chunk in a separate file, MineCabinet stores everything in a single Tokyo Cabinet database, which reduces the number of files for a 60 MB world from 17,000 to 1. It also boosts performance by roughly a factor of 4, according to my basic profiling results. Here are a few runs from JVisualVM for vanilla Minecraft, McRegion, and MineCabinet. There are two sets of results: one for creating a new world, running around for a few minutes, and then saving and exiting the world, and another for reloading that same world.
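To make the "one database instead of one file per chunk" idea concrete, here is a minimal sketch. It is not MineCabinet's actual code: the class name, the 8-byte key layout, and the in-memory TreeMap standing in for the Tokyo Cabinet file are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every chunk lives in one ordered key-value store,
// keyed by its coordinates, instead of in its own file on disk.
class ChunkStoreSketch {
    // In-memory stand-in for the single database file. A TreeMap keeps
    // its keys sorted, loosely like a B+ tree index would.
    private final Map<ByteBuffer, byte[]> db = new TreeMap<>();

    // Assumed key layout: chunk X then chunk Z, 4 bytes each, big-endian.
    static byte[] key(int chunkX, int chunkZ) {
        return ByteBuffer.allocate(8).putInt(chunkX).putInt(chunkZ).array();
    }

    // Saving a chunk is a single put; saving it again overwrites the record.
    void saveChunk(int chunkX, int chunkZ, byte[] serializedChunk) {
        db.put(ByteBuffer.wrap(key(chunkX, chunkZ)), serializedChunk.clone());
    }

    // Loading is a single get; null means the chunk was never saved.
    byte[] loadChunk(int chunkX, int chunkZ) {
        byte[] data = db.get(ByteBuffer.wrap(key(chunkX, chunkZ)));
        return data == null ? null : data.clone();
    }

    int chunkCount() {
        return db.size();
    }
}
```

With a layout like this, the number of files on disk stays at one no matter how many chunks the world contains; only the number of records in the store grows.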

First, the profiling for initial world creation:

Vanilla Minecraft - initializing new world. 1030 calls to oj.a(dn, ib) (aka saveChunk) in 4522 ms or 4.4 ms/call

McRegion - initializing new world. 1076 calls to oj.a(dn, ib) in 4060 ms or 3.8 ms/call

MineCabinet - initializing new world. 996 calls to saveChunk in 728 ms or 0.73 ms/call

And for loading that world back from disk:

Vanilla Minecraft - loading existing world. 570 calls to oj.a(dn, ib) in 2605 ms or 4.6 ms/call

566 calls to oj.a(dn, int, int) (aka loadChunk) in 868 ms or 1.5 ms/call

McRegion - loading existing world. 590 calls to oj.a(dn, ib) in 2424 ms or 4.1 ms/call

581 calls to oj.a(dn, int, int) in 528 ms or 0.91 ms/call

MineCabinet - loading existing world. 562 calls to saveChunk() in 259 ms or 0.46 ms/call

556 calls to loadChunk() in 643 ms or 1.2 ms/call
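The per-call figures in both sets of runs are just total time divided by call count. Here is a small sketch that recomputes the saveChunk numbers from the new-world runs and the ratios between them. The class and method names are made up for this example; the call counts and times are the ones listed above.

```java
// Hypothetical helper for redoing the arithmetic behind the profiling runs.
class ProfileMath {
    // Average cost of one call, in milliseconds.
    static double msPerCall(int calls, long totalMs) {
        return (double) totalMs / calls;
    }

    // How many times faster the new per-call time is than the baseline.
    static double speedup(double baselineMsPerCall, double newMsPerCall) {
        return baselineMsPerCall / newMsPerCall;
    }

    public static void main(String[] args) {
        // saveChunk while initializing a new world, figures from the runs above.
        double vanilla = msPerCall(1030, 4522);
        double mcRegion = msPerCall(1076, 4060);
        double mineCabinet = msPerCall(996, 728);

        System.out.printf("vanilla     %.2f ms/call%n", vanilla);
        System.out.printf("McRegion    %.2f ms/call%n", mcRegion);
        System.out.printf("MineCabinet %.2f ms/call%n", mineCabinet);
        System.out.printf("vs vanilla  %.1fx%n", speedup(vanilla, mineCabinet));
        System.out.printf("vs McRegion %.1fx%n", speedup(mcRegion, mineCabinet));
    }
}
```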

So as you can see, that is a very large speedup, at least for saving chunks. Additionally, Tokyo Cabinet has a number of tuning parameters which could affect performance, including switching from using a B+ tree-based database to a hashtable-based database. The hashtable apparently has higher space overhead, but can be more efficient in some circumstances. I didn't really investigate the impact of various tuning parameters on performance, so there is some potential for even more speedups.

One thing to note, though, is that MineCabinet spends a large amount of time in saveExtraData(). This is because I put a call to db.sync() in there (it was a no-op before). It gets called when a world is unloaded, so it seemed like a good time to sync. Although this is a large amount of time, it is only done when switching worlds, so five seconds here isn't as noticeable as a lot of little pauses throughout the game. A more intelligent mod could force a sync (in a dedicated thread) periodically, say after a (configurable?) number of in-game ticks.
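The periodic-sync idea could look something like the sketch below. None of this is in MineCabinet today: the class name, onTick(), and the Runnable standing in for db.sync() are hypothetical, and it assumes the database handle can safely be synced from a second thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: hand a sync off to a dedicated thread every N ticks.
class PeriodicSyncSketch {
    private final int ticksBetweenSyncs;   // the configurable interval
    private final Runnable syncAction;     // stand-in for a call like db.sync()
    private final ExecutorService syncThread = Executors.newSingleThreadExecutor();
    private int ticksSinceSync = 0;

    PeriodicSyncSketch(int ticksBetweenSyncs, Runnable syncAction) {
        this.ticksBetweenSyncs = ticksBetweenSyncs;
        this.syncAction = syncAction;
    }

    // Called once per game tick from the game thread; never blocks on disk.
    void onTick() {
        if (++ticksSinceSync >= ticksBetweenSyncs) {
            ticksSinceSync = 0;
            syncThread.execute(syncAction);
        }
    }

    // Called when the world is unloaded: queue one last sync, then wait
    // for the worker to finish everything that is still pending.
    void close() {
        syncThread.execute(syncAction);
        syncThread.shutdown();
        try {
            syncThread.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger syncs = new AtomicInteger();
        PeriodicSyncSketch sketch = new PeriodicSyncSketch(20, syncs::incrementAndGet);
        for (int tick = 0; tick < 100; tick++) {
            sketch.onTick();
        }
        sketch.close();
        // 5 periodic syncs plus the final one on close
        System.out.println("syncs performed: " + syncs.get());
    }
}
```

Since each periodic sync has less unsynced data to deal with, the final sync on unload should have less left to write, and the pause when switching worlds should shrink accordingly.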

Another nice thing about using Tokyo Cabinet is that it obviates the need for managing the "session.lock" file in each world directory. In vanilla Minecraft, this file is checked (twice) for each chunk save to make sure that the directory isn't being written by another Minecraft process. This adds more (though not much) overhead to each chunk save, and can be avoided entirely with Tokyo Cabinet. When opening a database, TC takes a file lock on the file it's opening, so any other process (reader or writer) is blocked from opening that same file. Since a database is opened for the entire session of a world, no other process can modify or read the database while it is in use.
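For comparison, here is roughly what "lock the database file for the whole session" looks like when done by hand with plain Java NIO. This is only an illustration of the idea, not how Tokyo Cabinet implements its locking, and the class name is made up. Unlike the blocking behaviour described above, this sketch fails fast and returns null when the lock is already taken.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: hold an exclusive lock on the world's database file
// for as long as the world is open, instead of polling a session.lock file.
class WorldFileLockSketch implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private WorldFileLockSketch(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    // Returns a handle holding the lock, or null if someone else has it.
    static WorldFileLockSketch tryOpen(Path dbFile) {
        try {
            FileChannel ch = FileChannel.open(dbFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock l;
            try {
                l = ch.tryLock();   // null if another process holds the lock
            } catch (OverlappingFileLockException e) {
                l = null;           // this JVM already holds the lock
            }
            if (l == null) {
                // Caveat: on some platforms closing any channel to a file
                // drops the JVM's locks on it, so real code should avoid
                // re-opening a file it has already locked.
                ch.close();
                return null;
            }
            return new WorldFileLockSketch(ch, l);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called when the world is unloaded: release the lock, close the file.
    @Override
    public void close() {
        try {
            lock.release();
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The lock is taken once when the world is opened and released once when it is closed, so nothing has to be checked on each chunk save.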

Current limitations

All is not perfect, however. Currently, I only provide a native library for 32-bit Linux, since that is the only build environment I have access to. Building Tokyo Cabinet (and the Java bindings) for Mac OS X shouldn't be hard, and it looks like someone already has a build made.

Windows is harder, since Tokyo Cabinet does not officially support it. It may be possible to build with MinGW, though I've never used it. Tokyo Cabinet doesn't have many dependencies (zlib, bzip2, and pthreads), so it may be possible to get it to build and run on Windows. I'll have to see.

Another possibility is to use the newer and improved-er Kyoto Cabinet, the successor to Tokyo Cabinet, which includes (among other things) official binary packages for Windows. The downside is that unlike Tokyo Cabinet, Kyoto Cabinet is licensed under the GPLv3, making it ineligible for use in Minecraft without a commercial license. Mojang may decide that it's worth it to go for the commercial license, but I didn't want to assume that. Plus, I wanted to stay closer to the side of legal distribution (Minecraft mods are always a gray area).

Closing remarks

My ultimate hope is that MineCabinet (or something similar using an efficient DB) is included in the main Minecraft client, since I think using an embedded DB would provide a number of advantages over even a well-done custom format (such as McRegion). One of the biggest benefits is simply letting someone else deal with the tricky problem of managing persistence and concurrent access. My opinion is that if you can make your problem someone else's problem, go for it, especially when they specialize in solving that problem. Also, databases like Tokyo/Kyoto Cabinet have effective methods for dealing with both intra- and inter-process concurrent access, which is a notoriously difficult problem to get right.

There's plenty more I could say, but this post is long enough as it is. Please feel free to give feedback and suggestions, especially in the form of code ;)!

P.S. A big thank you to Scaevolus, author of McRegion and Minecraft modding expert. Without his help, I never would have been able to navigate the Minecraft sources and make the modifications to put the whole thing together.