Friday, April 24, 2009

I got GSoC 2009!

My proposal for the Google Summer of Code 2009 was accepted! I will be working on Drupal, specifically completing the Version Control Integration API, with the goal of allowing Drupal to switch from CVS to a DVCS (most likely Git) as its primary version control system.

It will be a difficult task, but much of the baseline work has already been done, so I will mostly be filling in gaps in functionality and finishing up the few areas where the Git backend lags behind the CVS one. You can look at my proposal here, and I will endeavor to make relatively frequent status updates on this blog. I am chrono325 on Drupal.org and will try to spend some time in the #drupal IRC channel on Freenode, so if you have questions, that's how to get in touch.

Yay for Google!

Sunday, April 19, 2009

SSD Anthology

My dad asked me a few questions about this article. In short, SSDs suffer performance degradation because they are read and written in 4KB pages but can only be erased in 512KB blocks. This means that if you have a full block and want to change a single bit in it, the operating system sends only the modified 4KB page, but the SSD must erase and rewrite the entire 512KB block.
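To put numbers on that, here is a quick back-of-the-envelope sketch in Python. It is purely illustrative (no real drive firmware works this way); it just acts out the read-modify-write penalty described above:

```python
PAGE_SIZE = 4 * 1024          # smallest unit the OS reads or writes
BLOCK_SIZE = 512 * 1024       # smallest unit the SSD can erase
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE  # 128 pages per erase block

def naive_page_update(block: bytearray, page_index: int, new_page: bytes) -> bytearray:
    """Update one 4KB page the slow way: copy the whole block,
    patch the page, then 'erase' and rewrite all 512KB."""
    assert len(block) == BLOCK_SIZE and len(new_page) == PAGE_SIZE
    updated = bytearray(block)                   # read the full block
    start = page_index * PAGE_SIZE
    updated[start:start + PAGE_SIZE] = new_page  # modify just 4KB of it
    return updated                               # erase + reprogram 512KB

block = bytearray(BLOCK_SIZE)
block = naive_page_update(block, 3, b"\xff" * PAGE_SIZE)
print(PAGES_PER_BLOCK)  # the OS sent 4KB, the drive rewrote 512KB: 128x amplification
```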

This "solution" sounds a lot like reinstalling Windows and all of your programs. ie. way more work and trouble than anyone other than a hard core geek would put up with.


Yup, you're right.

He continued:

"I'm waiting until they come up with an automatic solution. Something like 'click yes to speed up your drive.'"

In that case, you would want the drive to just do it automatically: why wouldn't you want it to always be as fast as possible? It would be like a car that has an ignition and a separate button labeled "press this to actually start the car." If the fix is as simple and transparent as pressing a button, it should happen without bothering you.

As I said, the real problem is that the filesystem doesn't know enough about the structure of the flash drive to cooperate optimally. This is mostly a shortcoming of existing filesystems, which are organized around the characteristics of spinning hard drives, but it is also a failing of the drives themselves, which don't expose enough information for a filesystem to make the best choices.

This is not just a case of one company or another being stupid or lazy (though there is some of that). When flash drives were first introduced, no filesystems existed to take advantage of them: there had been no demand for one, and writing filesystems is REALLY HARD. Really, really hard. That means nobody creates a new filesystem without a really good reason, since, as I said, writing filesystems is REALLY HARD. The sensible thing for SSD manufacturers to do was to hide the underlying complexity from the filesystems of the day and use a strategy that was good enough when a filesystem treated the SSD as an ordinary hard drive. This gave rise to the fancy block reordering schemes described in the article.
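To make the remapping idea concrete, here is a toy sketch of the kind of bookkeeping a block reordering scheme does. Real controller firmware is proprietary and vastly more sophisticated; every name below is my invention, not a real interface:

```python
class ToyFTL:
    """A toy 'flash translation layer': remap logical pages to fresh
    physical pages so a small write never forces a 512KB erase."""

    def __init__(self, num_pages: int):
        self.mapping = {}                   # logical page -> physical page
        self.free = list(range(num_pages))  # pool of already-erased pages
        self.stale = []                     # superseded pages awaiting erasure

    def write(self, logical_page: int, data: bytes) -> int:
        # Write out of place: grab a fresh page and retire the old one
        # instead of erasing and rewriting a whole block in place.
        if logical_page in self.mapping:
            self.stale.append(self.mapping[logical_page])
        physical = self.free.pop()
        self.mapping[logical_page] = physical
        return physical  # where the data actually landed

ftl = ToyFTL(num_pages=4)
ftl.write(0, b"version 1")  # lands on a fresh page
ftl.write(0, b"version 2")  # remapped; old page merely marked stale, nothing erased
```

The trick works beautifully until the pool of pre-erased pages runs dry, which brings us to the next problem.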

The problem is that these reordering schemes run into exactly the trouble described in the article (all of the blocks get used up and must be erased before new data can be written), which could be mostly solved, or at least largely mitigated, by filesystems that knew more about the structure of the SSD. Unfortunately, the block reordering scheme is one of the factors SSD makers compete on, so they have an incentive to keep it secret and to prevent people from circumventing it (which would make the fancy scheme irrelevant). Taken to the extreme, the only remaining field of competition would be the makeup of the memory chips themselves, pushing SSD makers toward commodity status (and therefore lower margins). From a user's perspective, that would be the ideal scenario, as long as your operating system and filesystem could take advantage of the additional information provided by the memory chips.
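Roughly how bad is the used-up-blocks case? A sketch of the accounting, with illustrative numbers (real garbage collection policies vary by vendor and are secret):

```python
PAGES_PER_BLOCK = 128   # 512KB erase block / 4KB pages

def reclaim_cost(live_pages_in_victim: int) -> tuple[int, int]:
    """Pages copied and blocks erased to free up one block once the
    drive has no pre-erased pages left."""
    assert 0 <= live_pages_in_victim <= PAGES_PER_BLOCK
    # Every still-live page must be relocated before the block can be erased.
    return live_pages_in_victim, 1

copied, erased = reclaim_cost(127)
print(copied, erased)  # 127 pages copied plus a 512KB erase, just to accept one 4KB write
```

A filesystem that could tell the drive which pages hold genuinely dead data, rather than leaving the drive to guess, would shrink that copy count dramatically; that is the kind of cooperation I mean.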

We are in a sort of transitional period, with quite a bit of flux and uncertainty. It is very, very useful to have a small number of filesystems that everyone can read and write, since that is what makes data portability possible. This is the main reason (along with its minimal storage and computational overhead) why the otherwise horribly outdated FAT family of filesystems is still in widespread use on removable storage. Superior alternatives exist (mostly on Linux, owing to the relative ease of writing new filesystems for it), but they aren't compatible with other operating systems and so are useless for consumer flash drives. For a flash-aware filesystem to gain widespread support, it would need compatibility with all the major operating systems, a catch-22 for any new filesystem.

The other hurdle, less visible to end users, is the way drives expose information about themselves. For a flash-aware filesystem to be truly effective, it would need some way to learn the characteristics of the underlying flash and to access it directly, bypassing any (or at least the most dramatic) block reordering. The technical ideal would be a single open, royalty-free, well-designed, and extensible standard for this kind of communication. The problem is that such a standard would neutralize each manufacturer's reordering scheme as a competitive differentiator, so manufacturers would likely resist it. If such a standard could be agreed upon, it would go a long way toward enabling better cooperation between the hardware and the software.
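Purely hypothetical, but to illustrate the sort of thing such a standard might expose, here is a sketch; none of these names exist in any real specification:

```python
from dataclasses import dataclass

@dataclass
class FlashGeometry:
    # Invented fields for illustration; no real standard defines these.
    page_size: int          # smallest programmable unit, e.g. 4096
    erase_block_size: int   # smallest erasable unit, e.g. 524288
    erase_cycles: int       # rated program/erase endurance per block
    raw_access: bool        # may the filesystem bypass the drive's reordering?

def align_write(geometry: FlashGeometry, write_size: int) -> int:
    """A flash-aware filesystem could batch and pad writes to whole
    erase blocks, sidestepping the read-modify-write penalty."""
    blocks = -(-write_size // geometry.erase_block_size)  # ceiling division
    return blocks * geometry.erase_block_size

geom = FlashGeometry(page_size=4096, erase_block_size=512 * 1024,
                     erase_cycles=10_000, raw_access=True)
print(align_write(geom, 1_000_000))  # a 1MB write padded out to two full erase blocks
```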

Whether any of this comes to pass remains to be seen. What is fairly certain, however, is that there will be a lot of volatility in the SSD space as these issues (and more) are figured out.