Thursday, April 2, 2009

He who could not be named

Until recently, I didn't know that Voldemort existed anywhere outside of Harry Potter books. Well, looking for MySQL alternatives to boost performance can take you to weird places.

A few weeks ago, we encountered high user load on our production machines (which is essentially a good thing). We had to bring up more and more machines to handle more and more requests per second. Adding machines on the Amazon EC2 infrastructure (to be discussed in a future post) is easy. Scaling MySQL while keeping the data in sync was much harder. We figured that what we basically needed was a big hash table that is very efficient and easy to scale. At that point, one of my colleagues, Yaron, pointed out this wonderful post by Richard Jones.
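One common way to spread a "big hash table" over several machines is a consistent-hash ring. Each key is owned by the first node point at or after the key's hash on the ring. The sketch below is a minimal in-memory illustration under that assumption. The class and method names (`ConsistentHashRing`, `addNode`, `nodeFor`), the node names, and the hash-mixing function are all hypothetical. This is not Project Voldemort's API or its actual partitioning code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of consistent hashing: maps keys to nodes on a ring.
// Not Voldemort's implementation; node names and the hash function are assumptions.
class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Place each physical node at several points on the ring to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // The owner of a key is the first ring point at or after the key's hash,
    // wrapping around to the start of the ring if there is none.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Integer, String> owner = ring.ceilingEntry(hash(key));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // String.hashCode() alone is poorly distributed, so mix its bits a little.
    // Real systems typically use a stronger hash function.
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }
}
```

Usage: build a ring with `new ConsistentHashRing(50)`, call `addNode("db1")`, `addNode("db2")`, and so on, then route each request with `nodeFor("user:42")`. The design choice matters for the scaling problem above. With a plain `hash(key) % numberOfServers`, adding one server remaps almost every key. On a ring, adding a server only moves keys onto that new server, and every other key stays where it was.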

After reading the review, I decided to give Project Voldemort a try. It is basically an open-source, Java-based, efficient and scalable hash table. After 5 minutes of installation and a few days of integrating it into our code, I think I'm in love. With MySQL, even after much tuning and playing with its parameters, the best per-request time I could squeeze out of it was still not fast enough. With Voldemort, with almost no tuning, request handling time dropped to less than 1/10th of what it was with MySQL, and I am sure it can be pushed further down with some parameter tuning. That means more users served with fewer servers. It is also easy to scale up by adding machines if the load ever grows beyond what even Voldemort can handle.

Admittedly, using this platform has its downsides. It's not a relational database, so debugging your code is much harder, because there is no query interface for looking at the data itself. It also limits the operations that can be performed at the DB level. It's really nothing more and nothing less than a huge, efficient hash table, so some operations must be moved into the application code. I also did not find a way to iterate over all of the data, and I'm not sure whether it's possible. However, I think that the amazing performance boost more than compensates for these shortcomings.
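To illustrate what "moving operations into the code" can look like, here is a minimal sketch. Assume the store offers only `get` and `put` by key. Then a query a relational DB would answer with something like `WHERE country = ?` has to be emulated by keeping an extra index key in the application. The `KvStore` interface, the in-memory stand-in, the `UserRepository` class, and the `user:`/`country:` key layout are all hypothetical. They are not Voldemort's client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal key-value interface: lookups by key only, no queries.
interface KvStore {
    String get(String key);
    void put(String key, String value);
}

// In-memory stand-in for a remote key-value store (for illustration only).
class InMemoryKvStore implements KvStore {
    private final Map<String, String> data = new HashMap<>();
    public String get(String key) { return data.get(key); }
    public void put(String key, String value) { data.put(key, value); }
}

// The application maintains its own "index" key per country, since the store
// cannot filter records by value. Assumes user IDs never contain commas.
class UserRepository {
    private final KvStore store;

    UserRepository(KvStore store) { this.store = store; }

    void addUser(String userId, String country) {
        store.put("user:" + userId, country);
        // Read-modify-write of the index entry; NOT safe under concurrent writers.
        String indexKey = "country:" + country;
        String existing = store.get(indexKey);
        store.put(indexKey, existing == null ? userId : existing + "," + userId);
    }

    List<String> usersInCountry(String country) {
        String ids = store.get("country:" + country);
        List<String> result = new ArrayList<>();
        if (ids != null) {
            for (String id : ids.split(",")) {
                result.add(id);
            }
        }
        return result;
    }
}
```

The trade-off is that the application now owns consistency between the record and its index entry. With several servers writing at once, the read-modify-write above would need whatever concurrency control the real store provides, such as versioned or conditional puts. The same trick (an application-maintained list of keys) is one way to work around the missing "iterate over everything" operation. It also carries the same cost.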

I didn't check the alternatives, but I'll be happy to hear about them in the comments.

4 comments:

  1. One fairly obvious option is Apache HBase.

  2. Apache HBase is good for similar purposes, but its latency is too high for ad serving.

  3. What about JBoss Cache (or Infinispan)? Terracotta could also be an option.

  4. I didn't compare them thoroughly against Voldemort, and they might be a good option too.
