Friday, April 10, 2009

Java Singletons

This post doesn't contain any original research, instead, it describes a problem which I encountered. I believe many programmers have dealt with this issue as well, however, the common solution is wrong.

Singleton is a pretty basic design pattern. Apparently, implementing it is not so basic. The trivial implementation in java is rather straight forward:

public class Singleton {
//DO NOT USE - not thread safe
private static Singleton m_instance = null;
//Making the constructor private enforces the singleton property.
private Singleton() {
//Do something
}
public static Singleton getInstance() {
if(null == m_instance) {
m_instance = new Singleton();
}
return m_instance;
}
}
This is of course not thread safe, as two threads can see null value and then two instances will be created. Again, there is a trivial solution, just make the call to getInstance synchronized:


public class Singleton {
//Works, but not very efficient
private static Singleton m_instance = null;
//Making the constructor private enforces the singleton property.
private Singleton() {
//Do something
}
public static synchronized Singleton getInstance() {
if(null == m_instance) {
m_instance = new Singleton();
}
return m_instance;
}
}

Well, this will work. However, the lock will be acquired whenever getInstance is called, which is bad for performance. This problem is enhanced if many threads are accessing the singleton object, so this is still not good enough. Luckily, there is a famous technique to solve this issue, called the "double checked lock". This technique puts the synchronization itself after checking for null, and then checks again for null in the synchronized block itself, so the lock is not acquired most of the time.


public class Singleton {
//DO NOT USE - Does not always work!
private static Singleton m_instance = null;
//Making the constructor private enforces the singleton property.
private Singleton() {
//Do something
}
public static Singleton getInstance() {
if(null == m_instance) {
createInstance();
}
return m_instance;
}
public static synchronized void createInstance() {
//Double check - in case two threads are calling this function.
if(null == m_instance) {
m_instance = new Singleton();
}
}
}

Until recently, I wrongly believed that this is a good technique. However, it does not work! Sometimes, a thread can see a non null value, but the object itself is not yet constructed. This is when Google comes in handy, and apparently the issue is well explained here. To sum it up, until JDK 5 there wasn't really a good solution for this issue in java (It could be solved in languages that gave the programmer the ability to explicitly create memory barriers, and a solution for Java that is based on using ThreadLocal is also suggested there). From JDK 5, there is a simple solution, which is cited from Jeremy Manson's blog. It could simply be solved by setting the instance's variable to be volatile, which ensures that it will remain null, until after the object is constructed. So this is the final solution, which works from JDK 5 and up:


public class Singleton {
//Works from JDK 5 and up
private static volatile Singleton m_instance = null;
//Making the constructor private enforces the singleton property.
private Singleton() {
//Do something
}
public static Singleton getInstance() {
if(null == m_instance) {
createInstance();
}
return m_instance;
}
public static synchronized void createInstance() {
//Double check - in case two threads are calling this function.
if(null == m_instance) {
m_instance = new Singleton();
}
}
}

Until next time...

Thursday, April 2, 2009

He who could not be named

Until recently, I didn't know that Voldemort existed anywhere outside of Harry Potter books. Well, looking for MySQL alternatives to boost performance can take you to weird places.

A few weeks ago, we encountered high user load on our production machines (which is essentially a good thing). We had to raise more and more machines in order to support more and more requests per second. While raising more machines using the Amazon EC2 infrastructure (to be discussed in a future post) is easy, scaling up MySQL while keeping the data in sync became harder. We figured that we basically need a big hashtable, which is very efficient and easily scalable. At that point, one of my colleagues, Yaron, pointed out this wonderful post by Richard Jones.

After reading the review, I decided to give project-voldemort a try. This is basically an open source, java based, efficient and scalable hash table. After 5 minutes of installation, and a few days of integrating it into the code, I think I'm in love. After much tuning and playing with mysql parameters, the best I could squeeze out of it per request was still not fast enough. With Voldemort, without almost any tuning, the request handling time dropped to less than 1/10th of the time it took with mysql, and I am sure it can be pushed further down after some parameters tuning. That means more users served with less servers, not to mention how easy it is to scale up in case of high load that even Voldemort can't handle.

Unequivocally, using this platform has it's downsides. It's not a relational database, so it's much harder to debug your code, as there is no interface to the data itself. Also, it limits the operations that can be performed in the DB level. It's really nothing more-nothing less than a huge and efficient hash table, so some of the operations must be moved to the code. I also did not find a way to iterate over all of the data, and I'm not sure whether it's possible. However, I think that the amazing performance boost compensates for the shortcomings.

I didn't check the alternatives, but I'll be happy to hear about them in comments.