Some months ago at work we needed a message queue, a very simple one, basically just a message buffer. The idea is simple: the webservers send their messages to the queue, the queue always accepts all messages, and it holds them until the ETL processes request them for further processing. As the webservers are time critical and the ETL processes aren't, you need something in between.
We realized that there are not too many solutions out there for this kind of problem. We did find a fully featured message queue from the Apache Foundation, called ActiveMQ. It had far too many features for us, but as long as you can use it in a simple way, why not give it a try?
Apparently this message queue is production ready, but as we needed to use its REST interface (due to a PHP environment), it was a horrible experience. The setup wasn't too hard, but this thing ate memory for breakfast and crashed under high load. I suppose that's mainly a problem with the REST interface, so feel free to correct me here if you have a completely different experience.
A solution of our own
As our problem was quite simple and we weren't able to find a proper third-party solution, we decided to write a message queue ourselves. The idea was to use memcached to keep the messages in memory and be able to access them quickly.
As a first test we implemented a very basic setup in PHP with Apache. This way we could start testing performance without worrying about connections, sockets, multithreading and so on. Our tests satisfied us completely! The queue was fast enough to cope with more than 300 messages per second, each message being less than 2 kilobytes. And that was with all the Apache overhead!
Since it all worked that well, and since you never have enough time, we are still using PHP and Apache. I would still like to see a solution in Perl (yeah, that's my favorite), multithreaded and just listening on sockets, without the overhead of Apache, HTTP and Zend_HTTP (you definitely want to use that instead of curl!).
So messages are injected into the queue with a simple POST, errors are signaled through HTTP response codes, and messages are extracted with GETs.
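To make the HTTP contract concrete, here is a minimal in-process sketch in Python: an HTTP handler that accepts every POST, hands messages back on GET, and signals "queue empty" with a status code. The `/message` path and the in-memory deque are illustrative stand-ins, not the real PHP/memcached implementation.

```python
import threading
from collections import deque
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

queue = deque()  # stand-in for the memcached-backed buffer

class QueueHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Always accept the message; success is signaled via the status code.
        length = int(self.headers.get("Content-Length", 0))
        queue.append(self.rfile.read(length))
        self.send_response(200)
        self.end_headers()

    def do_GET(self):
        if queue:
            body = queue.popleft()
            self.send_response(200)
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)  # queue empty
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), QueueHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/message" % server.server_port

urlopen(Request(url, data=b"hello"))  # webserver injects a message
msg = urlopen(url).read()             # ETL process extracts it
print(msg)  # b'hello'
```

Errors on the producer side reduce to checking the response code, which is exactly why HTTP made the first version so cheap to build.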
Some added reliability
At the moment we are running two message queue servers, each with one 3 GB (“-m 3000”) memcached locally and one on the other machine (mirrored in software). The webservers are free to choose the message queue; in case one is full or not available, they just choose the other one. The ETL processes request messages from one queue until it's empty and then do the same with the other queue. No complex failover logic. This has worked well for more than half a year and several million messages a day.
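The failover idea above can be sketched in a few lines. The `FakeQueue`, `send`/`fetch` names and the capacity limit are illustrative; the point is only the shape of the logic: producers try the next queue on failure, consumers drain one queue completely before moving on.

```python
class QueueFull(Exception):
    pass

class FakeQueue:
    """Stand-in for one message-queue server."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = []

    def send(self, msg):
        if len(self.messages) >= self.capacity:
            raise QueueFull()
        self.messages.append(msg)

    def fetch(self):
        return self.messages.pop(0) if self.messages else None

def inject(queues, msg):
    # Webserver side: no complex logic, just try the other queue on failure.
    for q in queues:
        try:
            q.send(msg)
            return True
        except QueueFull:
            continue
    return False  # all queues full or unavailable

def drain(queues):
    # ETL side: empty one queue completely, then the other.
    for q in queues:
        while True:
            msg = q.fetch()
            if msg is None:
                break
            yield msg

a, b = FakeQueue(capacity=2), FakeQueue(capacity=2)
for i in range(3):
    inject([a, b], "msg-%d" % i)
print(list(drain([a, b])))  # ['msg-0', 'msg-1', 'msg-2']
```

Note that this gives no global ordering across the two queues, which is fine here because the ETL side doesn't care about order.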
All messages are saved with an integer as key. One key holds the next free key and one holds the key of the oldest message in the queue. To access these, the increment/decrement method is used, as it's atomic; on top of that there are two keys that act as locks. They get incremented, and if the return value is 1 the process has the lock, otherwise it keeps incrementing. Once the process is finished, it sets the value back to 0. Simple but effective. One caveat is that the integer will overflow, so there is some logic in place that resets the used keys to 1 once we get close to that limit. As the increment operation is atomic, the lock is only needed if two or more memcaches are used (for redundancy), to keep them in sync.
Each process writes to both memcaches (one local, one on another server); if one write fails, it marks itself as failed and doesn't accept any more messages. There is a “daemon” process running every minute that checks the state of the queue and the memcaches. If there's a problem with one memcache, it clears it, tries to copy the content of the other one over, and flags the queue as working again.
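A sketch of that dual-write-plus-repair cycle, with plain dicts standing in for the two memcaches; the `Mirror`/`repair` names and the exception handling are illustrative, not taken from the real code.

```python
class Mirror:
    def __init__(self, local, remote):
        self.caches = [local, remote]  # one local, one remote memcached
        self.working = True

    def write(self, key, value):
        if not self.working:
            return False               # refuse messages while failed
        for cache in self.caches:
            try:
                cache[key] = value
            except Exception:
                self.working = False   # one write failed: stop accepting
                return False
        return True

def repair(mirror, broken_index):
    # "Daemon" run every minute: replace the broken cache with a copy
    # of the healthy one and flag the queue as working again.
    healthy = mirror.caches[1 - broken_index]
    mirror.caches[broken_index] = dict(healthy)
    mirror.working = True

class BrokenCache(dict):
    """Simulates a memcached that is down."""
    def __setitem__(self, key, value):
        raise IOError("memcached down")

m = Mirror({}, BrokenCache())
m.write("1", "msg")
print(m.working)          # False: the remote write failed
repair(m, broken_index=1)
print(m.working)          # True: repaired and accepting again
```

The nice property is that the queue fails closed: a half-written message never gets served, because the whole node stops accepting until the daemon has resynced it.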
Apache is stripped down, of course, and eAccelerator is also in place to speed things up on the PHP side. We only use its opcode cache feature.
The memcached configuration might be interesting: as our messages are always less than 2 KB, there is absolutely no need for the different slabs; on the contrary, those slabs just blocked some memory from being reused! That's why we use a chunk size of 2048 (“-n 2048”) and a chunk growth factor of 2 (“-f 2”). The latter means that we have only two chunk sizes, i.e. two different slabs: one is 2048 bytes, the other 1 MB (the biggest possible chunk size). We also make memcached return an error if there is not enough memory left, as we don't want to lose messages (“-M”).
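Putting the flags from this section together, the startup command would look roughly like this (the exact invocation isn't shown here, so treat it as a sketch):

```shell
# 3 GB of memory, 2 KB minimum chunk size, growth factor 2,
# and return an error instead of evicting when memory runs out
memcached -m 3000 -n 2048 -f 2 -M
```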
Even though the setup is far from perfect, it's extremely fast and reliable! If implemented “properly” it would make a big performance jump, but we don't need that at the moment. I read often enough that you shouldn't use memcached for data that you can't afford to lose, but I have to say that this solution works just fine, and I don't see a reason why it shouldn't be used like this.