memcached as simple message queue

Some months ago at work we were in the need of a message queue, a very simple one, basically just a message buffer. The idea is simple, the webservers send there messages to the queue, the queue always accepts all messages and waits until the ETL processes request messages for further processing. As the webservers are time critical and the ETL processes aren’t you need something in between.

We realized that there are not too many solutions out there for this kind of problem. Although we found a fully featured message queue from the Apache Foundation, called ActiveMQ. This one had far too many features for us, but as long as you can use it in a simple way why not give it a try?

Apparently this message queue is production ready, but as we needed to use its REST-interface (due to a PHP environment) it was a horrible experience. The setup wasn’t to hard, but this thing ate memory for breakfast and crashed on high loads. I suppose that thats mainly a problem with the REST-interface, so feel free to correct me here if you have a completely different experience.

An own solution is needed

As our problem was quite simple and we weren’t able to find proper third party solutions we decided to write a message queue ourselves. The idea was to use memcached to keep the messages in memory and be able to access them fast.

As a first test we just implemented a very basic setup in PHP with Apache. This way we could start testing performance without bothering about connections, sockets, multithreading and so on. Our tests satisfied us completely! The queue was fast enough to cope with more than 300 messages per second, a message being less than 2 kilobyte. And that was with all the Apache overhead!

As this all worked that fine and you never have enough time we are still using PHP and Apache. I still would like to see a solution in Perl (yeah, thats my favorite), multithreaded and just listening to sockets without Apache, HTTP and Zend_HTTP (you definitely want to use this instead of curl!) as overhead.

So messages get injected into the queue with a simple POST, errors etc. are handled through HTTP response codes and the messages are extracted with GETs.

Some added reliabilty

At the moment we are running two message queue servers, each having one 3GB (“-m 3000”) memcached locally and one on the other machine (mirrored by software). The Webservers are able to choose the message queue, in case one is full or not available they just choose the other one. The ETL processes request messages from one queue until its empty and then do the same with the other queue. No complex failover logic. This works well for more than half a year and some million messages a day.

Some details

All messages are saved with an integer as key. There is one key that has the next key and one that has the key of the oldest message in the queue. To access these the increment/decrement method is used as its atomic, so there are two keys that act as locks. They get incremented, and if the return value is 1 the process has the lock, otherwise it keeps incrementing. Once the process is finished it sets the value back to 0. Simple but effective. One caveat is that the integer will overflow, so there is some logic in place that sets the used keys to 1 once we are close to that limit. As the increment operation is atomic, the lock is only needed if two or more memcaches are used (for redundancy), to keep those in sync.

The process writes to both memcaches (one local, one on another server), if one write fails it marks itself as failed and doesn’t accept any more messages. There is a “daemon” process running every minute that checks the state of the queue and the memcaches. If theres a problem with one memcache it clears it and tries to copy the content of the other one over and flags the queue as working.

The Apache is stripped down of course, and there is also eAccelerator in place to speed things up on the PHP side. We only use the Opcode Cache feature.

The memcached configuration might be interesting as we always have message that are less than 2 KB, so there is absolutely no need for the different slabs, on the contrary, those slabs just blocked some memory from being reused! Thats why we use a chunk size of 2048 (“-n 2048”) and a chunk factor of 2 (“-f 2”). The latter means that we have only two chunk sizes or two different slabs, one is 2048, the other one 1MB (biggest possible chunk size). We also make memcached return an error if there is not enough memory left as we don’t want to lose messages (“-M”).

Conclusion

Even though the setup is far from perfect its extremely fast and reliable! If implemented “properly” it would make a big performance jump, but we don’t need that at the moment. I read often enough that you shouldn’t use memcached for data that you can’t afford to lose, but I have to say that this solution works just fine, and I don’t see a reason why it shouldn’t be used like this.

Advertisements
memcached as simple message queue

23 thoughts on “memcached as simple message queue

  1. Hmm, I could do that. Though its not detailed enough for that…

    I ignored that part with the disk dumps as it only happened once. I still blame the network. 😉

  2. I agree, the network probably caused the issue.. Large TCP packets , maybe? 😉

    As an overview it’s perfect. Just link to this post from the wiki.

  3. Very useful, you saved me the effort to figure how to implement a queue with the rudimentary tools provided by memcached, clever solution!

    It’s the missing piece of the puzzle for a stack I’m thinking of making for an internal notification system for a website (“you’ve got a new message”-type of notification).

    The setup would be that the PHP scripts putting the concerned messages into the DB would also put a notification into a memcached-based queue. Then a java-based server would consume that queue and send the updates to the site users through a socket. The socket would have been open by flash on the client in the first place (Flash 9 supports binary sockets). And lastly the flash would pass on that data to Javascript for display to the user.

    The end-result being a real-time notification system avoiding hacks such as Comet. And regarding memcached not being persistent, they’re just notifications, so they can be lost without real consequences.

    By the way in a previous job I worked with ActiveMQ and other J2EE queuing systems for a messaging project and they were all horrible to work with and resource-hungry. I agree that such bloated tools should be avoided when one has such simple needs. Looking back, I wish we’d decided to make our own solution from scratch instead of wasting our times with such inefficient queuing systems.

  4. I think i developed exactly the same thing during the past nights. Also with the two integers, using integers as keys… But i didn’t dare to put it into production. Thanks for this post (if i had read this earlier, i would have saved quite some time) and for encouraging me to rely on this solution 🙂

  5. Thinking about this more deeply raises 2 more questions for me:
    Why are you using a http-interface with PHP and apache? Couldn’t the client servers directly insert their jobs into the memcache with a php base class?
    And second: Have you thought about using nanoserv as a environment for the PHP of the jobqueue?

  6. I didn’t know nanoserv, thanks for that, thats basically exactly how the ideal implementation would be, much slimmer and faster than using Apache.
    I used Apache to get a quick pilot running, and it was just not slow enough to spend more time on it by replacing it. 😉
    It’s not ideal, but as it works quite well, there is no real reason to replace it, maybe one day when more performance is required.

    I wanted to split the clients from the queue, thats why the clients cannot insert directly, but have to go through ah interface.

  7. Leandro says:

    Hi, this is a excelent solution, i want implement it. However, i have a question about this:

    “there are two keys that act as locks. They get incremented, and if the return value is 1 the process has the lock, otherwise it keeps incrementing. Once the process is finished it sets the value back to 0.”

    My question is: Is this lock really neccesary? I mean, if client “A” and client “B” attempts to increment the “last element” counter at same time, Memcached guarantee that each client receives a different value (because increment operation is atomic). And then, each client could use the returned value to include it in new element keys, without conflicts.

    Therefore: Must clients obtain a lock before do counter increment?

    Thank you very much!

  8. Hey Leandro,
    thanks for your question, you’re right, the increment operation is atomic, but if you want to use two memcaches (for redundancy) then you need to make sure that you use the same key in both memcaches. Otherwise it could happen that your id’s get out of sync.

    If you want to use only one memcache (no redundancy) you’re totally fine without a lock!

  9. This was a great and very helpful post.

    I’ve tried to provide some pseudo-code for your implementation over on my blog ( hope that i’ve gotten it right :} )– and I’ve tried to expand on the idea a bit by describing a lockless writer and reader.

    ( obviously as you point out to Leandro — lockless versions may not fit everyone’s needs; but in some cases can be appropriate )

  10. I know that this is 3 years old now, but just in case you still using that method, please give 0MQ (zeromq.org) a try. A revolutionary message queuing library, it is far from bloated, very flexible, lots of messaging models out of the box, and very stable.

    I personally did use it in one of my projects, it is hard to grasp at first, but later on you will thank these guys for coming up with this.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s