Wednesday, 3 July 2013

Trust, but verify - losing messages with ActiveMQ

I created this blog more than half a year ago, but didn't post anything until today. Finally two things coincided - I have some time and I have something I'd like to share. I hope it's not the last of my posts...

Anyway, today I'd like to show that even a product as well known and relatively mature as ActiveMQ 5.8.0 can behave in a totally unexpected manner. This is not meant as ActiveMQ bashing - it is a great product with a rich feature set, and you get it for free (or you can pay Red Hat for the commercially supported JBoss A-MQ). My goal is to show that if certain aspects of a system are of top importance, then thorough testing of the whole system is key.

So how can I lose a message with ActiveMQ? Of course we use persistent messages, transactions and so on. One would expect that a queued message stays in the system until it is consumed successfully, expires or is deleted manually, right? No, not really. Try this scenario:

  • Deploy a consumer application that always throws a RuntimeException in its onMessage method:

package pl.marekstrejczek.mdb;

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// Test consumer that rejects every message it receives from TestQueue.
@MessageDriven(
    activationConfig = {
        @ActivationConfigProperty(
            propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(
            propertyName = "destination", propertyValue = "java:/queue/TestQueue")
    },
    mappedName = "TestQueue")
public class RejectAllMDB implements MessageListener {

    @Override
    public void onMessage(final Message message) {
        // Fail on purpose so that the message is never processed successfully.
        throw new RuntimeException("!!! This is intentional !!!");
    }
}
  • Put a message onto the queue using the ActiveMQ web console. Of course we want persistent delivery enabled - we don't want to lose the message if anything goes wrong.


  • After our consumer application fails to process the message, it lands on the dead letter queue (default name: ActiveMQ.DLQ). The messaging broker is smart enough to recognize that the consumer failed to process the message and puts it in a dedicated "quarantine area" - this is what a dead letter queue is all about.


  • At this stage in a real environment, someone from operations would normally investigate why message processing failed. If the problem was transient and the system is ready to accept the message, the operator will probably want to replay the message to get it processed at last. But what if we move the message from the DLQ to the original queue and it fails to process again? Let's try - let's browse ActiveMQ.DLQ in the web console and move our test message from ActiveMQ.DLQ back to TestQueue.

  • What could we expect? The message is moved to TestQueue and sent to our test application (which hasn't changed and still throws a RuntimeException from its onMessage method). Once the broker recognizes that the message still cannot be processed successfully, it should move the message to the dead letter queue again. The message still exists in our system, we're fine. BUT... what really happens is that the message is sent to the application and, after processing fails, the message DISAPPEARS. It's no longer on TestQueue or ActiveMQ.DLQ - it's gone, lost. If that message contained confirmation of a financial trade worth $10 million, then you had better have good recovery procedures in place.
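
The expectation in the last step can be phrased as an invariant that a robustness test asserts after every step: each message that was sent must still be accounted for somewhere (on the original queue, on the DLQ, or in a record of consumed messages). Below is a minimal sketch of such a check in plain Java. No broker is involved here: the class name, the method name and the message ID "msg-1" are hypothetical, and the snapshots are hard-coded to mirror what I observed. In a real test the sets would have to be filled from the broker, for example by browsing both queues.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class MessageConservationCheck {

    /**
     * Returns the IDs of messages that were sent but are found in none of
     * the given locations (queues, DLQ, record of consumed messages).
     */
    public static Set<String> findLost(Set<String> sent,
                                       Collection<Set<String>> locations) {
        Set<String> lost = new LinkedHashSet<>(sent);
        for (Set<String> location : locations) {
            lost.removeAll(location);
        }
        return lost;
    }

    public static void main(String[] args) {
        Set<String> sent = Collections.singleton("msg-1");
        // The test consumer always fails, so nothing is ever consumed.
        Set<String> consumed = Collections.emptySet();

        // Snapshot after the first failed delivery: the message is on the DLQ.
        Set<String> testQueue = Collections.emptySet();
        Set<String> dlq = Collections.singleton("msg-1");
        System.out.println("lost after first failure: "
                + findLost(sent, Arrays.asList(testQueue, dlq, consumed)));

        // Snapshot after the replay failed again, as described in this post:
        // the message is neither on TestQueue nor on ActiveMQ.DLQ.
        dlq = Collections.emptySet();
        System.out.println("lost after replay: "
                + findLost(sent, Arrays.asList(testQueue, dlq, consumed)));
    }
}
```

With the hard-coded snapshots the first check reports no lost messages and the second one reports msg-1. The point of the design is that the test never asks "is the message on the DLQ?" but "is the message anywhere at all?", which also catches losses in places nobody thought to look.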



    For some systems losing a message might not be a big deal. If your system falls into this category - don't worry. But if you need very robust messaging and losing even a single message can have a noticeable impact on the business, then be careful. I discovered this problem while running robustness tests for a technology refresh project I'm currently involved in. We found at least two more middleware issues during those tests, so bugs in these products do exist. Third-party applications need to be QAed as part of the solution, just like any application developed in-house. Unless, of course, you can afford to go live with a solution that can be buggy in the most sensitive areas.

    By the way - the problem described above will be solved in the upcoming 6.1 version of JBoss A-MQ - see https://fusesource.com/issues/browse/ENTMQ-341. I've also raised an ActiveMQ ticket for this issue: https://issues.apache.org/jira/browse/AMQ-4616 (Update 2013-07-13: an ActiveMQ ticket for this problem has already been open since 2011: https://issues.apache.org/jira/browse/AMQ-3405. It is solved in the upcoming 5.9.0 release).
