One of the first things anyone learns about async messaging is the set of fundamental messaging patterns: Datagram, Request/Reply and Report. The textbook handling of reply messages calls for the server application to move the message ID to the correlation ID field so that the reply can be associated with the request. One of the available MQGMO.MatchOptions values (MQMO_MATCH_CORREL_ID) specifies retrieval of messages based on the correlation ID, specifically to facilitate request/reply. The idea is that the requesting program puts a message, then performs a GET on the reply queue using the returned message ID to select the reply.
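The textbook flow can be sketched with a toy in-memory model. This is not the MQI or JMS API: the `Message` class and the `put`, `get_by_correl_id` and `serve_one` helpers are hypothetical stand-ins, loosely mirroring a PUT that returns a message ID and a GET that matches on correlation ID.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    body: str
    msg_id: bytes = b""
    correl_id: bytes = b""


def put(queue: list, msg: Message) -> bytes:
    """Toy PUT: the 'transport' assigns a fresh 24-byte message ID and returns it."""
    msg.msg_id = os.urandom(24)
    queue.append(msg)
    return msg.msg_id


def get_by_correl_id(queue: list, correl_id: bytes) -> Optional[Message]:
    """Toy GET that selects on correlation ID and removes the match from the queue."""
    for i, msg in enumerate(queue):
        if msg.correl_id == correl_id:
            return queue.pop(i)
    return None


def serve_one(request_queue: list, reply_queue: list) -> None:
    """Toy server: move the request's message ID into the reply's correlation ID."""
    request = request_queue.pop(0)
    reply = Message(body="reply to " + request.body, correl_id=request.msg_id)
    put(reply_queue, reply)


request_queue, reply_queue = [], []
request_id = put(request_queue, Message(body="price of XYZ?"))
serve_one(request_queue, reply_queue)

# The requester selects its reply using the message ID it got back from the PUT.
reply = get_by_correl_id(reply_queue, request_id)
print(reply.body)
```

The whole pattern hinges on the requester and the server seeing the same message ID for the request.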
This works great for point-to-point messages in IBM MQ. With MQ’s pub/sub it breaks. In the IBM implementation, as the message is fanned out to n subscribers, MQ generates n new message IDs. Counting the message ID returned to the publisher, that makes 1+n unique message IDs for each publication. No subscriber ever sees the ID the publisher got back, so a reply correlated on the subscriber’s message ID can never match, which totally breaks request/reply.
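To make the 1+n arithmetic concrete, here is a minimal simulation of the fan-out behavior described above. It is a toy model, not MQ itself: the `publish` function and the random 24-byte IDs are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    msg_id: bytes = b""
    correl_id: bytes = b""


def publish(subscriber_queues: list, body: str) -> bytes:
    """Toy publish: hand one ID back to the publisher, give every copy a new ID."""
    publisher_msg_id = os.urandom(24)
    for queue in subscriber_queues:
        queue.append(Message(body=body, msg_id=os.urandom(24)))
    return publisher_msg_id


subscriber_queues = [[], [], []]  # n = 3 subscribers
publisher_msg_id = publish(subscriber_queues, "price of XYZ?")

# Collect every ID in play: the publisher's plus one per subscriber copy.
all_ids = {publisher_msg_id} | {q[0].msg_id for q in subscriber_queues}
print(len(all_ids))  # 1 + n distinct IDs, barring a random collision

# A subscriber replying textbook-style copies *its own* message ID into the
# correlation ID, and the publisher never saw that value.
reply_correl_ids = [q[0].msg_id for q in subscriber_queues]
print(publisher_msg_id in reply_correl_ids)
```

A publisher waiting on a GET that matches its own message ID would wait forever in this model, because none of the replies carry that value.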
However, many MQ programmers and admins have assumed that each subscriber should get an exact copy, an instance, of the message that was published, and that the message ID should be preserved across all messages resulting from a publication. When the issue was raised several times with IBM the answer in all cases of which I am aware was “working as designed.”
The result in many instances is programmers resorting to generating their own unique IDs and storing these as message properties, then requiring the server side application to either move them to the correlation ID or preserve them as message properties. This represents a significant amount of unnecessary code complexity, and considerable additional cost to develop and operate pub/sub applications, all to do something most of us assumed would be done by the transport.
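A minimal sketch of that workaround, again as a toy in-memory model rather than real MQ or JMS code. The property name `appRequestId` and the helper functions are hypothetical, and the sketch takes the second option mentioned above: the server preserves the ID as a message property on the reply.

```python
import os
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    body: str
    msg_id: bytes = b""
    properties: dict = field(default_factory=dict)


def publish(subscriber_queues: list, msg: Message) -> bytes:
    """Toy fan-out: each copy gets a new message ID but keeps the properties."""
    for queue in subscriber_queues:
        queue.append(Message(msg.body, os.urandom(24), dict(msg.properties)))
    return os.urandom(24)  # the ID handed back to the publisher


def reply_to(request: Message) -> Message:
    """Server side: carry the application-level ID over to the reply."""
    return Message(
        body="reply to " + request.body,
        properties={"appRequestId": request.properties["appRequestId"]},
    )


def get_by_property(queue: list, name: str, value: str) -> Optional[Message]:
    """Toy selective GET on a message property."""
    for i, msg in enumerate(queue):
        if msg.properties.get(name) == value:
            return queue.pop(i)
    return None


# The publisher generates its own unique ID instead of relying on the transport.
app_request_id = uuid.uuid4().hex
subscriber_queues = [[], []]
publish(subscriber_queues, Message("price of XYZ?", properties={"appRequestId": app_request_id}))

reply_queue = []
for queue in subscriber_queues:
    reply = reply_to(queue.pop(0))
    reply.msg_id = os.urandom(24)  # the reply gets its own transport ID
    reply_queue.append(reply)

first_reply = get_by_property(reply_queue, "appRequestId", app_request_id)
print(first_reply.body)
```

Every participant has to know about and honor the extra property, which is exactly the added complexity the paragraph above complains about.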
When it comes to native MQ, IBM can do what they want, of course. Compliance with the JMS specification is another thing altogether. Since IBM advertises that IBM MQ “supports” the JMS 2.0 spec, the question of whether this behavior is compliant depends on whether it is allowed by the specification, either intentionally or through some unintended ambiguity. When confronted with this issue recently for what seemed like the hundredth time, I decided to finally seek out an authoritative answer. The best place to turn seemed to be the specification team itself, so I contacted their lead, Nigel Deakin, and asked point blank: what is the behavior intended by the spec and is it open to interpretation by implementers?
The answer is that the message ID is intended to be preserved, such that each subscriber sees the same message ID that the publisher received back from sending the message. Nigel also does not believe there is any ambiguity about this in the spec. The full exchange with Nigel is publicly posted if you want to read it.
So what now? It seems clear to me that the IBM MQ JMS implementation needs to address this (but please see the update below). However, I’d argue that this isn’t a candidate for a Request For Enhancement. To make that request through the RFE process is to treat specification non-compliance as an inconvenience rather than an unmet requirement. If I had a support contract, I’d open a PMR. But I do not, so it’s up to you to do so if this affects you.
I do not see any wiggle room here and I believe that IBM will address this in the JMS classes eventually. The more interesting question to me is whether they will implement it in the native MQI. The underlying issue, that pub/sub as implemented breaks request/reply, is not limited to JMS applications. But if the only requirement is to fix it for JMS compliance, then it is possible the native MQI will remain unchanged. This would, in my estimation, be a big mistake. Ideally, I’d like to see the underlying MQI fixed, with an environment variable or QMgr setting to revert to the legacy (non-compliant) behavior for those who depend on it.
Please do let me know if this is affecting you and, if you open a PMR, what IBM’s response is. I plan to track this issue and hope to have updates soon.
Update 17 March: Matt White at IBM has responded, and among the information he has provided is that IBM’s JMS implementation passes the reference test suite. This means there is a discrepancy between Nigel Deakin’s description of the intent of the spec and the reference suite. Of the two, the reference suite is the less open to interpretation and is the thing companies are supposed to be able to rely on. Based on this, Matt has stated that PMRs won’t be accepted and has directed concerned customers to use the RFE process, logging votes and comments against #35062 Provide topic or subscribtion option to duplicate msg id.
For reference we have posted a follow-up to the Stackoverflow question.
That includes information on this issue – and also a request for comments to added to help us provide a messaging system that meets the requirements of customers.
Thanks, Matt!
For those who do not know, calanais is IBM’s rep on the JMS Expert Group. His participation indicates that IBM is taking this seriously, will respond to your comments, and will implement a fix if the community places a high enough priority on them doing so. Matt explains in his Stack Overflow response that the JMS test suite doesn’t catch this. So IBM’s JMS is compliant based on passing the reference test suite and IBM should be able to rely on that. Accordingly, PMRs will not be considered (which seems appropriate in light of this info) so if this affects you, please update RFE 35062 with your comments and votes.