The security maturity progression in MQ starts with access control. First we isolate MQ Admin access, then add granular user and application access. This class of security control is known as intrusion prevention. After mastering that, the next phase adds stronger accountability and intrusion detection. These typically include enabling event messages and archiving security-relevant events from the event queues and the error logs.
In order to detect security-relevant events and hold people strictly accountable within the access roles defined, the product itself must be further along that continuum than the customer is. In particular, the intrusion detection controls must report all events and do so accurately, and the tests described in this post show that this is not true in all cases.
The issues reported in this post affect the ability of an administrator or auditor to accurately detect security-relevant events and enforce accountability. Since error logs roll over, archiving them is subject to a window within which error logs might roll off and be irretrievably lost. Many shops rely on event messages to provide a level of assurance acceptable to auditors and security-conscious users and these errors show that reliance may be misplaced.
Parsing event messages
If you’ve read just about anything I’ve ever written about debugging MQ Security then you know I’m a huge fan of SupportPac MS0P, in large part because it formats PCF event messages into human readable form. This is a project of Mark Taylor’s and I’ve thanked him publicly and profusely in conference sessions and blog posts. MS0P drastically reduces the administrative overhead for managing MQ security.
When the event messages in question are Authorization Events, they provide all relevant information needed to diagnose the event. This includes the User ID making the request, the object against which the request was made, and the options used to make the request. Or at least I used to believe they did so reliably, hence my frequent posts advising users to enable Authorization events and install MS0P. Unfortunately, that’s no longer true in all cases. I do not have the full scope of the problem but I’ve found enough issues to warrant concern, which I’ve documented here.
Testing environment
These issues were discovered while testing CONNAUTH and CHLAUTH as part of the security research I’ve written about recently in this blog. That testing relies heavily on MQ Explorer so these reports are in that context, and tested on MQ v9.0.0.0 running on Red Hat Enterprise Linux 7.2 x86-64. I have not validated them on other platforms so your mileage may vary.
QMgr +dsp wrong permission reported
When connecting via Explorer using an ID that has only +connect +inq on the QMgr, a 2035 is thrown. The event message states that Inquire Queue Manager is not authorized:
Event Type : Queue Mgr Event [44]
Reason : Not Authorized [2035]
Event created : 2016/11/01 19:22:53.01 GMT
Queue Mgr Name : ASH
Reason Qualifier : Cmd Not Authorized
Command : Inquire Queue Mgr
User Identifier : addrmap2
In this case, the addrmap2 ID actually does have +inq on the queue manager but does not have +dsp which is the actual cause of the authorization error. This is correctly reported in the error log:
11/01/2016 03:22:53 PM - Process(29252.8) User(mqm) Program(amqzlaa0)
Host(localhost.localdomain) Installation(Installation1)
VRMF(9.0.0.0) QMgr(ASH)
AMQ8077: Entity 'addrmap2' has insufficient authority to access object 'ASH'.
EXPLANATION:
The specified entity is not authorized to access the required object. The following requested permissions are unauthorized: dsp
ACTION:
Ensure that the correct level of authority has been set for this entity against the required object, or ensure that the entity is a member of a privileged group.
The problem here stems from the naming variants used in the different commands. If you have a connection to a Queue Manager and issue Inquire on that connection handle using the API, that requires +inq. If you have access to submit PCF commands, the command that requires +dsp is known in PCF-speak as “Inquire Queue Manager” even though if you type it into MQSC it would be DIS QMGR.
So the command reported in the event message is correct but you must know that DIS QMGR in MQSC translates to PCF_INQUIRE_QMGR to know why it is correct. At the same time the permission deficiency reported in the log is correct, even though it doesn’t obviously match the event. Neither of these sources contains the complete set of information.
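To make that translation less dependent on memory, the naming variants can be kept in a small lookup table. What follows is a minimal sketch built only from the cases described in this post, not a complete or authoritative mapping. The table name, the function name, and the "Inquire Queue" label are my own assumptions; the other two labels are copied from the event messages shown here.

```python
# Hypothetical lookup table: command name as printed in the formatted event
# message -> the MQSC command it corresponds to and the permission the error
# log reported as missing. Only the cases seen in this post are covered.
PCF_TO_MQSC = {
    "Inquire Queue Mgr": {"mqsc": "DIS QMGR", "permission": "dsp"},
    "Inquire Queue Status": {"mqsc": "DIS QSTATUS", "permission": "dsp"},
    # Assumed label; this post does not show an event for DIS Q.
    "Inquire Queue": {"mqsc": "DIS Q", "permission": "dsp"},
}


def explain_command(event_command):
    """Describe an event's Command field in MQSC terms, if we know it."""
    entry = PCF_TO_MQSC.get(event_command)
    if entry is None:
        return f"{event_command}: no mapping recorded"
    return (
        f"{event_command} = {entry['mqsc']} in MQSC, "
        f"needs +{entry['permission']}"
    )


print(explain_command("Inquire Queue Mgr"))
```

A table like this would need to be extended by hand for every other command you care about, which is part of the argument for having the product report the missing permission in the event itself.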
Queue +dsp wrong everything reported
When connecting via Explorer using an ID that has only +dsp on a subset of queues, many 2035 errors are thrown – two for each queue. Unfortunately, they do not contain the queue name:
Event Type : Queue Mgr Event [44]
Reason : Not Authorized [2035]
Event created : 2016/11/01 18:55:26.84 GMT
Queue Mgr Name : ASH
Reason Qualifier : Cmd Not Authorized
Command : Inquire Queue Status
User Identifier : mqm
Obviously, mqm is authorized to inquire queue status, or anything else for that matter. This is a stock Linux queue manager with OAM enabled and running as mqm. Since this one tells us it’s a queue inquiry problem we expect it to show us the queue name but that is missing.
MQ Explorer issues a DIS Q(*) and a DIS QSTATUS(*) for each refresh of the queues screen, and this causes MQ to spit out one of these for each unauthorized queue for each of the two commands. In my test queue manager with only a couple of user-defined queues that’s more than 100 2035 errors for each repaint of the Explorer queues screen.
The AMQERR01.LOG file shows the object name, the correct entity, and identifies +dsp as the missing permission but fails to identify the command that was executed:
11/01/2016 02:55:26 PM - Process(29252.8) User(mqm) Program(amqzlaa0)
Host(localhost.localdomain) Installation(Installation1)
VRMF(9.0.0.0) QMgr(ASH)
AMQ8077: Entity 'addrmap2' has insufficient authority to access object 'SYSTEM.SELECTION.VALIDATION.QUEUE'.
EXPLANATION:
The specified entity is not authorized to access the required object. The following requested permissions are unauthorized: dsp
ACTION:
Ensure that the correct level of authority has been set for this entity against the required object, or ensure that the entity is a member of a privileged group.
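Since the event supplies the command and the error log supplies the entity, object and permission, one way to get the whole picture is to join the two on their timestamps. The following is a rough sketch of that idea, not a finished tool. It assumes the event has already been formatted into "name : value" lines as in the samples above, that the log uses the AMQ8077 wording shown here, and that you know the queue manager host's offset from GMT (-4 hours for my test system). All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

EVENT_TEXT = """\
Event Type : Queue Mgr Event [44]
Reason : Not Authorized [2035]
Event created : 2016/11/01 18:55:26.84 GMT
Queue Mgr Name : ASH
Reason Qualifier : Cmd Not Authorized
Command : Inquire Queue Status
User Identifier : mqm
"""

LOG_TEXT = """\
11/01/2016 02:55:26 PM - Process(29252.8) User(mqm) Program(amqzlaa0)
AMQ8077: Entity 'addrmap2' has insufficient authority to access object
'SYSTEM.SELECTION.VALIDATION.QUEUE'.
The following requested permissions are unauthorized: dsp
"""


def parse_event(text):
    """Turn 'name : value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_log(text):
    """Pull timestamp, entity, object and permission from one AMQ8077 entry."""
    stamp = re.search(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)", text, re.M)
    who = re.search(r"AMQ8077: Entity '([^']+)'.*?object\s+'([^']+)'", text, re.S)
    perm = re.search(r"permissions are unauthorized:\s+(\S+)", text)
    return {
        "local_time": datetime.strptime(stamp.group(1), "%m/%d/%Y %I:%M:%S %p"),
        "entity": who.group(1),
        "object": who.group(2),
        "permission": perm.group(1),
    }


def correlate(event, log, utc_offset_hours, tolerance_seconds=1.0):
    """Merge an event and a log entry if their times agree, else None."""
    event_time = datetime.strptime(event["Event created"], "%Y/%m/%d %H:%M:%S.%f GMT")
    log_time_gmt = log["local_time"] - timedelta(hours=utc_offset_hours)
    if abs((event_time - log_time_gmt).total_seconds()) > tolerance_seconds:
        return None
    return {
        "command": event["Command"],
        "event_user": event["User Identifier"],
        "entity": log["entity"],
        "object": log["object"],
        "permission": log["permission"],
    }


print(correlate(parse_event(EVENT_TEXT), parse_log(LOG_TEXT), utc_offset_hours=-4))
```

Be aware that a timestamp join is only as good as the timestamps. When a single Explorer refresh produces a hundred events in the same second, this approach cannot tell you which log entry belongs to which event, which is exactly why the object name needs to be in the event itself.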
Resource enumeration weakness
I used Wireshark to sniff the client traffic in order to update this post. The network capture revealed something else of great interest.
Error messages returned to the client are intentionally sparse in order to limit information leaked to an attacker. All the legitimate user is supposed to know is that an authorization error occurred. They don’t get to know things like, for example, the object name if their request had a wildcard. IBM has duly suppressed display of the queue name in the error message returned to the client. Unfortunately, it was also suppressed in the diagnostics emitted by the queue manager.
The interesting bit is that the network sniff revealed that an attacker gets back a separate 2035 PCF response for every single queue they are unable to display. The queue name is suppressed in the client error message, but the number of them is returned for the asking. We don’t see this because MQ Explorer silently discards the info, but anyone who can run MQ Explorer could as easily run their own Java code and see the same information.
How might this be abused? Assume for example that your B2B queue manager uses structured names with customer or branch numbers embedded and I know of one called SALES.ORDER.REQ.12345.QA. If I have enough authorization to issue DIS Q(*) then I can also DIS Q(SALES.ORDER.REQ.*) and discover how many other clients access this node, and how often that population increases or decreases, with a fair degree of confidence. Then when I see it’s increased by one I can call the Help Desk, claim to be the new customer and “gosh this is so hard, what’s the name of that new queue again?”
But we both know you don’t really want to wade through millions of spurious 2035 errors. Chances are you either ignore them entirely or else suppress them altogether. That means I’m safe to just enumerate all the queues:
DIS Q(SALES.ORDER.REQ.12345.*)
DIS Q(SALES.ORDER.REQ.12346.*)
DIS Q(SALES.ORDER.REQ.12347.*)
DIS Q(SALES.ORDER.REQ.12348.*)
...
The first one gives me some insight as to whether your naming convention includes a QL or QR for every QA. Then for each of the other queues I compare the number of hits to the number returned in the first one to learn if that naming convention is consistent and how many other customer numbers there are queues for.
Or I can explore the topology of the entire sales system:
DIS Q(SALES.*)
DIS Q(SALES.ORDER.*)
DIS Q(SALES.ORDER.REQ.*)
DIS Q(SALES.ORDER.REQ.12345.*)
Again, comparing the number of errors in the known queue to the ones returned by the other inquiries gives me an idea of the size and complexity of the system.
Or I can dissect the system by known prefix:
DIS Q(*)
DIS Q(SYSTEM.*)
DIS Q(SYSTEM.FTE.*)
DIS Q(SYSTEM.BROKER.*)
DIS Q(SALES.*)
The first command gives me the total number of queues. The next three give me a breakdown of how many are SYSTEM.* and whether Broker or FTE queues are defined. The last gives me the one topology I already know about – SALES – which I combine with the number of SYSTEM.* queues and subtract from the total to glean the rough number of remaining user-defined queues.
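The arithmetic involved is trivial, which is the point. Here is a worked example of the subtraction just described. The response counts are invented purely for illustration and the function name is hypothetical; each count is assumed to be the total number of responses (successful plus 2035) that came back for that wildcard.

```python
# Invented response counts per wildcard pattern, for illustration only.
counts = {
    "*": 140,
    "SYSTEM.*": 95,
    "SYSTEM.FTE.*": 0,
    "SYSTEM.BROKER.*": 12,
    "SALES.*": 30,
}


def remaining_user_queues(counts, known_prefix="SALES.*"):
    """Total queues minus SYSTEM.* queues minus the prefix already known."""
    return counts["*"] - counts["SYSTEM.*"] - counts[known_prefix]


print("FTE queues defined:", counts["SYSTEM.FTE.*"] > 0)
print("Broker queues defined:", counts["SYSTEM.BROKER.*"] > 0)
print("Other user-defined queues:", remaining_user_queues(counts))
```

With these made-up numbers, 140 total less 95 SYSTEM.* queues less 30 SALES.* queues leaves 15 user-defined queues belonging to something other than SALES.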
You don’t give up your queue names but knowing the precise numbers of each kind of queue at this level of detail helps convince the Help Desk person that I’m legit.
Performance issues
There’s also the minor issue that sending back all those statuses can impact performance. One of my customers had thousands of queues in a shared hub environment and strict application isolation. When they grant limited access, their developers see only a handful of queues in Explorer.
I’d been assuming they don’t get back errors for all the queues they don’t see. Now I know that under the covers they get the PCF responses for those authorized queues plus 2x as many 2035 responses as there are non-authorized queues, and that this happens every time Explorer refreshes the queues screen.
Multiply that by all the developers, testers and other staff for 80+ active application development projects and the logs are truly useless and roll over quickly. The impact to Tivoli or another monitoring tool that captures those event messages is a firehose of spurious messages which clog up the archive database.
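To put rough numbers on that, here is a back-of-the-envelope sketch. The only thing taken from the testing is the factor of two (one 2035 per unauthorized queue for each of DIS Q(*) and DIS QSTATUS(*)). The queue, user and refresh counts are assumptions picked for illustration, not measurements from any customer, and the function names are hypothetical.

```python
def spurious_2035_per_refresh(total_queues, authorized_queues, commands_per_refresh=2):
    """2035 responses generated by one repaint of the Explorer queues screen."""
    return commands_per_refresh * (total_queues - authorized_queues)


def spurious_2035_per_day(per_refresh, users, refreshes_per_user_per_day):
    """Scale one refresh up to a whole user population."""
    return per_refresh * users * refreshes_per_user_per_day


# Assumed figures: 3000 queues on the hub, 10 visible to each developer,
# 200 Explorer users, 20 refreshes per user per day.
per_refresh = spurious_2035_per_refresh(total_queues=3000, authorized_queues=10)
per_day = spurious_2035_per_day(per_refresh, users=200, refreshes_per_user_per_day=20)

print(per_refresh)  # 5980
print(per_day)      # 23920000
```

Under those assumptions a single refresh produces 5,980 spurious 2035s and the hub produces just under 24 million a day, each of which may also cut an event message and an error log entry.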
That’s why I as the attacker am so confident that you are either actively suppressing those errors or, more likely, are ignoring them entirely. Why do I think you aren’t suppressing them? Easy. There’s no option for “please suppress the spurious security relevant events but show us the legitimate ones” because the attackers overwhelmingly ignore RFC 3514 which requires them to set the Evil Bit specifically for this purpose. (Except for Ben, of course.)
Recommendations
Short version: Take those event messages with a grain of salt as they don’t show the whole story and at least in the cases I’ve identified they show the wrong user ID. If you really need to enforce authorization, it might be worthwhile to look at archiving error logs. I may write some scripts for that here soon due to this problem.
Also, go vote for RFE 86864: Add object name to Not Auth (Type 4) Command Not Authorized. (Thanks, Morag!) That this is a defect and not an enhancement seems obvious but sometimes the RFE is the faster path.
Longer version: I have begun to suspect that IBM lacks a definitive specification for how security in MQ works with regard to AUTHREC, CONNAUTH and CHLAUTH. My testing results matrix shows that the authorization decisions have alternated between the process ID and the one that was authenticated by a password, depending on the version and Fix Pack. Similarly, the results reported here show that at the conclusion of CONNAUTH and CHLAUTH there remain two distinct IDs maintained by MQ and different pieces of code with similar function refer to these in different ways.
Here are a few requirements that MQ doesn’t meet but which one would think would be blindingly obvious:
- The ID that was authenticated MUST be the one that authorizations are evaluated against unless overridden by the MQ Administrator. It should NEVER be the case that the client can override the ID used for authorizations or that the client has more leeway to do so than the MQ administrator, but that’s what we have in all versions that support password-based validation.
- At the conclusion of CONNAUTH and CHLAUTH there should be one and only one ID used for authorization checking. We saw in the last couple of posts that this is not the case and we see in this post that different parts of the MQ code refer to different ID fields from the connection for reporting purposes. Taken together, these weaken accountability and intrusion detection making it impossible for MQ users to confidently move up the security maturity continuum.
- Ideally, the original ID may be retained but only for reporting of context. In the example provided here the process ID was mqm but the MCAUSER was addrmap2. If reporting only one ID it must be the one used for authorization checking, in this case addrmap2. If a second ID is reported it should provide context. In this example mqm could be reported as the Real ID and addrmap2 as the Effective ID (borrowing from the same terms as used by flavors of the UNIX OS). Instead, mqm was reported as the ID that failed the auths check.
- Don’t report spurious security events. If the system supports a function like DIS Q(*) or DIS QS(*) then return only the objects to which the caller is authorized and silently skip the rest. If the caller is not authorized to any objects of that type, then return a single error in the log and cut an event message.
- Make sure the events and error logs reported are accurate and reconcile to the test cases that generate them. There should be no case in which the events and error logs differ in count or content, nor in which an individual event is reported more or less than once. In this case the events tell part of the story and the error logs tell another part and you need both to get all the information.
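The wildcard and reporting requirements above can be sketched in a few lines. This is a toy model of the behavior I am asking for, not a description of how MQ works internally. Every name in it is hypothetical, the grants table stands in for the OAM, Python's fnmatchcase only approximates MQSC wildcard matching, and I have interpreted "not authorized to any objects" as "the pattern matched something but none of it is visible".

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Caller:
    real_id: str       # ID the connection arrived with, kept for context only
    effective_id: str  # the one ID used for every authorization check


def display_queues(pattern, all_queues, dsp_grants, caller):
    """Return (visible queue names, events) for a wildcard display request."""
    granted = dsp_grants.get(caller.effective_id, set())
    matching = [q for q in all_queues if fnmatchcase(q, pattern)]
    # Silently skip queues the caller cannot display.
    visible = [q for q in matching if q in granted]
    events = []
    if matching and not visible:
        # Exactly one event for the whole request, naming both IDs.
        events.append({
            "reason": 2035,
            "command": "Inquire Queue",
            "pattern": pattern,
            "effective_id": caller.effective_id,
            "real_id": caller.real_id,
        })
    return visible, events


queues = [
    "SALES.ORDER.REQ.12345.QA",
    "SALES.ORDER.REQ.12346.QA",
    "SYSTEM.DEFAULT.LOCAL.QUEUE",
]
grants = {"addrmap2": {"SALES.ORDER.REQ.12345.QA"}}
caller = Caller(real_id="mqm", effective_id="addrmap2")

print(display_queues("SALES.*", queues, grants, caller))
print(display_queues("SYSTEM.*", queues, grants, caller))
```

In the first call the caller sees the one queue it is entitled to and nothing is logged. In the second the caller sees nothing and a single event is cut, reporting addrmap2 as the ID that failed the check and mqm as context.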
When I lobbied my IBM management to work on the MQ Security team it was specifically to address issues like this in the role of an architect or QA or Dev team lead. Unfortunately, due to geographic and other restrictions the closest I was able to get to that position was product manager and that role had no direct influence over specifications at this level.
If I had that opportunity today the first thing I’d do is lay out a clear and unambiguous specification for interaction of CONNAUTH and CHLAUTH that mapped every single element in the test matrix I’ve posted and then expand that to include real/effective IDs and exit interaction. With the losses from the MQ team over the years of Ian Vanstone, Dale Lane, Morag, Dermot, me, and others who focused deeply on security, that area seems to be suffering. Let’s hope IBM gets some more security specialist talent deeply embedded on that team. I’d do it as a consultant to tide them over, or as a permanent hire if I actually got to focus on MQ security this time, but however it happens this situation needs to get better soon.
Look for more posts shortly. I have a backlog generated from the recent round of research and testing that I need to clear.
Post updated 20161101 23:17 EDT to include results of network packet capture sniffing that revealed the details of the DIS Q(*) and DIS QSTATUS(*) as well as the resource enumeration issues.
Post updated 20161109 13:10 EDT to include reference to Morag’s RFE and fix some typos and unclear wording.
I raised an RFE a little while back about the lack of object name in Not Auth (Type 4) – i.e. command not authorized – events. You may want to add the link to it in the relevant point in this post? https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=86864
Done. Thanks! Now we need an RFE for the flip side of this – that the error log doesn’t show the command. What do you think – update to the existing RFE or a new one altogether? If you raise it I’ll link it here.