Everyone knows that data breaches are expensive. Security vendors never tire of telling us exactly how expensive breaches are, on a per-incident or per-record basis. In the case of a large retail brand, a breach can make a significant dent but it usually isn’t fatal. In fact, for one large conglomerate in particular, recent events suggest that even a series of high-profile breaches won’t cripple a company. It might be tempting to assume this experience applies across the board. It might be tempting to assume that your business would fare as well. Don’t make that assumption.
Breaches are so routine, nobody cares
The proportion of the cost of a breach attributable to diminished reputation varies greatly, depending on many factors. Consider, for example, a company running a transactional clearinghouse operation. This is a common business model in Finance and Banking, Insurance, Health Services, Supply Chain, and many others. It is rare to hold a monopoly position in these businesses and the clearinghouse is often one of a few options available to its clients. In many cases, clients contract to multiple clearinghouses to eliminate single points of failure and to arbitrage transaction costs. It is common for a clearinghouse to have a relatively small roster of members, numbering in the thousands or hundreds.
When there are so few clients and switching costs are low (or near zero when the client already contracts to a parallel clearinghouse), reputation is much more important. A single breach would seriously shake up such a company. It would be difficult to survive one or two repeat incidents.
There are many business dimensions that elevate the importance of reputation. The ones that tend to come to mind are those where revenue is directly tied to independent ratings. But often it is the costs that are tied to reputation – audits may become tremendously expensive after a breach. Some businesses differentiate themselves on their advanced security, and suffer if proven wrong.
So when you see the constant stream of breach reports in the news, and notice that the companies involved are seemingly able to just shake it off, resist the temptation to think it will work that way for your company. Consider it in the light of your specific business model, your market, and the expectations of your customers. What are their options? What are their switching costs?
OK, I’m concerned. How does this relate to MQ?
I’m glad you asked. Consider that all of the products in the MQ family are designed to move data in bulk and as quickly as possible. The last thing you want is for a successful attacker to subvert these tools for their own purposes, either by manipulating messages or by using MQ products to exfiltrate data. Fortunately, there are plenty of things the MQ Administrator can do to prevent an attacker from gaining control of MQ or from using it to get data out if they do.
The simplest security control is to make sure that you are using a modern version of MQ. There are two reasons for this. The first is that out-of-support versions do not receive security fixes. Security-relevant APARs are easy to spot because they generally do not have descriptions other than to say “security exposure.” Look over the list of APARs for different versions of MQ and if you see the same security-relevant fix in multiple versions, there’s a good chance it applies to out-of-support versions as well. I can save you the exercise, though. All out-of-support versions have at least one of these.
The better reason to upgrade is that recent versions of MQ have all contained major security enhancements. It’s a long list and includes things like better authentication, improved crypto strength, enhanced diagnostics, faster crypto performance, greater platform and product coverage, and much more. Some of these make the security more effective, some reduce the administrative overhead of configuring and running securely. All of them help preserve your company’s reputation in case of a breach.
The first task for any MQ security project is to lock down administrative access to the queue manager. It is quite common to find a queue manager with lots of authorizations set up but lacking the authentication that prevents spoofing administrative IDs. Even more common is that the queue manager allows administration from adjacent queue managers.
These are relatively easy things to exploit but they are also easy to address. At my last engagement we obtained some certificates and then booked an hour meeting to enable TLS between two queue managers, and another hour the following day to enable TLS between MQ Explorer and a queue manager. I purposely walked the team into some erroneous configurations to give them practical experience in diagnosing TLS errors and we still completed both exercises in the allotted time.
Sadly, none of this is in the official curriculum because the MQ Security class was mothballed a few years back. What remains of it is a small module in the Advanced System Admin class. I know because I helped whittle it down to fit, and I wasn’t happy about the number of things that got cut. Certificate management and MQ TLS can be overwhelming if you don’t know where to start. But an experienced mentor can get you past the steepest part of the learning curve, shaving months off your on-the-job training.
Say the words “security control” and what usually comes to mind are specific things within the architecture. But the architecture design can itself contribute to security. For example, when I was at the bank we had several MQ interfaces to external business partners, all spread out across different queue managers. Brian Shelden, author of the MS62 SupportPac and my mentor at the time, suggested converting these external MQ interfaces to use a single gateway. I balked because it would mean lots of testing and reconfiguration of firewall rules, both of which involved coordination with outside teams.
It soon became clear that managing security of the external connections in so many different places was going to take more of my admin team’s time than the conversion project I was so worried about. Worse than that, the security I achieved would never provide the isolation and control that the gateway topology does. To top it off, the non-gateway architecture required outbound routes from several Production servers. Normally a Production server denies all outbound connections and then makes exceptions for things like revocation servers and an enumerated list of business partner servers. If you have to manage these specifications with the firewall and network teams, having to do it on only one node makes life easier for all concerned and in the end it’s a lot more secure.
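To put rough numbers on that difference, here is a minimal Python sketch of a default-deny outbound policy. Everything in it is hypothetical: the host names, ports, node names, and the Destination, outbound_allowed, and total_rules names exist only to illustrate the idea. Real rules live in your firewall, not in a script.

```python
from typing import Dict, NamedTuple, Set


class Destination(NamedTuple):
    host: str
    port: int


def outbound_allowed(dest: Destination, allowlist: Set[Destination]) -> bool:
    """Default deny: a connection is permitted only if explicitly listed."""
    return dest in allowlist


def total_rules(policy: Dict[str, Set[Destination]]) -> int:
    """Count the outbound exceptions that must be maintained across all nodes."""
    return sum(len(rules) for rules in policy.values())


# Hypothetical business partners and revocation server.
PARTNERS = {
    Destination("partner-a.example.com", 1414),
    Destination("partner-b.example.com", 1414),
}
REVOCATION = {Destination("ocsp.example.com", 80)}

# Without a gateway: every Production node that talks to a partner
# carries its own set of outbound exceptions.
spread_out = {
    "prod1": {Destination("partner-a.example.com", 1414)} | REVOCATION,
    "prod2": {Destination("partner-b.example.com", 1414)} | REVOCATION,
    "prod3": PARTNERS | REVOCATION,
}

# With a gateway: a single node carries all of them.
gateway = {"gateway": PARTNERS | REVOCATION}

print(total_rules(spread_out), "rules on", len(spread_out), "nodes")
print(total_rules(gateway), "rules on", len(gateway), "node")
```

Even in this toy model the gateway has fewer rules to review, and all of them sit in one place that the firewall and network teams can audit together.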
Fortunately for my career (and for Brian’s ego, I’m guessing) Brian got to say “I told you so” and we put the gateway in. That was in the late ’90s and since then I’ve learned a few more things about designing security into the MQ architecture. Many of these are described in the Secure Messaging Scenarios with WebSphere MQ book, which you can download for free here. But nothing beats talking in person. Look for my sessions at the MQ Technical Conference this September or consider an MQ Architecture Assessment or Health Check if you can’t wait that long.
Security engagements tend to focus on keeping the bad guys out of the network. But with Verizon reporting that Spear Phishing campaigns of as few as 10 emails have a 90% chance of succeeding, you have to assume someone will get through eventually. How would you know if that happened? There is a rich set of instrumentation in MQ that emits event messages, responds to direct inquiry, calls out to external services, and even provides access to intercept MQ’s API calls. All of these can be leveraged to detect intrusion events.
As mentioned above, the more current you are with MQ, the better the security available to you. One of the trends for several years has been to move settings from the qm.ini file into the queue manager as run-time attributes. This means that scripts and instrumentation can inquire on the settings of those attributes and report if they change. In fact, MQ has an entire event category dedicated to reporting when it detects a configuration change.
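As a rough illustration of what such a script might do, here is a minimal Python sketch that compares two snapshots of queue manager attributes and reports any differences. It assumes the snapshots have already been collected somehow (for example, by parsing the output of a DISPLAY QMGR command or a PCF inquiry) and loaded into dictionaries. The diff_attributes function and the sample values are hypothetical and are not part of MQ.

```python
from typing import Dict, List, Tuple


def diff_attributes(
    baseline: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (attribute, old, new) for every attribute whose value differs."""
    changes = []
    for name in sorted(set(baseline) | set(current)):
        old = baseline.get(name, "<absent>")
        new = current.get(name, "<absent>")
        if old != new:
            changes.append((name, old, new))
    return changes


# Hypothetical snapshots; in practice these would come from the queue manager.
baseline = {
    "CHLAUTH": "ENABLED",
    "CONNAUTH": "SYSTEM.DEFAULT.AUTHINFO.IDPWOS",
    "SSLFIPS": "NO",
}
current = {
    "CHLAUTH": "DISABLED",
    "CONNAUTH": "SYSTEM.DEFAULT.AUTHINFO.IDPWOS",
    "SSLFIPS": "NO",
}

for name, old, new in diff_attributes(baseline, current):
    print(f"ALERT: {name} changed from {old} to {new}")
```

A polling script like this complements configuration events rather than replacing them: the events tell you about a change as it happens, while a periodic diff against a known-good baseline catches anything that slipped by.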
But intrusion detection doesn’t stop at the queue manager. MQ produces log files and First Failure Data Capture files (FFDCs, though non-IBMers call these “dump files”) that can be monitored. Monitoring environmental indicators such as free disk and memory is helpful as well.
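One of the simplest monitors is a check for FFDC files that were not there the last time you looked. The Python sketch below assumes FFDC files carry an .FDC extension and sit together in a single errors directory. The new_fdc_files function, the file names, and the throwaway demo directory are hypothetical; point the function at your own installation’s errors directory in real use.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def new_fdc_files(errors_dir: Path, already_seen: Set[str]) -> List[str]:
    """Return names of *.FDC files in errors_dir that have not been reported yet."""
    names = sorted(p.name for p in errors_dir.glob("*.FDC"))
    return [name for name in names if name not in already_seen]


# Demo against a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    errors_dir = Path(tmp)
    seen: Set[str] = set()

    # First poll: one FFDC file exists (file name is made up).
    (errors_dir / "AMQ12345.0.FDC").write_text("placeholder")
    first = new_fdc_files(errors_dir, seen)
    seen.update(first)

    # Second poll: only the newly written file should be reported.
    (errors_dir / "AMQ67890.0.FDC").write_text("placeholder")
    second = new_fdc_files(errors_dir, seen)

    print("first poll:", first)
    print("second poll:", second)
```

Run on a schedule and wired to whatever alerting you already have, even this much tells you when the queue manager has hit something unexpected.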
Though MQ has had some intrusion detection capability all along, most shops haven’t done too much with it. Even a few simple monitors can go a long way. I didn’t even mention all the booby-traps I like to put on an external gateway queue manager to detect intruders. Intrusion detection is a deep topic. Whether you haven’t addressed this, or you have but it’s been a while, there are probably many opportunities we could identify and things we could implement. Several companies have made the news lately for breaches that went undetected for as long as two years. I’d like to help you avoid that.
If the worst happens and there is a breach of MQ, hopefully you are a client of mine and you made it extremely difficult for the intruder to get in, then you detected the intrusion quickly and shut it down, and now you need to get back up and running. If you do take an outage due to a breach, keeping downtime as brief as possible will minimize damage to the company’s reputation – and yours if you happen to be the MQ administrator.
There are many options available to you both from an architecture and a configuration standpoint. Depending on the requirement there is basic client failover, MQ clusters, hardware clusters, virtualization, shared queues, Active/Passive, Active/Active, disk replication, linear logging, and a few others I could dig up if your requirement is exotic enough.
The problem with recovery in most shops is that when the Recovery Point Objective and Recovery Time Objective are compared to the system as-implemented, the two don’t match. Sometimes the mismatch is so severe that it is doubtful recovery would even work in a live situation, but since the plan is only tested under no load, that doesn’t show up. I cringe whenever I find a Disaster Recovery plan that is doomed to, well, disaster, but you’d be surprised how often that happens. Better to find out as a result of an assessment (and a live test under heavy load if you like) than during an actual Production outage.
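The comparison itself is simple arithmetic, which is part of why it is so frustrating to see it skipped. Here is a minimal Python sketch that checks the results of a recovery drill against the stated objectives. The Objectives and DrillResult classes, the gaps function, and all of the numbers are hypothetical and stand in for whatever your own plan and test results record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Objectives:
    rpo_seconds: float  # maximum tolerable data loss, expressed as time
    rto_seconds: float  # maximum tolerable downtime


@dataclass
class DrillResult:
    data_loss_seconds: float  # data loss observed in the drill
    downtime_seconds: float   # time taken to restore service in the drill
    under_load: bool          # was the drill run with realistic traffic?


def gaps(objectives: Objectives, result: DrillResult) -> List[str]:
    """List every way the drill result falls short of the objectives."""
    findings = []
    if result.data_loss_seconds > objectives.rpo_seconds:
        over = result.data_loss_seconds - objectives.rpo_seconds
        findings.append(f"RPO missed by {over:.0f}s")
    if result.downtime_seconds > objectives.rto_seconds:
        over = result.downtime_seconds - objectives.rto_seconds
        findings.append(f"RTO missed by {over:.0f}s")
    if not result.under_load:
        findings.append("drill ran without load; results may not hold in Production")
    return findings


# Hypothetical plan: lose at most 1 minute of data, recover within 15 minutes.
plan = Objectives(rpo_seconds=60, rto_seconds=900)
drill = DrillResult(data_loss_seconds=300, downtime_seconds=600, under_load=False)

for finding in gaps(plan, drill):
    print(finding)
```

Note that the sketch flags a no-load drill even when the numbers look fine, since a result obtained without traffic says little about how recovery behaves in a live situation.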
Reputation is an asset
If your server dies you can fix it or buy another one. If the company’s reputation takes a hit, that’s a bit harder to recover from. We tend to think of MQ security as a very functional requirement: keep the bad guys out, keep the information private. But it is useful to remember the business requirement and what it is we are actually protecting. Yes, at some level we are protecting the server, the queue manager, the data, the customer. But ultimately what we are protecting is the reputation of the company, and for many of us that is the single largest revenue-earning asset the company has.
My job is to help you protect that asset. What I bring to the table is the perspective of an outsider. My recommendations apply technology to address the business requirements and “because we’ve always done it that way” doesn’t persuade me.
I also bring experience from many different shops and I’ve seen more ways to break MQ than you’ll probably ever discover. These are potholes and dead-ends you don’t want to find yourself in and what I bring to the table is a map of where they are.
If your company is like every other company out there, the MQ Admin team runs pretty lean. Assuming your people are highly trained you could do your own assessment and implementation, but that means diverting resources from something. Sometimes what you really need is staff augmentation to accommodate the extra work without having to tell your business customer “no” or postpone their project. You can always find a temp who can spell MQ but what I bring to the table is the ability to do a lot of heavy lifting on the project. If you need skills transfer, I bring that too.
I have a number of standard offerings to help with all aspects of designing, securing, operating, and tuning MQ. I also do a lot of troubleshooting. In fact, I thrive on high-visibility, high-risk assignments, but there’s nothing “standard” about that. Normally I work as an independent contractor but I also have partnerships with larger firms that can act as the prime contractor and place me as their sub.
All of this, every aspect of my practice, is designed to protect your company’s reputation. It’s the only one you have and I treat it as if it were my own. Let’s talk soon.