A growing number of my clients are deploying IBM MQ on Amazon EC2 instances, and a common need I see emerging is for instrumentation and tooling. When MQ instances are ephemeral, deployed on demand and decommissioned just as suddenly, lots of things the MQ admin used to do by hand need to be automated. This includes build-time tasks such as defining objects, run-time tasks like enabling or disabling queues in the cluster, and forensic capabilities such as archiving error logs.
It is this last item that concerned a recent customer. Their main requirement was to ingest MQ error logs in real time, or at least close to it, so that the logs would survive the death of the virtual host on which they were generated. Getting Splunk to ingest the logs was ridiculously easy. Just define the log files as a Splunk data input and they immediately become available through the Splunk search interface.
That’s all well and good if all you want to do is browse through the logs or search them for particular error codes. Getting the benefit of Splunk analytics, though, requires the error logs to be parsed into fields. Then instead of merely searching for error codes you already know about, you can ask Splunk for a report of all the error codes sorted by prevalence, charted by frequency over time, or filtered down to the rare outliers. All the analytic capabilities become usable once the fields are parsed. Better yet, parse logs from many queue managers and you can spot trends or pick out nodes that are showing early signs of distress. That’s really useful stuff and Splunk provides it right out of the box, but only for log types it knows how to parse. So let’s teach it to parse IBM MQ error logs, shall we?
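To make "parsed into fields" concrete before getting into Splunk itself, here is a minimal Python sketch of the idea. It is not the Splunk configuration, just an illustration. It assumes the classic multi-line MQ error log layout (a timestamp line, a run of `Key(Value)` attributes, then an `AMQnnnn:` message, with entries separated by a line of dashes). The sample entries, host names, and the function names `parse_entry` and `parse_log` are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical sample entries, modeled on the classic MQ error log layout.
SAMPLE_LOG = """\
01/15/2018 09:12:44 AM - Process(2301.1) User(mqm) Program(amqrmppa)
                    Host(ip-10-0-0-12) Installation(Installation1)
                    VRMF(9.0.4.0) QMgr(QM1)
AMQ9209: Connection to host 'client1' for channel 'APP.SVRCONN' closed.
----- amqccita.c : 4110 -------------------------------------------------------
01/15/2018 09:13:02 AM - Process(2301.2) User(mqm) Program(amqrmppa)
                    Host(ip-10-0-0-12) Installation(Installation1)
                    VRMF(9.0.4.0) QMgr(QM1)
AMQ9999: Channel 'APP.SVRCONN' to host 'client1' ended abnormally.
----- amqrmrsa.c : 930 --------------------------------------------------------
01/15/2018 09:15:10 AM - Process(2301.3) User(mqm) Program(amqrmppa)
                    Host(ip-10-0-0-12) Installation(Installation1)
                    VRMF(9.0.4.0) QMgr(QM1)
AMQ9209: Connection to host 'client2' for channel 'APP.SVRCONN' closed.
----- amqccita.c : 4110 -------------------------------------------------------
"""

TIMESTAMP_RE = re.compile(r"\s*(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}(?:\s+[AP]M)?)")
ATTR_RE = re.compile(r"(\w+)\(([^)]*)\)")  # Key(Value) pairs in the header
CODE_RE = re.compile(r"^(AMQ\d{4,5}[A-Z]?):[ \t]*(.*)$", re.MULTILINE)
SEPARATOR_RE = re.compile(r"^-----.*$", re.MULTILINE)


def parse_entry(text):
    """Turn one raw log entry into a dict of fields, or None if no AMQ code is found."""
    code_match = CODE_RE.search(text)
    if code_match is None:
        return None
    # Only scan the header (everything before the AMQ line) for attributes,
    # so parentheses inside the message text are not mistaken for fields.
    header = text[:code_match.start()]
    fields = dict(ATTR_RE.findall(header))
    ts_match = TIMESTAMP_RE.match(header)
    if ts_match:
        fields["timestamp"] = ts_match.group(1)
    fields["code"] = code_match.group(1)
    fields["message"] = code_match.group(2).strip()
    return fields


def parse_log(text):
    """Split a log on its dashed separator lines and parse each entry."""
    chunks = SEPARATOR_RE.split(text)
    return [entry for entry in map(parse_entry, chunks) if entry is not None]


if __name__ == "__main__":
    entries = parse_log(SAMPLE_LOG)
    # Once entries are fields, "error codes by prevalence" is a one-liner.
    for code, count in Counter(e["code"] for e in entries).most_common():
        print(code, count)
```

Once each entry is a dictionary of fields, the question shifts from "find this code" to "which codes are most common, and on which queue manager", and that shift is exactly what field extraction buys you in Splunk.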