Driving MQ admin cost, defects to near-zero

The theme of my sessions at this year’s MQTC (and hopefully also at IBM Think, if they are accepted) is cloud and virtualization, if you go by the abstracts. If you come to the session, you’ll find it’s really about designing architecture around configuration management and tooling with the specific intent of driving administrative overhead and defects down to near zero. So it was a bit distressing yesterday when, during the demo, a string of errors cascaded across the screen. Unless you are into schadenfreude, in which case watching my live demo auger into the ground might have been fun for you. But in the end, the event proves my point more than it invalidates it. Here’s why.

I make a point in the demo of telling people that this methodology does not by itself get you to zero defects. The point is to keep driving down toward zero over time. If you automate a defect, you get lots of defects. They are repeatable, consistent, reliable and faithful to the spec, but they are still defects. Or, as in my demo, if you fail to anticipate basic error conditions, sooner or later they sneak up to bite you. It’s impossible to anticipate everything, so getting an error does not condemn the approach, so long as the error is addressed and the fix is captured in the tool.

The problem I ran into was that I had gutted the production scripts I used at my last few client engagements, both to remove the confidential parts and to fit the demo into an hour. Among the changes, I removed the complex logic that resolved the queue manager’s home directory regardless of where it was mounted, and instead hard-coded /var/mqm/ followed by the QMgr name. That worked great until crtmqm suddenly started generating queue managers whose home directory was QMGRNAME.0001. Naturally, every part of the build script that relied on that path began to fail. Live. During the demo. D’oh!

This exact type of failure will happen repeatedly over time, but it’s normal and part of the process. Sometimes it will be because you overlooked something, as in my case. Other times it will be because product features or behavior change and the tools need to accommodate that. And other times it is for no other reason than that people find the most creative ways to break software tools.

But one way or another, the tool will eventually break or defects will be introduced. Under this methodology, the response is to keep iterating fixes into the tool whenever that happens, constantly pushing the defect rate lower. In my case, the script now runs dspmqfls to have MQ resolve file locations, so it should never fail for that reason again.
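The point of the fix is to let MQ’s own records say where a queue manager lives instead of having the script guess. As an illustration of that idea using a different, swapped-in technique (this is not the dspmqfls-based script from the demo), here is a minimal Python sketch that resolves a queue manager’s data directory from the QueueManager stanzas of mqs.ini rather than hard-coding /var/mqm/ plus the QMgr name. It assumes the typical UNIX mqs.ini layout, where each QueueManager stanza carries Name, Prefix and Directory attributes and newer releases may also record a full DataPath. The helper names parse_stanzas and resolve_qmgr_dir are hypothetical, and the caller is assumed to supply the file contents (usually /var/mqm/mqs.ini on UNIX).

```python
import os
from typing import Dict, List, Tuple


def parse_stanzas(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split mqs.ini-style text into (stanza name, attributes) pairs."""
    stanzas: List[Tuple[str, Dict[str, str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comment lines
        if line.endswith(":") and "=" not in line:
            stanzas.append((line[:-1].strip(), {}))  # start a new stanza, e.g. "QueueManager:"
        elif "=" in line and stanzas:
            key, value = line.split("=", 1)
            stanzas[-1][1][key.strip()] = value.strip()  # attribute of the current stanza
    return stanzas


def resolve_qmgr_dir(mqs_ini_text: str, qmgr_name: str) -> str:
    """Return the queue manager's data directory as recorded in mqs.ini (hypothetical helper)."""
    for name, attrs in parse_stanzas(mqs_ini_text):
        if name == "QueueManager" and attrs.get("Name") == qmgr_name:
            # Assumption: when DataPath is present, it is the full data directory.
            if "DataPath" in attrs:
                return attrs["DataPath"]
            # Otherwise combine Prefix with the recorded (possibly suffixed) Directory.
            return os.path.join(attrs["Prefix"], "qmgrs", attrs["Directory"])
    raise KeyError(f"queue manager {qmgr_name!r} not found in mqs.ini")


if __name__ == "__main__":
    sample = """
QueueManager:
   Name=QMGRNAME
   Prefix=/var/mqm
   Directory=QMGRNAME.0001
"""
    # On a POSIX system this prints /var/mqm/qmgrs/QMGRNAME.0001
    print(resolve_qmgr_dir(sample, "QMGRNAME"))
```

Because the directory name is read from what MQ recorded rather than derived from the QMgr name, a suffixed directory such as QMGRNAME.0001 resolves correctly instead of breaking every later step of the build.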

Naturally, that version of the script is posted to S3, so if you are trying out the tooling, the s3sync.ksh script has probably already pulled the new bootstrap script version down to your local storage without you even realizing it. If for some reason MQ starts creating versioned QMgr directories, you won’t get the failures I had; you probably won’t even notice. And THAT is what the methodology and tools are all about.

If you are looking for the slides, go to the Links page.

This entry was posted in Events, MQTC, WMQ.