Windows MQ Log Maintenance that really works

Do you run MQ with linear logs on Windows?  Do you want a log maintenance utility that really works on that platform with modern versions of MQ?  So do I.

One of my biggest pet peeves about IBM MQ is that even after 20 years there remains no IBM-supported tool for managing linear log extents or any provision to automatically delete them on Distributed platforms.  Recently while on assignment for a customer running MQ v8.0 on Windows we needed just such a tool and went to the SupportPacs landing page to review options.  We looked at MO73, MS0L and MS62.  None of them met my requirements and in some cases they don’t work at all.

Since I’m not at Interconnect this year I had some time on my hands so I wrote a script of my own.  I’m making it available here so it’ll be just like I was there with you in Vegas.  No, really, it will.  This post takes 45 minutes to read and there’s 15 minutes of Q&A at the end, followed by refreshments in the hall.

Log maintenance requirements
Obviously, any linear log maintenance utility has to delete obsolete log files.  In addition to this base requirement, there are a few other things I deemed essential.  First, the script needs to run rcdmqimg to advance the log file tail pointers before cleaning the logs.  None of the SupportPacs do this.  I figure it’s easier to comment that line out of my script if you don’t want to run it than to force you to schedule it separately, so it’s in there.
Also, I wanted a script that didn’t require installation of Perl or have dependencies on VBScript or Java.  To be widely applicable across supported versions of both MQ and Windows this log maintenance script is written entirely in Windows batch.

The script must also run without server Administrator privileges or the need to read the registry.  Of the three SupportPacs, one requires the directory path to be passed in on the command line and the other two read the registry.  The only problem with this is that MQ doesn’t use the registry anymore.  Whether the technique can possibly work depends on the version of MQ being used and whether it was migrated from an earlier version.

However, in MQ v7.5 and v8.0 any registry entries are migrated to ini file stanzas so any utility querying the registry to get log path names won’t work with these versions of MQ.  Unfortunately the VBScript utility hasn’t been updated since 2006, the Java utility since 2011 and the Perl script since March 2012.  All of these pre-date the release of MQ v7.5 in July of 2012.  Did everyone write their own script or is nobody running linear logged MQ on Windows since then?

Another issue with reading the registry is that if Windows User Account Control is enabled, the account under which log file cleanup is run must have both Windows Admin rights to read the registry and mqm group membership to use runmqsc.  From a security perspective no service account should have both of these types of admin privileges.  When it does, a compromise of MQ gives the attacker root privileges on the host.  I think we should make the attacker work at least a little to get that, don’t you?

The final requirement is the script must properly handle log file wrap-around.  If log files with names S9999999.LOG and S0000000.LOG exist when the script is run, the S9999999.LOG must be correctly calculated to be the older of the two if we want the log files to be able to wrap without manual intervention or crashing the queue manager.
This version of log file maintenance for Windows meets all of these requirements.

Known issues
There are no known issues at this time; however there are some limitations on my ability to test.  Sorry, but I don’t have a copy here at IoPT Labs of every version of Windows on which MQ is supported. The script has been tested on MQ v8.0 and Windows 7 64-bit, and with MQ v7.0.1 on Windows XP.  I tried to stick with commands that are available on all Windows versions.

Another possible issue is that the log type and path are scraped from the output of amqmdain using positional substring matching.  This might break on different versions of MQ depending on whether the output of amqmdain has changed.  But it’s the same on v7.0.1 as on v8.0 and both v7.0 and v7.0.1 are end-of-life in September this year so I’m hoping this isn’t an issue.

I would greatly appreciate reports of Windows versions and MQ versions where the script has been tested successfully or a bug report if you find a combination of Windows and MQ on which it breaks.

Running the script
The script doesn’t try to capture its own output.  When you schedule it you might want to write a small wrapper script that generates a timestamped log file name and then calls this script with redirection of STDOUT and STDERR to the timestamped log file.  The quick-and-dirty way to do this without a wrapper script is just to append the output to a single log file and schedule it like so:

WinMQLog.bat QMNAME >> "F:\Path to\Log Dir\WinMQLog.txt" 2>&1
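
For the wrapper approach mentioned above, here is a minimal sketch of the idea.  It is written in Python purely for illustration (the maintenance script itself is batch and needs no such dependency), and the file name pattern, the function names and the use of cmd /c to launch the script are all assumptions rather than anything shipped with the script:

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_log_path(log_dir: str, qmgr: str, now: datetime) -> Path:
    """Return a timestamped log file path (hypothetical naming pattern)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(log_dir) / f"WinMQLog_{qmgr}_{stamp}.txt"


def run_maintenance(script: str, qmgr: str, log_dir: str) -> int:
    """Run the batch script, sending STDOUT and STDERR to one timestamped file."""
    log_path = build_log_path(log_dir, qmgr, datetime.now())
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            ["cmd", "/c", script, qmgr],  # assumption: launched through cmd.exe
            stdout=log_file,
            stderr=subprocess.STDOUT,     # same effect as 2>&1
        )
    # The caller decides what to do with the script's exit code (alerting etc.).
    return result.returncode
```

Keeping the alerting and file-location decisions in a wrapper like this leaves the maintenance script itself free of shop-specific details.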

Many shops have a scheduler, but if yours does not, Windows provides the schtasks command.

Also, the script doesn’t attempt to raise any alerts on failure.  If it finds a problem it exits with a return code of 2; if it completes correctly it exits with a 1.  You can change these exit values in the script to meet your needs.

I normally leave these features out because there are so many shop-specific requirements as to file locations, credentials, how to deliver alerts, etc.  It’s just easier to isolate those in the wrapper script.

If you are ready to try the script, you can download it here.  If you want to go through it section-by-section, read on.

Overview
The script is organized into the following sections:

  • Initialization
  • Validation
  • Run rcdmqimg
  • Fetch the log file numbers from the queue manager
  • Process the log files
  • Subroutines

Let’s look at each of these in turn.

Initialization
The first two SETLOCAL statements turn on features of the Windows command processor that should be on by default, such as the ability to expand variables within nested loops.  Even a slightly complex batch script will need these settings and I always seem to have to enable these.  The third SETLOCAL makes sure variables changed in the script do not update those in the parent environment.  This is good security hygiene as well as a best practice for Windows batch scripting.

Validation
The script validates a great many dependencies.  If any are not met, an explanatory message is displayed for the user.

The first of these is to make sure that the script is running as an MQ administrator, that is, a member of the mqm group.  It does this by running dspmqtrn which is the command to display pending transactions.  If the user is not an administrator the command gives an error stating as much.  If the user is an admin, the command returns a different error message since it wasn’t passed a queue manager name.  The reason for not providing a queue manager name is that the command will never get deep enough into the MQ code to impact any running processes or applications.  The idea is to get through the validation without calling anything that will potentially make a change to MQ or the environment.  If the validation fails, there should not be anything we need to reset.

Next the script checks the number of parameters passed.  A queue manager name is required as the first and only parameter.

Assuming a parameter was passed, it is next used as input to the dspmq command.  If the queue manager exists and is running then the validation succeeds.  If the validation fails the script doesn’t care why, only that a fatal error has been encountered.  The user is left to figure out whether the queue manager name was misspelled or it needs to be started.  Generally spelling errors are caught during initial setup so if the script fails it should be obvious why and easy to fix.  This kind of error isn’t worth adding complexity to the code in most cases.

Note that the check is made for the string “(Running)” within parentheses.  This ensures we don’t get a false positive for queue managers that are running as standby.  On the enhancement list for this script is making it detect standby queue managers, log instances when the script finds its queue manager is not running locally, and then exit without an error.
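
To see why the parentheses matter, here is a small Python sketch of the same substring test (an illustration of the logic, not the batch code from the script).  The function name is hypothetical, and the sample status lines in the comment are written from memory of typical dspmq output:

```python
def is_running_locally(dspmq_output: str) -> bool:
    """True only for STATUS(Running); 'Running as standby' does not match."""
    # "(Running)" includes the closing parenthesis, so the longer status
    # "(Running as standby)" cannot satisfy the test.
    return "(Running)" in dspmq_output


# Typical lines this would be applied to (assumed format):
#   QMNAME(QM1)   STATUS(Running)
#   QMNAME(QM1)   STATUS(Running as standby)
```

A stricter variant could match "STATUS(Running)" instead, which would also rule out a queue manager that happens to be named Running.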

The next two validations use the amqmdain command to check the logging style and get the log path.  This is particularly important for Windows code since the setting might be in the registry or in the qm.ini file, depending on the MQ version and in some cases whether the queue manager had been migrated from an earlier version of MQ.  It is bad enough parsing all the possible places where the ini file might be, and in which ini file the value might be, but having to also check the registry would make the program that much longer and more complex.  Given that IBM provides a tool which does all of this for you, it is amazing how many utilities and scripts fail to use it.

Incidentally, it would be worth a whole separate blog post to discuss the notion that amqmdain, or something like it, should be available for all versions of MQ on all platforms.  Why force 10,000 different customers to implement logic to query basic MQ configuration details that change from version to version?  The product already provides a utility that consistently and accurately finds the information across all possible configuration files and installations, but that utility is only available on Windows.  As good as MQ is, with all its optimizations, the one area in which it is consistently deficient is the ability to capture its own configuration details in a restorable format.  Perhaps that’s the curse of making software as reliable as MQ – something so basic as configuration recovery tools never become a high priority and can safely be ignored.  As long as “safely” is defined in terms of the market demand and not the impact to customers who experience an outage.

Run rcdmqimg
It is true that rcdmqimg can cause the queue manager to pause, especially in the case that there are queues with large numbers of persistent messages on them.  But it is also true that if the rcdmqimg command doesn’t get run then the log tail pointers do not advance.  There is almost no benefit in running log file maintenance without also running rcdmqimg so it seems a bit weird to me that any log file maintenance utility would not run this command.  However, that is the case with all of the linear log maintenance SupportPacs.  This script assumes you want rcdmqimg to run every time so there is no switch to disable it.  You can always keep a version of the script with the rcdmqimg line commented out.  It would also be possible to add a parameter to control it, which I’ll happily do if there is sufficient demand.

Fetch the log file numbers from the queue manager
The next two code blocks query the queue manager for the media log and recovery log extents.  These values are attributes of the queue manager and available to query using mqsc on versions of MQ that are currently supported.  This method is a lot easier than parsing event messages or parsing the error log files.  The binary PCF messages are hard to parse with batch file commands and the error logs can possibly roll over before the log extent messages are retrieved.  Unlike either of these approaches, the accuracy of fetching the log extent numbers using mqsc doesn’t depend on code complexity and there’s no race condition.
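
As a rough illustration of how simple the mqsc approach is to parse, here is a Python sketch (not the script’s batch code).  It assumes the extents come back as MEDIALOG(...) and RECLOG(...) attributes, as they do in the output of DISPLAY QMSTATUS on the versions I’m aware of; the function name and the sample text in the comment are hypothetical:

```python
import re
from typing import Dict


def parse_log_extents(mqsc_output: str) -> Dict[str, str]:
    """Pull the media and restart recovery extent names out of mqsc output."""
    found = {}
    for attr in ("MEDIALOG", "RECLOG"):
        # Extent names look like S0000012.LOG: 'S', seven digits, '.LOG'.
        match = re.search(attr + r"\((S\d{7}\.LOG)\)", mqsc_output)
        if match:
            found[attr] = match.group(1)
    return found


# Hypothetical input fragment:
#   MEDIALOG(S0000012.LOG)   RECLOG(S0000015.LOG)
```

If either attribute comes back empty, as it would on a queue manager that is not using linear logging, the key is simply absent from the result and the caller can treat that as a fatal error.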

Process the log files
If not for the possibility that log files can wrap, this would be a very short code block.  However, accounting for file wrapping requires a bit of custom comparison code.  Fortunately, IBM provides the pseudocode for us here (ibm.co/SuperfluousLogs).  IBM uses L, S and M for log extent names and to make it easier for you to follow along, well okay to make it easier for me to follow along, I used the same names.

Note that the script processes all files in the log directory.  In a properly running queue manager only log files will be in that folder.  However, the script assumes that you might choose to archive logs that aren’t needed for transaction recovery but are still required for media recovery, and that these will have the same file name but with a different extension.  If that is the case, they are properly handled.  I tested by turning files like S0000000.LOG into S0000000.ZIP and then watching the script delete them appropriately once their number was up.  You can see this in the for loop where any file in the active log directory is assigned to L and then the substring that should contain the numeric value of the extent number is extracted.

I love that the numeric test of a variable value in Windows batch requires four lines of code, one of which is a for loop that splits the value using digits as field delimiters.  If there’s anything left after the for loop runs, the value contains non-numeric characters.  This is about as bizarre as IBM’s CHLAUTH syntax for UserID blocking that isn’t formulated as allow/deny directives like a web server.  I prefer to think of these as delightfully idiosyncratic rather than wickedly user unfriendly, because that’s easier than talking sense to giant corporations.
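
For comparison, here is the same digits-as-delimiters idea as a Python sketch, combined with the positional extraction of the extent number from the file name.  This illustrates the logic only and is not the batch code from the script; the function names are hypothetical and it assumes names of the form S followed by seven digits and an extension:

```python
import re
from typing import Optional


def has_non_digits(value: str) -> bool:
    """Split on digits; any non-empty piece left over is a non-numeric run."""
    return any(piece for piece in re.split(r"[0-9]", value))


def extent_number(file_name: str) -> Optional[str]:
    """Return the 7-character extent number from names like S0000042.LOG."""
    candidate = file_name[1:8]  # the seven characters after the leading 'S'
    if len(candidate) != 7 or has_non_digits(candidate):
        return None  # not a log extent (or an archived copy of one)
    return candidate
```

Because only the characters in the number position are examined, S0000042.LOG and S0000042.ZIP yield the same extent number, while a stray file such as README.txt is rejected.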

The rest of this section is a fairly straightforward implementation of IBM’s pseudocode within the constraints of Windows Batch syntax.  Files that can be deleted are.  Files that should be skipped are.  However, files that are eligible for archiving are passed to a subroutine.  This isolates the archive code in case you would like to modify it with a command to run 7Zip, WinZip or other compression utility.
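
The delete/archive/skip decision can be sketched in Python as follows.  This is my reading of the rules in IBM’s pseudocode rather than a copy of the batch code, and the 5,000,000 threshold (half of the seven-digit extent number space) and the function names are assumptions you should check against the IBM page:

```python
HALF_RANGE = 5_000_000  # assumption: half of the 0000000-9999999 number space


def minlog(a: int, b: int) -> int:
    """Return the older of two extent numbers, allowing for wrap-around."""
    if abs(a - b) < HALF_RANGE:
        return min(a, b)  # close together: no wrap, the smaller is older
    return max(a, b)      # far apart: numbering wrapped, the larger is older


def classify(l_ext: int, s_ext: int, m_ext: int) -> str:
    """Classify extent L given restart extent S and media recovery extent M."""
    if l_ext == s_ext or l_ext == m_ext:
        return "keep"
    min_sm = minlog(s_ext, m_ext)
    min_lsm = minlog(l_ext, min_sm)
    min_ls = minlog(l_ext, s_ext)
    if min_lsm == l_ext:
        return "delete"   # older than both S and M: no longer needed
    if min_ls == l_ext:
        return "archive"  # older than S but still needed for media recovery
    return "keep"         # needed for restart recovery
```

With S at 0000002 and M at 9999998, for example, extent 9999997 is classified for deletion while extents 9999999 and 0000000 are archive candidates, which is the wrap-around behavior the requirements call for.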

Subroutines
The first subroutine is the aforementioned archive code.  Currently it unsets the archive bit on the file passed to it.  MQ creates the log files with this bit set so unsetting it allows you to run commands against the archive-eligible files by looking at their attributes.  Alternatively, you can modify this subroutine to call a compression utility or move the logs off to secondary storage.  Just be aware that whatever you do to these files will compete with MQ’s access of the same file partition.  Try to schedule file moves or other operations that require reading the file for slow times.

The last bit of code implements IBM’s minlog code.  Unfortunately Windows Batch lacks an absolute value function and has some other quirks so the subroutine to calculate the older of two log extents is a bit longer than IBM’s pseudocode.

The first interesting bit is that Windows interprets a numeric value with a leading zero as octal.  This isn’t an issue until the first log file with an 8 or 9 in it is produced, then all sorts of bad things happen.  As with the numeric test, it’s necessary to use a for loop specifying 0 as the delimiter, this time to strip the leading zeros.
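
The octal pitfall is easy to demonstrate outside of batch.  The Python sketch below (an illustration, not the script’s code, with hypothetical function names) mimics the octal interpretation with an explicit base of 8 and shows the leading-zero strip as the fix, including the all-zeros case that a naive strip would turn into an empty string:

```python
def parse_as_octal(digits: str) -> int:
    """Mimic the pitfall: treat a zero-padded number as octal."""
    # "0000010" becomes 8, and any value containing 8 or 9 raises ValueError.
    return int(digits, 8)


def extent_to_int(digits: str) -> int:
    """Strip leading zeros so the value is always read as decimal."""
    stripped = digits.lstrip("0")
    # An extent number of all zeros strips to "", so handle it explicitly.
    return int(stripped) if stripped else 0
```

Note that the silent case is arguably worse than the error: an octal reading of 0000010 yields 8 with no complaint, so comparisons go wrong well before an 8 or 9 forces a visible failure.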

The last line of the subroutine sets a variable to the value of the lowest log extent as calculated by the subroutine.  The weird syntax is because the subroutine takes as the 3rd parameter the name of the variable the caller wants the result assigned to.  Per IBM’s pseudocode we need values for MINLS, MINSM and MINLSM and it’s easier to just give these variable references to the subroutine than to capture the result and assign it from the calling code block.

Wrapping up
Well, that’s it.  Please let me know if you use the script.  I’m especially interested in reports of platforms and MQ versions where it is confirmed to work or has failed.  If you missed the download link above, here it is again:

20150222WinMQLog.ZIP


20 Responses to Windows MQ Log Maintenance that really works

  1. Rick says:

    I tried to download the script and it is gone. Is there any way to get the latest script? Thanks

  2. Tommes says:

    Hi, where can I find the script right now?

  3. Jun says:

    Hi there,

    Can you share the script with me? Thanks

  4. Pete says:

    T. Rob,
    I have installed and used your script on a Windows installation (2008 R2 Enterprise) running WMQ 7.5.0.2.
    It is very useful and it worked just fine.

  5. Mikael P says:

    Have you removed your script, or did it get lost in server migration?

  6. Krish says:

    Thanks for the Script T.Rob!

  7. Richard Sachsse says:

    Thanks, T.Rob for the script!

    For moving the tail pointer up with the RCDMQIMG command, would it be better to add the “-l” parameter? As stated in the manual page for RCDMQIMG, “-l Writes messages containing the names of the oldest log files required to restart the queue manager and to perform media recovery. The messages are written to the error log and the standard error destination.”

    If I got it right, without “-l” it would take some time until the pointer is updated. Until then, the cleanup may happen, due to a previous tail pointer update, but most probably would be partial.

    • T.Rob says:

      No. Note that the Infocenter also says “When issuing a sequence of rcdmqimg commands, include the -l parameter only on the last command in the sequence, so that the log file information is gathered only once.” This is because the cleanup occurs each time and they want to make sure the user doesn’t gather the log file pointer location at a prior point.

      The problem comes when rcdmqimg is used for anything less than the entire QMgr. In this case it is possible that running the utility will not advance the tail pointer if the data holding the tail pointer is not among the subset captured in the rcdmqimg operation.

      But if you run rcdmqimg without the -l and wait for a checkpoint to occur, you will see the pointers have moved. (Assuming you have enough data queued to overflow an extent or two.)

  8. Pingback: MQGem Monthly (February 2015) | MQGem Software

  9. gtc says:

    Let’s see 20+ years now and no supported Linear Log Maintenance program from IBM. But IBM requires that Linear logs must be used or loss of data may occur. Sounds like a liability avoidance mechanism that can’t lose for IBM. That couldn’t be the real reason now could it ?

    T.Rob weren’t you the IBM MQ Product manager before you left IBM ?

    -GTC

    And Tibor is more on target than not.

    • T.Rob says:

      C’mon, you know better than to suggest the idea of a liability shield in public. If they weren’t doing it before, they sure as hell will now. 😉

      And for the record, I was *a* Product Manager, not *the* Product Manager, the scope was all WebSphere messaging products, and the focus was on security. There is actually a lot of stuff I participated in that got done – features, docs, license terms, pricing. I did go to bat for MQ managing the linear logs but can’t count that among the successes. Sadly, there’s an element of “how bad could this be needed, we went without it for 20 years?” We may need to grab our torches and pitchforks and storm the castle en masse to get it done.

      • gtc says:

        Yes, a smiley face emoticon is needed after that one!

        And, yes you were (and are) an articulate and knowledgeable spokesperson for the IBM MQ product and IBM and the product were well served by your advocacy. But to be frank I was disheartened and a bit concerned about the future viability and vibrancy of the IBM MQ product and not because there was no supported Log maintenance component, but rather, when I heard both you and Paul Clarke had left IBM. While you’re still both vital participants in the IBM MQ community, I couldn’t help but feel your ability to guide and the tools to influence future product direction may well be marginalized to pitchforks and torches. I hope I am wrong.

  10. Malcolm says:

    Thanks T.Rob sorry you couldn’t be in Vegas, I missed out too 🙁

  11. Peter Potkay says:

    Here is the link to the RFE to get IBM to make it so MQ can maintain its own linear logs. Please click and add your vote to this RFE.
    https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=19128

    But in the meantime, thanks to T.Rob for this handy script!

    -Peter

  12. Tibor says:

    Do you know why these support packs were not updated? Everyone hates the linear logging and especially the Windows platform as a server… 🙂

    • T.Rob says:

      At least two of the Linear Log SupportPacs are 3rd party and you’d have to ask the 3rd parties. As for the ones that are internal, IBM pretty much takes a crowdsourcing approach to these. You don’t necessarily get any credit for having written them and in at least a couple cases people (who left, by the way) got dinged on their review because their “external activities” (things that helped customers, benefited the brand and brought prestige to IBM but were not officially funded) cut into their assigned tasks.

      Agreed on Windows. My most recent assignment was on Windows and problems with Active Directory permissions and policy stretched the assignment out to twice what it would have been on any *NIX platform. Half the things we ran into weren’t visible by the MQ admins and required a Domain Admin. My issue with this isn’t that it is complicated, but rather that the degree of complexity almost guarantees it is unsecure. I’ll take Linux over Windows as my MQ/Broker server any day.
