Do you run MQ with linear logs on Windows? Do you want a log maintenance utility that really works on that platform with modern versions of MQ? So do I.
One of my biggest pet peeves about IBM MQ is that even after 20 years there remains no IBM-supported tool for managing linear log extents or any provision to automatically delete them on Distributed platforms. Recently while on assignment for a customer running MQ v8.0 on Windows we needed just such a tool and went to the SupportPacs landing page to review options. We looked at MO73, MS0L and MS62. None of them met my requirements and in some cases they don’t work at all.
Since I’m not at Interconnect this year I had some time on my hands so I wrote a script of my own. I’m making it available here so it’ll be just like I was there with you in Vegas. No, really, it will. This post takes 45 minutes to read and there’s 15 minutes of Q&A at the end, followed by refreshments in the hall.
Log maintenance requirements
Obviously, any linear log maintenance utility has to delete obsolete log files. In addition to this base requirement, there are a few other things I deemed essential. First, the script needs to run rcdmqimg to advance the log file tail pointers before cleaning the logs. None of the SupportPacs do this. I figure it’s easier to comment that line out of my script if you don’t want to run it than to force you to schedule it separately, so it’s in there.
Also, I wanted a script that didn’t require installation of Perl or have dependencies on VBScript or Java. To be widely applicable across supported versions of both MQ and Windows this log maintenance script is written entirely in Windows batch.
The script must also run without server Administrator privileges or the need to read the registry. Of the three SupportPacs, one requires the directory path to be passed in on the command line and the other two read the registry. The only problem with this is that MQ doesn’t use the registry anymore. Whether the technique can possibly work depends on the version of MQ being used and whether it was migrated from an earlier version.
In MQ v7.5 and v8.0, however, any registry entries are migrated to ini file stanzas, so any utility that queries the registry to get log path names won’t work with these versions of MQ. Unfortunately the VBScript utility hasn’t been updated since 2006, the Java utility since 2011 and the Perl script since March 2012. All of these pre-date the release of MQ v7.5 in July of 2012. Did everyone write their own script or is nobody running linear logged MQ on Windows since then?
Another issue with reading the registry is that if Windows User Account Control is enabled, the account under which log file cleanup runs must have both Windows Administrator rights to read the registry and mqm group membership to use runmqsc. From a security perspective no service account should have both of these types of admin privileges. When it does, a compromise of MQ gives the attacker root privileges on the host. I think we should make the attacker work at least a little to get that, don’t you?
The final requirement is the script must properly handle log file wrap-around. If log files with names S9999999.LOG and S0000000.LOG exist when the script is run, the S9999999.LOG must be correctly calculated to be the older of the two if we want the log files to be able to wrap without manual intervention or crashing the queue manager.
This version of log file maintenance for Windows meets all of these requirements.
There are no known issues at this time; however there are some limitations on my ability to test. Sorry, but I don’t have a copy here at IoPT Labs of every version of Windows on which MQ is supported. The script has been tested on MQ v8.0 and Windows 7 64-bit, and with MQ v7.0.1 on Windows XP. I tried to stick with commands that are available on all Windows versions.
Another possible issue is that the log type and path are scraped from the output of amqmdain using positional substring matching. This might break on different versions of MQ if the output of amqmdain has changed. But it’s the same on v7.0.1 as on v8.0, and both v7.0 and v7.0.1 reach end-of-life in September this year, so I’m hoping this isn’t an issue.
I would greatly appreciate reports of Windows versions and MQ versions where the script has been tested successfully or a bug report if you find a combination of Windows and MQ on which it breaks.
Running the script
The script doesn’t try to capture its own output. When you schedule it you might want to write a small wrapper script that generates a timestamped log file name and then calls this script with STDOUT and STDERR redirected to that file. The quick-and-dirty way to do this without a wrapper script is to append the output to a single log file and schedule it like so:
WinMQLog.bat QMNAME >> "F:\Path to\Log Dir\WinMQLog.txt" 2>&1
Many shops have a scheduler, but if yours does not, Windows provides the schtasks command.
Also, the script doesn’t attempt to raise any alerts on failure. If it finds a problem it exits with a return code of 2; if it completes correctly it exits with a 1. You can change these exit values in the script to meet your needs.
I normally leave these features out because there are so many shop-specific requirements as to file locations, credentials, how to deliver alerts, etc. It’s just easier to isolate those in the wrapper script.
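For anyone who wants a starting point, here is a minimal sketch of such a wrapper. It is not part of the download. The wrapper's behavior, the log directory and the log file naming are all assumptions for illustration, and the timestamp is built from DATE and TIME, whose format varies with the regional settings of the account that runs the job.

```batch
@echo off
setlocal EnableExtensions
rem Hypothetical wrapper script. The log directory and file names are
rem placeholders; WinMQLog.bat is assumed to be on the PATH or in the
rem current directory.
if "%~1"=="" (
    echo Usage: %~nx0 QMNAME
    exit /b 2
)
set "QMGR=%~1"
set "LOGDIR=F:\Path to\Log Dir"

rem Build a timestamp from DATE and TIME. Their format depends on the
rem regional settings, so replace characters that are awkward in file names.
set "STAMP=%DATE%_%TIME%"
set "STAMP=%STAMP:/=-%"
set "STAMP=%STAMP::=%"
set "STAMP=%STAMP:.=%"
set "STAMP=%STAMP:,=%"
set "STAMP=%STAMP: =_%"

call WinMQLog.bat %QMGR% > "%LOGDIR%\WinMQLog_%STAMP%.txt" 2>&1
set "RC=%ERRORLEVEL%"

rem Per the exit codes described above, 1 means success and 2 means failure.
if not "%RC%"=="1" echo WinMQLog.bat failed for %QMGR% with return code %RC% 1>&2
exit /b %RC%

rem Example schedule (task name, path and start time are placeholders;
rem older Windows versions may expect the time as HH:MM:SS):
rem   schtasks /create /tn "MQ log maintenance" /tr "C:\Scripts\WinMQLogWrapper.bat QMNAME" /sc daily /st 02:00
```

The wrapper passes the return code through unchanged, so a scheduler or monitoring tool can still act on it.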
If you are ready to try the script, you can download it here. If you want to go through it section-by-section, read on.
The script is organized into the following sections:
- Run rcdmqimg
- Fetch the log file numbers from the queue manager
- Process the log files
Let’s look at each of these in turn.
The first two SETLOCAL statements turn on features of the Windows command processor that should be on by default, such as the ability to expand variables within nested loops. Even a slightly complex batch script will need these settings and I always seem to have to enable these. The third SETLOCAL makes sure variables changed in the script do not update those in the parent environment. This is good security hygiene as well as a best practice for Windows batch scripting.
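To see why delayed expansion matters, consider the fragment below. It is a minimal sketch rather than the script's actual header: the EnableExtensions and EnableDelayedExpansion options are an assumption about which two features are being switched on, and COUNT is a made-up variable.

```batch
@echo off
rem Assumed setup: enable command extensions and delayed expansion,
rem then open one more scope so changes stay local to this script.
setlocal EnableExtensions
setlocal EnableDelayedExpansion
setlocal

rem The percent form is expanded once, when the whole block is parsed.
rem The exclamation form is expanded each time the line runs.
set "COUNT=0"
for %%N in (a b c) do (
    set /a COUNT+=1
    echo parse-time: %COUNT%  run-time: !COUNT!
)
```

The percent form prints 0 on every pass because it was substituted before the loop started, while the exclamation form prints 1, 2 and 3.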
The script validates a great many dependencies. If any are not met, an explanatory message is displayed for the user.
The first of these is to make sure that the script is running under an account with MQ administrative authority. It does this by running dspmqtrn, which is the command to display pending transactions. If the user is not an MQ administrator the command gives an error stating as much. If the user is an MQ administrator, the command instead returns an error message because it wasn’t passed a queue manager name. The reason for not providing a queue manager name is that the command then never gets deep enough into the MQ code to impact any running processes or applications. The idea is to get through the validation without calling anything that will potentially make a change to MQ or the environment. If the validation fails, there should not be anything we need to reset.
Next the script checks the number of parameters passed. A queue manager name is required as the first and only parameter.
Assuming a parameter was passed, it is next used as input to the dspmq command. If the queue manager exists and is running then the validation succeeds. If the validation fails the script doesn’t care why, only that a fatal error has been encountered. The user is left to figure out whether the queue manager name was misspelled or it needs to be started. Generally spelling errors are caught during initial setup so if the script fails it should be obvious why and easy to fix. This kind of error isn’t worth adding complexity to the code in most cases.
Note that the check is made for the string “(Running)” within parentheses. This ensures we don’t get a false positive for queue managers that are running as standby. On the enhancement list for this script is making it detect standby queue managers, log instances when the script finds its queue manager is not running locally, and then exit without an error.
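The parameter and queue manager checks described above can be sketched roughly as follows. This is an illustration, not the code from the download: the messages are invented, and the exit code of 2 is borrowed from the failure code mentioned earlier.

```batch
@echo off
setlocal EnableExtensions
rem Exactly one parameter, the queue manager name, is expected.
if "%~1"=="" (
    echo Usage: %~nx0 QMNAME
    exit /b 2
)
if not "%~2"=="" (
    echo Usage: %~nx0 QMNAME
    exit /b 2
)

rem dspmq reports STATUS for the named queue manager. Searching for the
rem literal text with its closing parenthesis skips standby instances,
rem whose status text does not end right after the word Running.
dspmq -m %1 2>&1 | findstr /c:"(Running)" >nul
if errorlevel 1 (
    echo Queue manager %1 does not exist or is not running here.
    exit /b 2
)
echo Queue manager %1 is running.
```

Both failure causes land in the same branch, which matches the "doesn't care why" approach described above.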
The next two validations use the amqmdain command to check the logging style and get the log path. This is particularly important for Windows code since the setting might be in the registry or in the qm.ini file, depending on the MQ version and in some cases whether the queue manager had been migrated from an earlier version of MQ. It is bad enough parsing all the possible places where the ini file might be, and in which ini file the value might be, but having to also check the registry would make the program that much longer and more complex. Given that IBM provides a tool which does all of this for you, it is amazing how many utilities and scripts fail to use it.
Incidentally, it would be worth a whole separate blog post to discuss the notion that amqmdain, or something like it, should be available for all versions of MQ on all platforms. Why force 10,000 different customers to implement logic to query basic MQ configuration details that change from version to version? The product already provides a utility that consistently and accurately finds the information across all possible configuration files and installations, but that utility is only available on Windows. As good as MQ is, with all its optimizations, the one area in which it is consistently deficient is the ability to capture its own configuration details in a restorable format. Perhaps that’s the curse of making software as reliable as MQ: something so basic as configuration recovery tools never becomes a high priority and can safely be ignored. As long as “safely” is defined in terms of the market demand and not the impact to customers who experience an outage.
It is true that rcdmqimg can cause the queue manager to pause, especially in the case that there are queues with large numbers of persistent messages on them. But it is also true that if the rcdmqimg command doesn’t get run then the log tail pointers do not advance. There is almost no benefit in running log file maintenance without also running rcdmqimg so it seems a bit weird to me that any log file maintenance utility would not run this command. However, that is the case with all of the linear log maintenance SupportPacs. This script assumes you want rcdmqimg to run every time so there is no switch to disable it. You can always keep a version of the script with the rcdmqimg line commented out. It would also be possible to add a parameter to control it, which I’ll happily do if there is sufficient demand.
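A typical invocation looks like the sketch below. The object selection (all object types, all names) and the handling of the return code are assumptions for illustration; the script's own line may differ.

```batch
@echo off
setlocal EnableExtensions
rem The queue manager name is taken from the first parameter.
set "QMGR=%~1"

rem Record media images for all objects so that the media recovery
rem log pointer can move forward. Comment this line out to skip it.
rcdmqimg -m %QMGR% -t all *
set "RC=%ERRORLEVEL%"
if not "%RC%"=="0" echo rcdmqimg ended with return code %RC%
```

Whether a non-zero return code should stop the cleanup is a policy decision, so this sketch only reports it.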
Fetch the log file numbers from the queue manager
The next two code blocks query the queue manager for the media log and recovery log extents. These values are attributes of the queue manager and available to query using mqsc on versions of MQ that are currently supported. This method is a lot easier than parsing event messages or parsing the error log files. The binary PCF messages are hard to parse with batch file commands and the error logs can possibly roll over before the log extent messages are retrieved. Unlike either of these approaches, the accuracy of fetching the log extent numbers using mqsc doesn’t depend on code complexity and there’s no race condition.
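One way to do this from batch is sketched below. It assumes the values are exposed as the MEDIALOG and RECLOG attributes of DISPLAY QMSTATUS and that runmqsc prints them as NAME(value) pairs. The :extract helper and the variable names are hypothetical, and this version searches for the attribute name rather than relying on fixed positions.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "QMGR=%~1"
set "MEDIALOG="
set "RECLOG="

rem Pipe one MQSC command into runmqsc and inspect every output line.
for /f "delims=" %%L in ('echo DISPLAY QMSTATUS MEDIALOG RECLOG ^| runmqsc %QMGR%') do (
    set "LINE=%%L"
    call :extract MEDIALOG
    call :extract RECLOG
)
echo MEDIALOG=%MEDIALOG%
echo RECLOG=%RECLOG%
exit /b 0

:extract
rem If LINE contains NAME(value), store value in the variable NAME.
rem First drop everything up to and including the opening parenthesis.
set "REST=!LINE:*%1(=!"
rem No change means the attribute is not on this line.
if "!REST!"=="!LINE!" exit /b 0
rem Keep the text before the closing parenthesis.
for /f "delims=)" %%V in ("!REST!") do set "%1=%%V"
exit /b 0
```

If either variable is still empty afterwards, the safest reaction is to stop without deleting anything.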
Process the log files
If not for the possibility that log files can wrap, this would be a very short code block. However, accounting for file wrapping requires a bit of custom comparison code. Fortunately, IBM provides the pseudocode for us here (ibm.co/SuperfluousLogs). IBM uses L, S and M for log extent names. To make it easier for you to follow along (well, okay, to make it easier for me to follow along) I used the same names.
Note that the script processes all files in the log directory. In a properly running queue manager only log files will be in that folder. However, the script assumes that you might choose to archive logs that aren’t needed for transaction recovery but are still required for media recovery, and that these will have the same file name but with a different extension. If that is the case, they are properly handled. I tested by turning files like S0000000.LOG into S0000000.ZIP and then watching the script delete them appropriately once their number was up. You can see this in the for loop where any file in the active log directory is assigned to L and then the substring that should contain the numeric value of the extent number is extracted.
I love that the numeric test of a variable value in Windows batch requires four lines of code, one of which is a for loop that splits the value using digits as field delimiters. If there’s anything left after the for loop runs, the value contains non-numeric characters. This is about as bizarre as IBM’s CHLAUTH syntax for UserID blocking that isn’t formulated as allow/deny directives like a web server. I prefer to think of these as delightfully idiosyncratic rather than wickedly user unfriendly, because that’s easier than talking sense to giant corporations.
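For the curious, the idiom looks roughly like this. It is a sketch of the general technique rather than the script's exact lines, and the :isnumeric label and LEFTOVER variable are made up for the example.

```batch
@echo off
setlocal EnableExtensions
call :isnumeric 0000123
call :isnumeric 00A0123
exit /b 0

:isnumeric
rem Split the value using the digits as delimiters. If any token
rem survives, the value contains something other than digits.
set "LEFTOVER="
for /f "delims=0123456789" %%A in ("%~1") do set "LEFTOVER=%%A"
if defined LEFTOVER (echo %~1 is not numeric) else (echo %~1 is numeric)
exit /b 0
```

This reports 0000123 as numeric and 00A0123 as not numeric. An empty value would also pass, so it needs its own check if that can happen.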
The rest of this section is a fairly straightforward implementation of IBM’s pseudocode within the constraints of Windows Batch syntax. Files that can be deleted are. Files that should be skipped are. However, files that are eligible for archiving are passed to a subroutine. This isolates the archive code in case you would like to modify it with a command to run 7Zip, WinZip or other compression utility.
The first subroutine is the aforementioned archive code. Currently it unsets the archive bit on the file passed to it. MQ creates the log files with this bit set so unsetting it allows you to run commands against the archive-eligible files by looking at their attributes. Alternatively, you can modify this subroutine to call a compression utility or move the logs off to secondary storage. Just be aware that whatever you do to these files will compete with MQ’s access of the same file partition. Try to schedule file moves or other operations that require reading the file for slow times.
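As a rough sketch of that idea (not the script's own code), the subroutine and a matching query might look like this. The LOGDIR parameter and the file mask are placeholders.

```batch
@echo off
setlocal EnableExtensions
rem Hypothetical usage: pass the active log directory as the first parameter.
set "LOGDIR=%~1"

rem List extents whose archive bit has been cleared, skipping directories.
rem These are the files a separate job could compress or move.
for /f "delims=" %%F in ('dir /b /a:-a-d "%LOGDIR%\S*.LOG" 2^>nul') do echo Eligible: %%F
exit /b 0

:archive
rem Clear the archive attribute on the file passed as the first parameter.
rem Replace this line with a compression or move command if preferred.
attrib -A "%~1"
exit /b 0
```

Selecting on the attribute keeps the follow-up job independent of the extent arithmetic.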
The last bit of code implements IBM’s minlog code. Unfortunately Windows Batch lacks an absolute value function and has some other quirks so the subroutine to calculate the older of two log extents is a bit longer than IBM’s pseudocode.
The first interesting bit is that Windows batch arithmetic interprets a numeric value with a leading zero as octal. This isn’t an issue until the first log file with an 8 or 9 in it is produced, then all sorts of bad things happen. As with the numeric test, it’s necessary to use a for loop, this time specifying 0 as the delimiter so that the leading zeros are stripped.
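Here is a small sketch of the zero-stripping idiom. RAW, NUM and NEXT are illustrative names, not the script's own.

```batch
@echo off
setlocal EnableExtensions
set "RAW=0000089"

rem Default to 0 because the loop body never runs for an all-zero value.
set "NUM=0"
rem With 0 as the delimiter and tokens=*, the leading zeros are dropped
rem and the rest of the string is kept intact, inner zeros included.
for /f "tokens=* delims=0" %%A in ("%RAW%") do set "NUM=%%A"

set /a NEXT=NUM+1
echo %RAW% is extent %NUM%, the next one is %NEXT%
```

This prints "0000089 is extent 89, the next one is 90". Feeding 0000089 straight to set /a fails instead, because 8 and 9 are not valid octal digits.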
The last line of the subroutine sets a variable to the value of the lowest log extent as calculated by the subroutine. The weird syntax is because the subroutine takes as the 3rd parameter the name of the variable the caller wants the result assigned to. Per IBM’s pseudocode we need values for MINLS, MINSM and MINLSM and it’s easier to just give these variable references to the subroutine than to capture the result and assign it from the calling code block.
Well, that’s it. Please let me know if you use the script. I’m especially interested in reports of platforms and MQ versions where it is confirmed to work or has failed. If you missed the download link above, here it is again:
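Putting those pieces together, a wrap-aware comparison can be sketched as below. This is an illustration under stated assumptions, not the script's subroutine. It assumes two extents more than 5000000 apart have wrapped, which is half of the seven-digit range, and the :minlog label and variable names are chosen for the example.

```batch
@echo off
setlocal EnableExtensions
call :minlog 9999999 0000000 OLDER
echo Older of 9999999 and 0000000: %OLDER%
call :minlog 0000012 0000015 OLDER
echo Older of 0000012 and 0000015: %OLDER%
exit /b 0

:minlog
rem Parameters 1 and 2 are extent numbers. Parameter 3 is the NAME of
rem the variable that receives the older of the two.
set "A=0"
set "B=0"
rem Strip leading zeros so the values are not read as octal.
for /f "tokens=* delims=0" %%X in ("%~1") do set "A=%%X"
for /f "tokens=* delims=0" %%X in ("%~2") do set "B=%%X"

rem There is no absolute value function, so negate a negative difference.
set /a DIFF=A-B
if %DIFF% lss 0 set /a DIFF=-DIFF

rem A small gap means no wrap, so the lower number is older.
rem A large gap means the sequence wrapped, so the higher number is older.
if %DIFF% lss 5000000 (
    if %A% leq %B% (set "%~3=%A%") else (set "%~3=%B%")
) else (
    if %A% geq %B% (set "%~3=%A%") else (set "%~3=%B%")
)
exit /b 0
```

The first call reports 9999999 as the older extent and the second reports 12. The result comes back without zero padding, so it would need re-padding before being turned back into a file name.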