Dynamic Logging Reconfiguration in SAS® 9.3

One of the exciting new features in SAS 9.3 (well, for me anyway) is the ability to replace or modify the logging configuration on-the-fly for a running server, without needing to restart it. Previously, in SAS 9.2, you could dynamically change the levels for loggers on servers (via the Loggers tab in SAS Management Console's Server Manager) but not modify the appender configuration. Now in SAS 9.3 you can completely change the logging configuration file and apply it without restarting the server. This is exciting because it means you can do something like add a new appender for a new custom log file for the metadata server in the middle of the day, without having to restart the entire platform of dependent services. That’s really neat!

I tried it out. I made a small change to the metadata server logconfig.xml, saved it, and then waited … watching the log file. I thought the servers might have a watchdog thread monitoring changes to the log config file (a bit like log4j’s configureAndWatch), but after watching and waiting for a little while nothing happened. I then checked the documentation (always a good idea) and found that you actually need to prod the server to tell it to change its logging config, via proc iomoperate (also new in SAS 9.3).

This is the code I used to tell the SAS Metadata Server to reload its logging configuration file (logconfig.xml) – the one I had changed.

proc iomoperate;
    connect host='sas93lnx01' port=8561
            user='sasadm@saspw' password='thesecretpassword';
    set log cfg="/opt/ebiedieg/Lev1/SASMeta/MetadataServer/logconfig.xml";
quit;

It worked. The metadata server immediately closed its current log file (SASMeta_MetadataServer_2012-01-10_sas93lnx01_1266.log) and opened a new one with a _0 suffix (SASMeta_MetadataServer_2012-01-10_sas93lnx01_1266.log_0) using the modified config. I wondered why it added the _0 suffix. Subsequent reloads increment the suffix (_1, _2, etc.). Perhaps it was done to delineate the old and new logging configs in case someone wanted to parse the logs? If anyone knows how to make it continue logging to the same log file, I’d love to know.

Of course this dynamic logging reconfiguration also means you can readily switch between normal and trace level logging config files to troubleshoot a problem and then switch back again quickly – all without a restart. Something like this:

* switch to trace log configuration;
proc iomoperate;
    connect host='sas93lnx01' port=8561
            user='sasadm@saspw' password='thesecretpassword';
    set log cfg="/opt/ebiedieg/Lev1/SASMeta/MetadataServer/logconfig.trace.xml";
quit;

* >>> REPLICATE THE PROBLEM;

* switch back to normal log configuration;
proc iomoperate;
    connect host='sas93lnx01' port=8561
            user='sasadm@saspw' password='thesecretpassword';
    set log cfg="/opt/ebiedieg/Lev1/SASMeta/MetadataServer/logconfig.xml";
quit;
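
If you find yourself switching back and forth often, the repetition can be wrapped up in a simple macro. This is just a convenience sketch using the same connection details as above; set_meta_logcfg is a name I made up:

%macro set_meta_logcfg(cfg);
    proc iomoperate;
        connect host='sas93lnx01' port=8561
                user='sasadm@saspw' password='thesecretpassword';
        set log cfg="&cfg";
    quit;
%mend set_meta_logcfg;

* switch to trace log configuration;
%set_meta_logcfg(/opt/ebiedieg/Lev1/SASMeta/MetadataServer/logconfig.trace.xml);

* >>> REPLICATE THE PROBLEM;

* switch back to normal log configuration;
%set_meta_logcfg(/opt/ebiedieg/Lev1/SASMeta/MetadataServer/logconfig.xml);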

For more info, check out the PROC IOMOPERATE and SAS logging facility documentation on support.sas.com.

Troubleshooting: sas.servers & hostname aliases

I ran into an interesting problem over the last few days with my SAS® 9.3 deployments on Linux, involving the SAS Service Management Script for Linux and UNIX (sas.servers). I had noticed that the sas.servers scripts for my Lev2 and Lev3 deployments were taking several minutes to start the SAS services. I know JBoss takes ages to start, but usually the SAS services start really quickly. I had assumed Remote Services was the culprit, but surprisingly it turned out that starting the SAS/CONNECT Job Spawner service was the problem. Even more surprisingly, the underlying cause turned out to be the use of the hostname in the log filename, where the logical hostname alias used during deployment was different from the physical hostname of the machine it was deployed on.

To start the troubleshooting process I modified the sas.servers script to add some timing information to the console log messages it generates. Editing the sas.servers script is not necessarily the best option, but more on that later. The sas.servers script has a function called logmsg which seems to have been very nicely provided in order to customize its logging. I wanted to make a simple change to add a timestamp so I could see where all the time was being spent. I changed the function from this (reformatted and comments omitted for brevity):

logmsg() {
    echo "$*"
}

… to this:

logmsg() {
    # Prefix each console message with a timestamp so we can see where the time goes
    DT=`date +%Y-%m-%d:%H:%M:%S`
    echo "$DT: $*"
}

I stopped the services since they were already running:

root@sas93lnx01:~# cd /opt/ebiedieg/Lev3
root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers stop
2011-10-30:21:51:31: Stopping SAS servers

Then checked they were all stopped:

root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers status
2011-10-30:21:51:59: SAS servers status:
2011-10-30:21:51:59: SAS Metadata Server 1 is NOT up
2011-10-30:21:51:59: SAS OLAP Server 1 is NOT up
2011-10-30:21:51:59: SAS Object Spawner 1 is NOT up
2011-10-30:21:51:59: SAS Share Server 1 is NOT up
2011-10-30:21:51:59: SAS CONNECT Spawner 1 is NOT up
2011-10-30:21:51:59: SAS Remote Services 1 is NOT up
2011-10-30:21:51:59: SAS Framework Data Server 1 is NOT up

Once stopped, I started them again so I could see where all the time was spent:

root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers start
2011-10-30:21:54:54: Starting SAS servers
2011-10-30:21:55:03: SAS Metadata Server 1 is UP
2011-10-30:21:55:07: SAS OLAP Server 1 is UP
2011-10-30:21:55:11: SAS Object Spawner 1 is UP
2011-10-30:21:55:11: SAS Share Server 1 is UP
2011-10-30:21:56:12: SAS CONNECT Spawner 1 is NOT up
2011-10-30:21:56:28: SAS Remote Services 1 is UP
2011-10-30:21:56:32: SAS Framework Data Server 1 is UP

Total time to start was 98 seconds. Not too bad really, but it should be quicker than that. Everything seemed to start within a few seconds, except for the SAS/CONNECT Job Spawner! It looked like I was wrong to assume it was Remote Services. The SAS/CONNECT Job Spawner appeared to take 60 seconds and then apparently not start at all. It had started, though; this was confirmed by looking in the process list and the job spawner log file. It all seemed very odd, for two reasons: firstly, the job spawner is usually very fast to start; secondly, 60 seconds is a suspiciously convenient number. It sounded like a timeout.

The sas.servers script starts each SAS service in turn, waiting for the service to become available before starting the next one. It sounded like sas.servers was waiting for some event from the job spawner and gave up after 60 seconds, assuming it had not started. Time to dig into the sas.servers script a bit more.

The SAS/CONNECT job spawner is started by the sas.servers script in the start_connect_spawner function. It spawns the spawner (ConnectSpawner.sh start) in the background and then calls the is_atypical_server_up function to wait for it to finish starting. Looking into the is_atypical_server_up function, it seems to repeatedly sleep and grep the job spawner log for some trigger text. Effectively:

grep "SAS Job Spawner for Open Systems" /opt/ebiedieg/Lev3/ConnectSpawner/Logs/ConnectSpawner_sas93lnx01.log

Now I could see the problem. It was scanning a non-existent log file, ConnectSpawner_sas93lnx01.log, when it should have been scanning the ConnectSpawner_sas93lnx03.log file that was actually being written to by the job spawner. The log file name contained the wrong hostname. The name sas93lnx01 was the physical name of the machine, but sas93lnx03 was the hostname alias I used when deploying the Lev3 environment. I prefer to use hostname aliases in deployments for the benefits they provide in being able to easily move deployments between physical machines and provide disaster recovery options. In this case the Lev3 environment was currently on the same machine as the Lev1 environment, but I knew one day I would move it onto another machine, so I had used a hostname alias sas93lnx03 (a DNS CNAME record) which I could easily redirect later to the appropriate physical machine name (a DNS A record).
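
In DNS terms the aliasing scheme looks something like this hypothetical zone fragment (the IP address is made up for illustration):

sas93lnx01   IN  A      192.0.2.10   ; physical machine
sas93lnx03   IN  CNAME  sas93lnx01   ; Lev3 deployment alias, repointable later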

There was the problem. The SAS/CONNECT Job Spawner was using the deployment/logical hostname alias, whereas the sas.servers script was using the physical hostname (which it got from `uname -n`). With the wrong file name it could never find the log, so it never saw the trigger text and always timed out after 60 seconds.

There were at least two ways I could fix this problem. The first was to modify the ConnectSpawner.sh script to use the physical hostname, rather than the deployment hostname alias, in the log file name. The second was to modify the sas.servers script to use the deployment hostname alias instead of the physical hostname. That was the option I chose. It was the slightly more difficult of the two but I didn’t feel right telling the job spawner to log to a file with the wrong name :)

These were the changes I made to sas.servers to get it to work. Firstly I added an extra line where the SHOSTNAME environment variable is derived, changing this:

SHOSTNAME=`uname -n`

… to this:

SHOSTNAME=`uname -n`
SHOSTNAME_ALIAS="sas93lnx03"

Secondly I modified the start_connect_spawner function to change the name of the log file (as checked by the is_atypical_server_up function) from this:

SLOGNAME="ConnectSpawner_${SHOSTNAME}"

… to this:

SLOGNAME="ConnectSpawner_${SHOSTNAME_ALIAS}"

As it happens, the same problem occurs with the deployment tester server component too, so I also edited the start_deployment_testsrv function, changing this:

SLOGNAME="DeploymentTesterServer_${SHOSTNAME}"

… to this:

SLOGNAME="DeploymentTesterServer_${SHOSTNAME_ALIAS}"

However, there is a potential problem with this method of editing sas.servers directly; I alluded to it earlier in the post. The sas.servers script is a generated script, and the SAS documentation cautions against editing it: it is created by the generate_boot_scripts.sh script, so the next time someone or something runs generate_boot_scripts.sh, any manual changes to sas.servers will be lost.

To protect against this you could modify the templates that are used by generate_boot_scripts.sh. This is what I did. I made all the same changes described above in the single template /opt/ebiedieg/Lev3/Utilities/script_templates/sas.servers.mainlog so they would survive a regeneration. I wouldn’t necessarily recommend this though. It’s not a method documented by SAS Institute and I suspect these templates will most likely get changed at some point during a SAS software upgrade or hotfix. However, it works for me for the time being. Looking back, it probably would have been better to have just changed the ConnectSpawner.sh script to use the wrong hostname ;)

With those changes done it was time to stop and start the servers to see any improvements:

root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers start
2011-10-30:23:54:30: Starting SAS servers
2011-10-30:23:54:39: SAS Metadata Server 1 is UP
2011-10-30:23:54:43: SAS OLAP Server 1 is UP
2011-10-30:23:54:47: SAS Object Spawner 1 is UP
2011-10-30:23:54:47: SAS Share Server 1 is UP
2011-10-30:23:54:49: SAS CONNECT Spawner 1 is UP
2011-10-30:23:55:06: SAS Remote Services 1 is UP
2011-10-30:23:55:10: SAS Framework Data Server 1 is UP

This time it took only 40 seconds, less than half the original time, and the SAS/CONNECT Job Spawner was now reported as being up.

As it turns out, I noticed that several of the other SAS services (like the metadata server) use the physical hostname in their log filenames. It all works fine, but I would have preferred they used the deployment hostname if it were possible. I briefly looked into ways of telling those servers to use the logical hostname alias instead but, since they use the SAS logging framework and the %S{hostname} conversion pattern, I couldn’t see any obvious way other than editing all of the config files and scripts post deployment. If someone knows a good way of consistently and automatically using the deployment hostname (the one provided during deployment and potentially different from the physical hostname) during installation, then I’m all ears :)
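
For reference, the %S{hostname} pattern appears in each server's logconfig.xml as part of the appender's FileNamePattern. The fragment below is a hypothetical illustration (reconstructed, not copied from a real deployment, with other params omitted); you can see how the date, hostname and process id end up in the log file name:

<!-- Hypothetical fragment: how the hostname gets into the log file name -->
<appender class="RollingFileAppender" name="TimeBasedRollingFile">
    <param name="FileNamePattern"
           value="/opt/ebiedieg/Lev3/SASMeta/MetadataServer/Logs/SASMeta_MetadataServer_%d_%S{hostname}_%S{pid}.log"/>
</appender>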

Which logger did that SAS log message come from?

Occasionally you might want to generate a custom SAS log file, perhaps to be parsed and analysed to generate a custom report. You want to focus on a specific subset of messages and already know that you can modify the logging configuration file and attach the logger that generates the message to an appender to write the messages to your custom log file. The only thing you don’t know is which logger in particular is generating the messages you want to capture. You can see the messages you want already in the main log file but don’t know which logger is generating them. Is it App.Meta, Audit.Meta.Security, Audit.Authentication or something else? How do you find out which logger you need?

In situations like this I temporarily modify the logging configuration file (usually logconfig.xml) to add the conversion character %c to the ConversionPattern setting. This adds the logger name to each message. After restarting the server in question, I can then look for the source of the interesting messages. Normally I only do this on a development server, so I might leave the change in place to make it easy to find the right logger next time.

Here’s an example of the ConversionPattern param in the SAS metadata server logging configuration file before the change:

<param name="ConversionPattern" value="%d %-5p [%t] %X{Client.ID}:%u - %m"/>

… and here’s the same param after adding in a “(%c)”:

<param name="ConversionPattern" value="%d %-5p [%t] (%c) %X{Client.ID}:%u - %m"/>

… and finally here’s a SAS metadata server log file fragment showing the result of the change:

2011-05-19T16:34:31,524 INFO  [00000004] (Admin.Operations) :sas - SAH011001I SAS (9.2) Metadata Server (8562), State, starting
2011-05-19T16:34:31,754 INFO  [00000008] (App.Meta) :sas - OMACONFIG option OMA.SASSEC_LOCAL_PW_SAVE found with value 1 and processed.
2011-05-19T16:34:31,755 INFO  [00000008] (App.Meta) :sas - Unable to load AES encryption algorithm. Using SASPROPRIETARY for password storage.
2011-05-19T16:34:31,755 INFO  [00000008] (App.Meta) :sas - Using SASPROPRIETARY for password fetch.
2011-05-19T16:34:31,760 INFO  [00000008] (Audit.Meta.Security) :sas - SAS Metadata Authorization Facility Initialization.
2011-05-19T16:34:31,761 INFO  [00000008] (Audit.Meta.Security) :sas - SAS is an adminUser.
2011-05-19T16:34:31,769 INFO  [00000008] (Audit.Meta.Security) :sas - SASTRUST@SASPW is a trustedUser.
2011-05-19T16:34:31,771 INFO  [00000008] (Audit.Meta.Security) :sas - SASADM@SASPW is an unrestricted adminUser.
...
2011-05-19T16:35:42,227 INFO  [00000164] (Audit.Authentication) :sasadm@saspw - New client connection (8) accepted from server port 8562 for user sasadm@saspw.  Encryption level is Credentials using encryption algorithm SASPROPRIETARY.  Peer IP address and port are [127.0.0.1]:52947.

Now I can see where the messages are coming from. Perhaps the Audit.Authentication message is the one I want.
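
Having identified the logger, attaching it to a custom log file in the logging configuration might look something like the fragment below. This is a hypothetical example; the appender name, file path and level are my own choices, not from a real deployment:

<!-- Hypothetical example: route Audit.Authentication messages to their own file -->
<appender class="FileAppender" name="AuthLog">
    <param name="File" value="/opt/ebiedieg/Lev1/SASMeta/MetadataServer/Logs/auth_audit.log"/>
    <layout>
        <param name="ConversionPattern" value="%d %-5p [%t] %X{Client.ID}:%u - %m"/>
    </layout>
</appender>

<logger name="Audit.Authentication" additivity="true">
    <level value="Info"/>
    <appender-ref ref="AuthLog"/>
</logger>

With additivity left as true, the messages continue to flow to the main server log as well as the custom file.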

You can find out more about conversion characters and configuring custom logging by reading the document SAS® 9.2 Logging Configuration and Programming Reference available from support.sas.com (PDF, HTML).

Update 20May2011: I forgot to mention that if you are generating a custom log file for the purposes of audit or performance logging, you might want to look first at the SAS 9.2 Enterprise Business Intelligence Audit and Performance Measurement instrumentation package. It might already provide what you need and if not then you might be able to make some small customisations rather than start from scratch.

Tracking Down the SAS Management Console Log File

Recently I have been updating the documentation for our Metacoda Security Plug-ins relating to their use with SAS® 9.2. In our documentation we mention the SAS Management Console log file as a potential source of support information if help is needed. The SAS Management Console log file can provide clues such as error messages and stack traces. Our software writes information messages to the log file too. It’s a good starting point for troubleshooting.

With SAS 9.1 the location of the SAS Management Console log file was easy to remember. With SAS 9.2 it has moved a little deeper into the file system and I haven’t yet committed its location to memory … which I’m sure will happen soon … perhaps even by the end of this post :).

As I was updating our documentation I was hoping to add a reference to a location in the official SAS documentation where people could find out more about the SAS Management Console log file. I couldn’t find any specific information about the log file itself, but did see a few references to it in other sections. In some cases the documentation I found differed a little from my own experiences, so in the rest of this post I will refer to the SAS documentation and also mention where it differs from my own experience with the software.

The default location for the SAS Management Console log file depends on both the SAS software version you are using as well as the operating system you are running it on.

SAS Management Console 9.1

In my use of SAS Management Console 9.1 I have found that it writes its log to a file named errorLog_SAS_Management_Console.txt in the user’s home directory by default. Potential log file locations for someone with a userid of paul might include:

Windows®
C:\Documents and Settings\paul\errorLog_SAS_Management_Console.txt
UNIX
/home/paul/errorLog_SAS_Management_Console.txt

This differs from the SAS® 9.1.3 Management Console User’s Guide, Chapter 2 (Setting up SAS Management Console), which says in the section “Using the Error Log”:

This file is named errorlog.txt, and is located by default in the SAS Management Console directory.

SAS Management Console 9.2

With SAS Management Console 9.2 the log file has been renamed to SASMCErrorLog.txt and has moved a few subdirectories deeper into the user’s home directory.

I found a SAS documentation reference to the log file location in the SAS® 9.2 Intelligence Platform: System Administration Guide in a section named Troubleshooting a Replication Job. It mentions the following potential locations for SASMCErrorLog.txt:

  • On Windows Vista systems: C:\Users\user\AppData\Roaming\SAS\SASManagementConsole\9.2
  • On other Windows systems: C:\Documents and Settings\user\Application Data\SAS\SASManagementConsole\9.2
  • On UNIX systems: user’s-home-directory/SAS/SASManagementConsole/9.2

The default location and layout of the Windows user profile changed with Windows Vista, which explains why the SAS Management Console log file location is different from Windows Vista onwards. Having installed SAS Management Console 9.2 on Windows 7, I have seen that this new log file location applies to Windows 7 too.

The UNIX location as specified in the SAS documentation reference above differs from my own experience with a Linux-based installation of SAS Management Console 9.2. On Linux I found SASMCErrorLog.txt in the location user’s-home-directory/.SASAppData/SASManagementConsole/9.2/

Combining my own experiences with information from the SAS documentation, potential locations for the SAS Management Console 9.2 log file for someone with a userid of paul might include:

Windows 7, Windows Vista
C:\Users\paul\AppData\Roaming\SAS\SASManagementConsole\9.2\SASMCErrorLog.txt
Windows (versions prior to Windows Vista)
C:\Documents and Settings\paul\Application Data\SAS\SASManagementConsole\9.2\SASMCErrorLog.txt
UNIX, Linux
/home/paul/.SASAppData/SASManagementConsole/9.2/SASMCErrorLog.txt

Of course these paths will need to be adjusted if your system administrators have configured operating system installations at your site to store home directories / user profiles in alternative locations.
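
If in doubt, a quick search of the home directory will usually turn it up. On UNIX or Linux, for example:

find "$HOME" -name SASMCErrorLog.txt 2>/dev/null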

I would be interested to find out if anyone has found their SAS Management Console log file in any locations other than these.