SAS & JBoss: Too Many Open Files

I’ve been seeing some ‘Too many open files‘ exceptions in the SAS® mid-tier JBoss logs on my Ubuntu Linux server. I was surprised about this because I remember during installation I had followed the guidance in the SAS documentation and increased the nofile limit. It turns out that verifying the ulimit from a normal console/ssh login was not sufficient, I should have also verified it from an su based login too.

These were the sorts of messages I was seeing in the JBoss logs:

2012-03-11 09:22:38,463 ERROR [org.apache.tomcat.util.net.JIoEndpoint] Socket accept failed
java.net.SocketException: Too many open files
...
2012-03-11 08:57:19,454 ERROR [org.apache.catalina.core.StandardContext] Error reading tld listeners java.io.FileNotFoundException: /opt/jboss-4.2.3.GA/server/SASServer3/work/jboss.web/localhost/SASBIDashboard/tldCache.ser (Too many open files)
...

There are some Pre-Installation Steps for JBoss for both SAS 9.3 and SAS 9.2 that should be followed during installation to avoid these errors. These were the instructions I had followed, but as you’ll see in a moment, it wasn’t quite enough for this (unsupported) Ubuntu installation.

The instructions specify to edit the /etc/security/limits.conf file directly. Rather than editing this main config file, where the settings might get forgotten or lost during an upgrade, I placed the settings I required for my SAS installation in their own dedicated config file: /etc/security/limits.d/sas.conf

# Increase the open file descriptors limit from the default of 1024 to 30720 for JBoss running web apps for SAS 9.2/9.3
* - nofile 30720

I knew that this had taken effect because I logged in, via ssh, to verify it as the sas user (I run JBoss as the sas user). Checking the ulimit I saw the following:

sas@server:~$ ulimit -Hn;ulimit -Sn
30720
30720

With nofile at 30270, how was it I was still getting ‘Too many open files‘ errors? After a quick session on Google I found a blog post suggesting the increased limits will only apply if the pam_limits PAM module is enabled.

Checking the /etc/pam.d/login file I could see the pam_limits line was already present and uncommented:
...
session required pam_limits.so
...

This made sense since the console/ssh login showed the expected numbers.

Google also led me to a stackoverflow question (how do i set hard and soft file limits for a non-root user at boot?). The answer provided there indicated that, for su commands, you also need to verify the pam_limits module is enabled in an additional su specific PAM config file, which on my machine was /etc/pam.d/su. My JBoss init script runs as root during system startup but uses su to run JBoss as the sas user. Looking in /etc/pam.d/su I could see that the pam_limits line was commented so perhaps that was the issue.

Before making the necessary changes, I verified the nofile ulimit for the sas user by running su as root:

root@server:~# su sas --login --command 'ulimit -Hn;ulimit -Sn'
1024
1024

Aha! It had the 1024 default rather then the increased value. It looked like this was indeed the problem. I uncommented the pam_limits line in /etc/pam.d/su and repeated the test:

root@server:~# su sas --login --command 'ulimit -Hn;ulimit -Sn'
30720
30720

It now shows the increased value as expected, so it looks like the problem’s fixed. I restarted JBoss and haven’t seen any ‘Too many open files‘ errors since.

Troubleshooting: sas.servers & hostname aliases

SAS® Service Management Script for Linux and UNIX: sas.servers I ran into an interesting problem over the last few days with my SAS® 9.3 deployments on Linux. I had noticed that the sas.servers scripts for my Lev2 and Lev3 deployments were taking several minutes to start the SAS services. I know JBoss takes ages to start but usually the SAS services start really quickly. I had assumed Remote Services was the culprit, but surprisingly it turned out to be starting the SAS/CONNECT Job Spawner service that was the problem. Even more surprisingly the underlying cause turned out to be the use of the hostname in the log filename where the deployment logical hostname alias used during deployment was different to the physical hostname of the machine it was deployed on.

To start the troubleshooting process I modified the sas.servers script to add some timing information to the console log messages it generated. Editing the sas.servers script is not necessarily the best option but more on that later. The sas.servers script has a function called logmsg which seems to have been very nicely provided in order to customize its logging. I wanted to make a simple change to add a time stamp so I could see where all the time was being spent. I changed the function from this (reformatted and comments omitted for brevity):

logmsg() {
    echo "$*"
}

… to this:

logmsg() {
    DT=`date +%Y-%m-%d:%H:%M:%S`
    echo "$DT: $*"
}

I stopped the services since they were already running:

root@sas93lnx01:~# cd /opt/ebiedieg/Lev3
root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers stop
2011-10-30:21:51:31: Stopping SAS servers

Then checked they were all stopped:

root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers status
2011-10-30:21:51:59: SAS servers status:
2011-10-30:21:51:59: SAS Metadata Server 1 is NOT up
2011-10-30:21:51:59: SAS OLAP Server 1 is NOT up
2011-10-30:21:51:59: SAS Object Spawner 1 is NOT up
2011-10-30:21:51:59: SAS Share Server 1 is NOT up
2011-10-30:21:51:59: SAS CONNECT Spawner 1 is NOT up
2011-10-30:21:51:59: SAS Remote Services 1 is NOT up
2011-10-30:21:51:59: SAS Framework Data Server 1 is NOT up

Once stopped, I started them again so I could see where all the time was spent:

root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers start
2011-10-30:21:54:54: Starting SAS servers
2011-10-30:21:55:03: SAS Metadata Server 1 is UP
2011-10-30:21:55:07: SAS OLAP Server 1 is UP
2011-10-30:21:55:11: SAS Object Spawner 1 is UP
2011-10-30:21:55:11: SAS Share Server 1 is UP
2011-10-30:21:56:12: SAS CONNECT Spawner 1 is NOT up
2011-10-30:21:56:28: SAS Remote Services 1 is UP
2011-10-30:21:56:32: SAS Framework Data Server 1 is UP

Total time to start was 98 seconds. Not too bad really but it should be quicker than that. Everything seemed to start within a few seconds of each other except for the the SAS/CONNECT Job Spawner! It looked like I was wrong to assume it was Remote Services. The SAS/CONNECT Job Spawner appeared to be taking 60 seconds to apparently not start. It had started though. This was confirmed by looking in the process list and the job spawner log file. It all seemed very odd and for 2 reasons: firstly the job spawner is usually very fast to start and secondly 60 seconds is a bit of a convenient number; it sounded like a timeout.

The sas.servers script starts each SAS service in turn waiting for the service to become available before starting the next one. It sounded like sas.servers was waiting for some event from the job spawner and gave up after 60 seconds assuming it had not started. Time to dig into the sas.servers script a bit more.

The SAS/CONNECT job spawner is started by the sas.servers script in the start_connect_spawner function. It spawns the spawner (ConnectSpawner.sh start) in the background and then calls the is_atypical_server_up function to wait for it to finish starting. Looking into the is_atypical_server_up function it seems to repeatedly sleep and grep the job spawner log for some trigger text. Effectively:

grep "SAS Job Spawner for Open Systems" /opt/ebiedieg/Lev3/ConnectSpawner/Logs/ConnectSpawner_sas93lnx01.log

Now I could see the problem. It was scanning a non-existant log file ConnectSpawner_sas93lnx01.log when it should be scanning the ConnectSpawner_sas93lnx03.log file which was actually being written to by the job spawner. The log file name contained the wrong hostname. The name sas93lnx01 was the physical name of the machine, but sas93lnx03 was the host name alias I used when deploying the Lev3 environment. I prefer to use host name aliases in deployments for the benefits they provide in being able to easily move deployments between physical machines and provide disaster recovery options. In this case the Lev3 environment was currently on the same machine as the Lev1 environment, but I knew one day I would move it onto another machine so had used a host name alias sas93lnx03 (a DNS CNAME record) which I could easily redirect later to the appropriate physical machine name (DNS A record).

There was the problem. The SAS/CONNECT job spawner was using the deployment/logical hostname alias whereas the sas.servers script was using the physical hostname (which it got from `uname -n`). With the wrong log file name it could never find the log file and therefore never see the trigger text and would always time out after 60 seconds.

There were at least two ways I could fix this problem. The first was to modify the ConnectSpawner.sh script to use the physical host name, rather than deployment host name alias, in the log file name. The second was to modify the sas.servers script to use the deployment hostname alias instead of the physical host name. That was the option I chose. It was the slightly more difficult of the two but I didn’t feel right telling the job spawner to log to a file with the wrong name :)

These were the changes I made to sas.servers to get it to work. Firstly I added an extra line where the SHOSTNAME environment variable is derived, changing this:

SHOSTNAME=`uname -n`

… to this:

SHOSTNAME=`uname -n`
SHOSTNAME_ALIAS="sas93lnx03"

Secondly I modified the start_connect_spawner function to change the name of the log file (as checked by the is_atypical_server_up function) from this:

SLOGNAME="ConnectSpawner_${SHOSTNAME}"

… to this:

SLOGNAME="ConnectSpawner_${SHOSTNAME_ALIAS}"

As it happens, the same problem occurs with the deployment tester server component too so I also edited the start_deployment_testsrv function changing this:

SLOGNAME="DeploymentTesterServer_${SHOSTNAME}"

… to this:

SLOGNAME="DeploymentTesterServer_${SHOSTNAME_ALIAS}"

However, there is a potential problem with this method of editing sas.servers directly; I alluded to it earlier in the post. The sas.servers script is a generated script and any edits made to it will get wiped out if/when it is regenerated at a later date. The SAS documentation for the sas.servers script cautions against editing the script. The sas.servers script is created by the generate_boot_scripts.sh script. Next time someone or something runs generate_boot_scripts.sh the changes to sas.servers will be lost.

To protect against this you could modify the templates that are used by generate_boot_scripts.sh. This is what I did. I made all the same changes described above in the single template /opt/ebiedieg/Lev3/Utilities/script_templates/sas.servers.mainlog so they would survive a regeneration. I wouldn’t necessarily recommend this though. It’s not a method documented by SAS Institute and I suspect these templates will most likely get changed at some point during a SAS software upgrade or hotfix. However, it works for me for the time being. Looking back it probably would have been better to just changed the ConnectSpawner.sh script to use the wrong hostname ;)

With those changes done it was time to stop and start the servers to see any improvements:

root@sas93lnx01:/opt/ebiedieg/Lev3# ./sas.servers start
2011-10-30:23:54:30: Starting SAS servers
2011-10-30:23:54:39: SAS Metadata Server 1 is UP
2011-10-30:23:54:43: SAS OLAP Server 1 is UP
2011-10-30:23:54:47: SAS Object Spawner 1 is UP
2011-10-30:23:54:47: SAS Share Server 1 is UP
2011-10-30:23:54:49: SAS CONNECT Spawner 1 is UP
2011-10-30:23:55:06: SAS Remote Services 1 is UP
2011-10-30:23:55:10: SAS Framework Data Server 1 is UP

This time only 40 seconds (less than half the original time) and the SAS CONNECT Job Spawner is now reported as being up.

As it turns out, I noticed that several of the other SAS services (like the metadata server) use the physical host name in their log filenames. It all works fine but I would have preferred they used the deployment hostname if it were possible. I briefly looked into ways of telling those servers to use the logical hostname alias instead but, since they use the SAS logging framework and the %S{hostname} conversion pattern, I couldn’t see any obvious way other than editing all of the config files and scripts post deployment. If someone knows a good way of consistently and automatically using the deployment hostname (the one providing during deployment and potentially different from the physical hostname) during installation then I’m all ears :)

SAS Restricted Options on UNIX

In a tweet by Gordon Cox last month, I was reminded of the restricted options facility available with SAS® software on UNIX platforms. This is capability where an administrator can set mandatory SAS system options at multiple levels of granularity: globally, per-group, and/or per-user. The reason for this post is that I don’t look at the documentation for this very often and every time I do it takes me a while to track it down. I always think its going to be in the UNIX companion in the Base SAS area… but it’s not! That gets me every time. Instead it’s tucked away in the Configuration Guide for SAS 9.2 Foundation for UNIX Environments (PDF) in Chapter 2 – Restricted Options. You can find this document in the Install Center section of support.sas.com under SAS Installation Note 36467: Documentation for a SAS® 9.2 installation on UNIX.

The context of the tweet was that the restricted options facility is another mechanism whereby a default setting of the NOXCMD option for SAS platform servers could be overridden for a subset of trusted users or groups in a SAS platform installation. The NOXCMD option is discussed in an earlier post: NOXCMD: NO eXternal CoMmanDs!

A quick summary of restricted options:

  • SAS Systems Options under UNIX set by an administrator, that cannot be changed by a user
  • Processed in the order global, group, then user. The last instance of an option is the one that wins.
  • Global restrictions are read from the file !SASROOT/misc/rstropts/rsasv9.cfg
  • Group restrictions are read from the file !SASROOT/misc/rstropts/groups/<groupname>_rsasv9.cfg
  • User restrictions are read from the file !SASROOT/misc/rstropts/users/<userid>_rsasv9.cfg

On Linux (at least) I can use the command “id -gn <userid>” to find out the effective group name for a user, given their user id. For example, “id -gn sassrv” might generate “sas“.

In my SAS 9.2 installation on Linux, whilst everyone else is still constrained by the NOXCMD option, I can ensure that the SAS Enterprise Guide user Bob Baxter, who has a user id of bob, can still use operating system commands in the SAS programs he runs on the SASApp server, by creating the file /usr/local/SAS/SASFoundation/9.2/misc/rstropts/users/bob_rsasv9.cfg with the following contents:

-xcmd

Of course, this only applies to SAS processes launched and run as the requesting user. Whilst it can be used to override NOXCMD for specific users/groups using a standard workspace server, it cant be used to distinguish between different users on the same stored process server, since all users will share SAS stored process server processes running under a shared identity (like sassrv). In that situation directing the users to separate SAS application servers would be more appropriate. There is an example of this in Jim Fenton & Robert Ladd’s SAS Global Forum 2010 paper 311-2010: A Practical Approach to Securing a SAS® 9.2 Intelligence Platform Deployment

Thanks to Gordon for reminding me about the restricted options facility.

Desktop Files for Launching SAS Apps on Ubuntu

Whilst I often use the command line on Linux, it’s also nice to have icons in the menus to start SAS® applications like SAS Management Console and SAS Display Manager. These days I mostly use GNOME Do as an application launcher (its a bit like Quicksilver for Mac OS X). Naturally I like to be able to launch SAS apps from GNOME Do too. One day soon I’ll give Ubuntu Unity another try, and when I do eventually make the switch I probably wont use GNOME Do any more (but I’ll remember it as good friend), so then I’ll want to launch SAS apps from Unity. Thankfully I can use the same custom .desktop files in all of these situations.

Originally I used to point and click my way through editing the menus to add a custom launchers for SAS apps, but I soon tired of that. I’m a programmer too. Pointing and clicking doesn’t stay fun for long ;)

I found out about .desktop files after delving into what happened behind the scenes when I was editing the menus using point and click methods. There’s a GNOME Dev Center tutorial on it “Desktop files: putting your application in the desktop menus“. The freedesktop.org site also has a list of the Recognized desktop entry keys that can go into a .desktop file.

Below are some of the .desktop files that I use to launch SAS applications on my Ubuntu installation. I have only tested these on Ubuntu using GNOME but, since they are standard freedesktop files, I imagine they should also work in KDE and other Linux distros.

If you place the .desktop files into the ~/.local/share/applications directory (e.g. /home/userid/.local/share/applications) they will only be available to that single user. Alternatively, if they are placed into the /usr/share/applications directory then they will be available to all users.

The nice thing I have found about editing .desktop files is, so far at least, the changes have shown up in the menus immediately (i.e. without having to logout/login). In my installation the menu entries for these .desktop files can be found in the Ubuntu Applications > Other menu. They show up in GNOME Do too.

Here are the .desktop files I use:

SAS Management Console 9.2

This is a sas-mc-92.desktop file that I use to launch SAS Management Console 9.2 on Ubuntu.

#!/usr/bin/env xdg-open

[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=false
Name=SAS Management Console 9.2
Icon=/usr/local/SAS/SASManagementConsole/9.2/SMCApplication.ico
Exec=/usr/local/SAS/SASManagementConsole/9.2/sasmc

SAS 9.2 Display Manager

This is a sas-dms-9.2.desktop file that I (sometimes) use to launch the old SAS Display Manager System (DMS) interface on Ubuntu. If you’ve used DMS on Linux you’ll probably understand why I only use it sometimes ;). I’d love to have a native SAS Enterprise Guide like interface to SAS on Linux (without having to start Windows in VirtualBox). I know, given the numbers, it’s unlikely to ever happen, but I can’t help wishing nonetheless :).

#!/usr/bin/env xdg-open

[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=false
Name=SAS Display Manager 9.2
Icon=/usr/local/SAS/sas-logo-48x48.png
Exec=/usr/local/SAS/SASFoundation/9.2/sas

You wont find the /usr/local/SAS/sas-logo-48×48.png file in your installation so you should substitute that with the path to an appropriate icon available in your installation.

SAS Management Console 9.1

This is a sas-mc-913.desktop file that I use to launch SAS Management Console 9.1 on Ubuntu.

#!/usr/bin/env xdg-open

[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=false
Name=SAS Management Console 9.1
Icon=/usr/local/SAS/SASManagementConsole/9.1/SMCApplication.ico
Exec=/usr/local/SAS/SASManagementConsole/9.1/sasmc

If you are a SAS user on Linux then I hope you find these useful too. If you have any info/experiences to share regarding .desktop files launching SAS apps in other desktop environments (KDE, Xfce etc.) or other distros (RHEL, Fedora etc.) then please leave a comment.

Disabling the Ubuntu Login Screen (GDM) User Pick List

I’m used to typing in both my userid and my password when I log in to computers. I have never been a fan of the user pick lists that now seem to be common to many operating systems. I can see how they can be convenient for family machines at home, but the idea of advertising a list of potential accounts to compromise doesn’t sit well with me, so my preference is to disable the pick list and go back to the traditional typed userid & password form.

I run SAS on Ubuntu and recent Ubuntu versions (I forget which one it started with) now have a user pick list by default. The method for disabling the user pick list in Ubuntu is not that obvious and I find myself googling it every time I need it. A good article that provides both command line and GUI methods of disabling the user list can be found at Disabling the Login Screen User List in Ubuntu

The command line version is:

sudo -u gdm gconftool-2 --set --type boolean /apps/gdm/simple-greeter/disable_user_list true

This can be followed by a quick restart of GDM:

restart gdm

.. and the user pick list is no more.

With Lucid (Ubuntu 10.04 LTS) there is still a redundant login button that needs to be clicked before you get to type your user id, but it’s still better than before. There has been a bug lodged for this behaviour (GDM without user list requires that you click Log In) and it appears to have been fixed so I look forward to seeing it when I next upgrade.