I had been puzzling over why some SAS® Viya™ services were not starting after a machine reboot. Initially I thought the answer was in the SAS Viya 3.2 Administration documentation set: see the General Servers and Services: Troubleshooting section.
I found that all the expected services started after running:
[root@hostname ~]# /etc/init.d/sas-viya-all-services stop
[root@hostname ~]# rm -f /opt/sas/viya/config/data/consul/checks/*
[root@hostname ~]# /etc/init.d/sas-viya-all-services start
[root@hostname ~]# /etc/init.d/sas-viya-all-services status
However, on further investigation, it turned out that the consul/checks files probably weren’t the problem. After another reboot I found that, once again, only a subset of the services had started. Using systemctl to check the status, I found the following:
[root@hostname ~]# systemctl status sas-viya-all-services
sas-viya-all-services.service - start and stop all SAS services
Loaded: loaded (/usr/lib/systemd/system/sas-viya-all-services.service; enabled; vendor preset: disabled)
Active: failed (Result: timeout) since Mon 2017-04-24 13:33:54 AEST; 31min ago
Process: 717 ExecStart=/etc/init.d/sas-viya-all-services start (code=killed, signal=TERM)
Main PID: 717 (code=killed, signal=TERM)
CGroup: /system.slice/sas-viya-all-services.service
Apr 24 13:33:12 hostname su[12824]: (to sasrabbitmq) root on none
Apr 24 13:33:30 hostname sas-viya-all-services[717]: There are still 1 pending processes
Apr 24 13:33:30 hostname sas-viya-all-services[717]: Starting sas-viya-datatables-default
Apr 24 13:33:30 hostname sas-viya-all-services[717]: Starting sas-viya-deploymentBackup-default
Apr 24 13:33:30 hostname sas-viya-all-services[717]: Starting sas-viya-device-management-default
Apr 24 13:33:30 hostname sas-viya-all-services[717]: Pausing to allow services time to start...
Apr 24 13:33:54 hostname systemd[1]: sas-viya-all-services.service start operation timed out. Terminating.
Apr 24 13:33:54 hostname systemd[1]: Failed to start start and stop all SAS services.
Apr 24 13:33:54 hostname systemd[1]: Unit sas-viya-all-services.service entered failed state.
Apr 24 13:33:54 hostname systemd[1]: sas-viya-all-services.service failed.
So the problem was the time sas-viya-all-services was taking to start all the services: the start operation ran past the unit’s timeout, and systemd terminated it. This is a simple dev/test deployment with everything on one machine, unlike a real deployment, where the services are much more likely to be distributed over multiple machines. I needed to increase the timeout for sas-viya-all-services so that it could complete.
I could see the current timeout settings with:
[root@hostname ~]# systemctl show sas-viya-all-services.service | grep ^Timeout
TimeoutStartUSec=15min
TimeoutStopUSec=15min
… so I bumped the unit file’s TimeoutSec= value (which sets both the start and stop timeouts) up to 60 minutes to give it more than enough time:
[root@hostname ~]# sed -i 's/TimeoutSec=15min/TimeoutSec=60min/' /usr/lib/systemd/system/sas-viya-all-services.service
[root@hostname ~]# systemctl daemon-reload
[root@hostname ~]# systemctl show sas-viya-all-services.service | grep ^Timeout
TimeoutStartUSec=1h
TimeoutStopUSec=1h
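Editing the packaged unit file under /usr/lib/systemd/system works, but a later package update may replace that file and quietly restore the 15-minute timeout. A common systemd alternative (not what was done above) is a drop-in override under /etc/systemd/system/<unit>.d/. The sketch below assumes a POSIX shell; the helper name write_timeout_dropin and the file name timeout.conf are hypothetical choices, and the optional third argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: raise a unit's start/stop timeout with a systemd drop-in
# rather than editing the packaged file under /usr/lib/systemd/system.
write_timeout_dropin() {
  unit="$1"                        # e.g. sas-viya-all-services.service
  timeout="$2"                     # a systemd time span, e.g. 60min
  base="${3:-/etc/systemd/system}" # parent directory; defaults to the system override location
  dir="$base/$unit.d"
  mkdir -p "$dir" || return 1
  # TimeoutSec= sets both TimeoutStartSec= and TimeoutStopSec=.
  printf '[Service]\nTimeoutSec=%s\n' "$timeout" > "$dir/timeout.conf"
}
```

As root, something like `write_timeout_dropin sas-viya-all-services.service 60min && systemctl daemon-reload`, followed by the same `systemctl show ... | grep ^Timeout` check, should confirm whether the override has taken effect.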
… then rebooted again and watched the progress with:
[root@hostname ~]# tail -f "$(ls -1 /opt/sas/viya/config/var/log/all-services/default/all-services*.log | tail -n 1)"
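That command follows whichever log file sorts last by name. If you would rather follow the most recently modified file regardless of how the files are named, a small sketch that sorts by modification time instead (`ls -t`) could look like this; the function name latest_log is hypothetical, and its default directory is simply the one used above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified all-services log.
latest_log() {
  dir="${1:-/opt/sas/viya/config/var/log/all-services/default}"
  # -t lists the matched files newest first; keep only the first one.
  ls -1t "$dir"/all-services*.log 2>/dev/null | head -n 1
}
```

Usage would then be `tail -f "$(latest_log)"`.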
… and, once it had completed, verified the result with:
[root@hostname ~]# systemctl status sas-viya-all-services
sas-viya-all-services.service - start and stop all SAS services
Loaded: loaded (/usr/lib/systemd/system/sas-viya-all-services.service; enabled; vendor preset: disabled)
Active: active (exited) since Mon 2017-04-24 15:09:15 AEST; 2min 24s ago
Process: 722 ExecStart=/etc/init.d/sas-viya-all-services start (code=exited, status=0/SUCCESS)
Main PID: 722 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/sas-viya-all-services.service
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-backup-agent-default 00:01:41
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-environmentmanager-default 00:02:41
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-monitoring-default 00:02:41
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-sashome-default 00:02:41
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-sasreportviewer-default 00:02:47
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-sasthemedesigner-default 00:02:28
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-sasvisualanalytics-default 00:02:27
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-viya-sasvisualdatabuilder-default 00:02:27
Apr 24 15:09:15 hostname sas-viya-all-services[722]: sas-services completed in 00:31:05
Apr 24 15:09:15 hostname systemd[1]: Started start and stop all SAS services.
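The "sas-services completed in 00:31:05" line explains the earlier failure: the start took more than twice the 15-minute limit systemd had been enforcing. A minimal sketch, assuming the "sas-services completed in HH:MM:SS" wording shown above, that extracts the duration from such a line and compares it with a timeout in seconds; all three function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking a reported start-up time against a systemd timeout.

# Read lines on stdin and print the HH:MM:SS value from a
# "sas-services completed in HH:MM:SS" line.
extract_duration() {
  sed -n 's/.*sas-services completed in \([0-9]*:[0-9]*:[0-9]*\).*/\1/p'
}

# Convert HH:MM:SS to whole seconds (awk reads "05" as decimal 5).
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Exit status 0 if the HH:MM:SS duration is longer than the timeout in seconds.
exceeds_timeout() {
  [ "$(hms_to_seconds "$1")" -gt "$2" ]
}
```

For example, `exceeds_timeout 00:31:05 900` succeeds, because 00:31:05 is 1865 seconds and 15 minutes is 900 seconds.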
… or:
[root@hostname ~]# /etc/init.d/sas-viya-all-services status
Getting service info from consul...
Service Status Host Port PID
sas-viya-consul-default up N/A N/A 3727
sas-viya-sasdatasvrc-postgres-node0-ct-pg_hba up N/A N/A 3969
sas-viya-sasdatasvrc-postgres-node0-ct-postgresql up N/A N/A 3991
sas-viya-sasdatasvrc-postgres-pgpool0-ct-pcp up N/A N/A 4036
sas-viya-sasdatasvrc-postgres-pgpool0-ct-pgpool up N/A N/A 4054
sas-viya-sasdatasvrc-postgres-pgpool0-ct-pool_hba up N/A N/A 4278
sas-viya-sasdatasvrc-postgres up N/A N/A 4740
sas-viya-cascontroller-default up N/A N/A 930
sas-viya-httpproxy-default up N/A N/A 6290
sas-viya-rabbitmq-server-default up N/A N/A 6039
sas-viya-sasdatasvrc-postgres-node0 up N/A N/A 4689
sas-viya-sasstudio-default up N/A N/A 1115
sas-viya-spawner-default up N/A N/A 858
...
sas-viya-sasvisualanalytics-default up 10.10.10.10 43115 11303
sas-viya-sasvisualdatabuilder-default up 10.10.10.10 37424 11334
sas-services completed in 00:00:18
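With this many services in the list, a quick filter makes it easier to spot any that did not come up. A sketch using awk, assuming the whitespace-separated "Service Status Host Port PID" layout shown above and treating anything other than up in the Status column as worth a look; the function name not_up is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read the status listing on stdin and print the
# names of services whose Status column is anything other than "up".
not_up() {
  awk '
    $1 == "Service" && $2 == "Status" { in_table = 1; next }  # header row
    /^sas-services completed/         { in_table = 0 }        # trailer line
    in_table && NF >= 2 && $2 != "up" { print $1 }            # one-field lines such as "..." are skipped
  '
}
```

Piping into it, as in `/etc/init.d/sas-viya-all-services status | not_up`, prints nothing when every listed service is up.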
Given the extra time, all of the SAS Viya services now get a chance to start after a machine reboot.
I recently installed SAS Viya 3.2 and experienced much the same issue when starting the servers: the pgpool services did not start properly, and that affected the other services too. However, the issue only occurs if the server stops abnormally or if you shut the server down without first properly stopping the Viya services. One workaround is to stop all the Viya services before shutting down or rebooting the server; hopefully you will then not face the issue when the server and services start again.
Hi Sanket,
Thanks for sharing your experiences. That’s not something I’ve encountered yet, but I’ll keep an eye out for it. I’ll probably switch from the default auto-start-on-boot/auto-stop-on-shutdown to a manual-start-on-boot/manual-stop-on-shutdown for this dev/test environment. I like having an opportunity to investigate and tweak before starting/stopping the services.
Cheers
Paul
Hi Paul,
Great post; another accidental Google search led me here. This solves a bit of a problem for us, and the fact that I wasn’t actively searching for it makes it all the better.
Sanket, I’ve also seen the pgpool issue caused by an improper shutdown of PostgreSQL a number of times. I’m curious whether increasing the timeout that applies when the same script is used to stop PostgreSQL might actually fix the issue. It certainly feels like the script isn’t blocking the shutdown for long enough to allow PostgreSQL to stop cleanly.
Nik
I am facing an issue while starting the microservices:
Unable to status the consul leader.
waiting for consul service to start
I’d appreciate your help!
I haven’t seen that specific error, so I have no suggestions other than to check the logs for further clues, post to https://communities.sas.com/ to reach a wider audience, and engage the expertise of SAS Technical Support.