Database Startup and Strange Semaphore Problem

One of our NetApp SnapManager database backup jobs was failing because the instance was down on the first node of a four-node Oracle 10g RAC on Solaris. The backups are configured to run only from the first node, so I went to start the instance and got the following error message:

After a quick look in Metalink I realised this was most likely caused by our semaphore kernel parameters. Checking our values, we can see:

Now let's look at how many we have allocated, using ipcs:
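As a rough sketch of that count, the snippet below tallies the semaphore rows in text captured from `ipcs -s`. It assumes the Solaris row layout (T, ID, KEY, MODE, OWNER, GROUP, with `s` in the T column); the sample text is made up for illustration and is not output from our servers.

```python
import re

# Assumption: Solaris-style `ipcs -s` rows start with the type letter "s"
# followed by the numeric semaphore ID.
SEM_ROW = re.compile(r"^s\s*\d+\s")

# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
IPC status from <running system> as of Mon Jan  4 10:00:00 GMT 2010
T         ID      KEY        MODE        OWNER    GROUP
Semaphores:
s          0   0x71000a2b --ra-ra-ra-     root     root
s          1   0x71000a2c --ra-ra-ra-     root     root
s     196610   0x8a3c1f04 --ra-r-----   oracle      dba
"""


def count_semaphore_sets(ipcs_output: str) -> int:
    """Count semaphore rows in the text printed by `ipcs -s`."""
    return sum(1 for line in ipcs_output.splitlines() if SEM_ROW.match(line))


print(count_semaphore_sets(SAMPLE))
```

Fed the real `ipcs -s` output from a node, the result is the number to compare against the semmni value.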

We have 613, but our value for semmni is set to 600. Let's check the other three nodes.

Node 1 has by far the most. Let's look at this in more detail.
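One way to get that detail is to group the semaphore rows by owner. The sketch below assumes the Solaris `ipcs -s` column order (T, ID, KEY, MODE, OWNER, GROUP); the function name and the sample text are hypothetical and purely illustrative.

```python
import re
from collections import Counter

# Assumed columns after the type letter "s": ID, KEY, MODE, OWNER, GROUP.
ROW = re.compile(r"^s\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")

# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
T         ID      KEY        MODE        OWNER    GROUP
Semaphores:
s          0   0x71000a2b --ra-ra-ra-     root     root
s          1   0x71000a2c --ra-ra-ra-     root     root
s     196610   0x8a3c1f04 --ra-r-----   oracle      dba
"""


def sets_by_owner(ipcs_output: str) -> Counter:
    """Return a Counter mapping each owner to its number of semaphore rows."""
    owners = Counter()
    for line in ipcs_output.splitlines():
        match = ROW.match(line)
        if match:
            owners[match.group(4)] += 1  # group 4 is the OWNER column
    return owners


# List owners from most to fewest semaphore rows.
for owner, count in sets_by_owner(SAMPLE).most_common():
    print(owner, count)
```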

Lots of active semaphores for root. We can either use ipcrm to remove them or reboot the server to clear them. As this server is controlled by a third party, the second approach was taken, and the seminfo_semmni parameter was increased.
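For the ipcrm route, a cautious approach is to generate the commands first and review them before running anything, since removing a semaphore that a live process still uses will break that process. The sketch below only builds `ipcrm -s <id>` strings for one owner; it assumes the Solaris `ipcs -s` column order (T, ID, KEY, MODE, OWNER, GROUP), and the function name and sample text are hypothetical.

```python
import re

# Assumed columns after the type letter "s": ID, KEY, MODE, OWNER, GROUP.
ROW = re.compile(r"^s\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")

# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
T         ID      KEY        MODE        OWNER    GROUP
Semaphores:
s          0   0x71000a2b --ra-ra-ra-     root     root
s          1   0x71000a2c --ra-ra-ra-     root     root
s     196610   0x8a3c1f04 --ra-r-----   oracle      dba
"""


def ipcrm_commands(ipcs_output: str, owner: str) -> list:
    """Build (but do not run) one `ipcrm -s <id>` command per row owned by `owner`."""
    commands = []
    for line in ipcs_output.splitlines():
        match = ROW.match(line)
        if match and match.group(4) == owner:
            commands.append("ipcrm -s " + match.group(1))
    return commands


# Dry run: print the commands for review instead of executing them.
for command in ipcrm_commands(SAMPLE, "root"):
    print(command)
```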

Further investigation into the semaphore leak is required.
