After updating to VirtualBox 5 I noticed that my 3-node demo RAC cluster had problems starting cluster resources. Any time I upgrade VirtualBox I cycle through my VMs to make sure they are all still working, so I don't get any nasty surprises when I actually need to use them.
After the upgrade I started the first node of the cluster and waited. Coming back a few minutes later, I found no resources running and no CRS. I tried to start it manually with crsctl start crs, but it just hung and nothing happened. The next step was to strace it and see if anything useful came up.
$ strace -fae -o crs.log crsctl start crs
.. snip ..
4805 stat("/u01/app/12.1.0.2/grid/perl/lib/5.14.1/x86_64-linux-thread-multi", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
4805 stat("/u01/app/12.1.0.2/grid/perl/lib/5.14.1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
4805 stat("/u01/app/12.1.0.2/grid/perl/lib/x86_64-linux-thread-multi", 0x7fff823226d0) = -1 ENOENT (No such file or directory)
4805 readlink("/proc/self/exe", "/u01/app/12.1.0.2/grid/perl/bin/perl", 4095) = 36
4805 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
4803 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15, {1436501940, 712356000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17, {1436501940, 724687000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 19, {1436501940, 734827000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 21, {1436501940, 746466000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 23, {1436501940, 756835000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 25, {1436501940, 766930000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
4803 futex(0x112ecb0, FUTEX_WAKE_PRIVATE, 1) = 0
4803 futex(0x112ed84, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 27, {1436501940, 778205000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
The segmentation fault in the /u01/app/12.1.0.2/grid/perl/bin/perl binary gave me a clue. The Clusterware scripts are heavy users of perl, so there might be something wrong with the perl installation that lives in the grid home. Sure enough, any use of perl from the Clusterware home caused a segmentation fault.
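A quick way to confirm that the interpreter binary itself is crashing, independent of any Clusterware script that calls it, is to run it with a trivial one-liner and inspect the exit status: POSIX shells report a process killed by a signal as 128 + the signal number, so a SIGSEGV shows up as 139. This is a generic sketch (the helper name is mine, not from the post):

```shell
# Report whether a command exited normally or was killed by a signal.
# A segfaulting binary shows up as "killed by signal 11" (SIGSEGV).
run_status() {
  "$@" >/dev/null 2>&1
  rc=$?
  if [ "$rc" -ge 128 ]; then
    echo "killed by signal $((rc - 128))"
  else
    echo "exit status $rc"
  fi
}

run_status /usr/bin/perl -e 'print "ok\n"'
# On the broken node, the same check against $ORACLE_HOME/perl/bin/perl
# reported signal 11 rather than a clean exit.
```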
The default OS perl on OEL 6.6 was working fine, but that is version 5.10, while the Clusterware perl installation is 5.14:
[root@12crac1 ~]# /usr/bin/perl -v

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

Copyright 1987-2009, Larry Wall

Perl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Nevertheless, I thought I'd try to start the cluster with it, so I did the following:
[root@12crac1 ~]# cd $ORACLE_HOME/perl/bin
[root@12crac1 bin]# mv perl perl.orig
[root@12crac1 bin]# cp /usr/bin/perl .
[root@12crac1 bin]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
This worked well enough for the cluster to start with all resources up. Unfortunately it is not a good band-aid fix, because the Clusterware home also uses a number of perl modules that are tied to perl 5.14. With this temporary workaround in place, running something like asmcmd produces the following error:
[oracle@12crac1 bin]$ asmcmd
Can't locate Term/ReadKey.pm in @INC (@INC contains: /u01/app/12.1.0.2/grid/perl/lib/5.10.1 /u01/app/12.1.0.2/grid/perl/lib/site_perl/5.10.1 /u01/app/12.1.0.2/grid/lib /u01/app/12.1.0.2/grid/lib/asmcmd /u01/app/12.1.0.2/grid/rdbms/lib/asmcmd /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /u01/app/12.1.0.2/grid/lib/asmcmdshare.pm line 323.
BEGIN failed--compilation aborted at /u01/app/12.1.0.2/grid/lib/asmcmdshare.pm line 323.
Compilation failed in require at /u01/app/12.1.0.2/grid/bin/asmcmdcore line 183.
BEGIN failed--compilation aborted at /u01/app/12.1.0.2/grid/bin/asmcmdcore line 183.
The best solution was to download the perl 5.14 source, compile it, and copy it into the Clusterware home. Luckily Laurent Leturgez had hit a similar problem (Oracle 12c, VMware Fusion and the perl binary's segmentation fault), so I had a pretty good guide for how to get it done.
[oracle@12crac1 ~]$ curl -O http://www.cpan.org/src/5.0/perl-5.14.1.tar.gz
[oracle@12crac1 ~]$ tar zxf perl-5.14.1.tar.gz
[oracle@12crac1 ~]$ cd perl-5.14.1
[oracle@12crac1 perl-5.14.1]$ ./Configure -des -Dprefix=$ORACLE_HOME/perl -Doptimize=-O3 -Dusethreads -Duseithreads -Duserelocatableinc && make clean && make && make install
[oracle@12crac1 ~]$ for x in 2 3
> do
> scp -r perl/ oracle@12crac${x}:/u01/app/12.1.0.2/grid
> done
Once perl was compiled and copied to every node, I cycled the servers with a reboot and all was well again.
Update
Thanks to Simon Coter for pointing out that this is a known bug in the perl binary, tracked in the ticket "Oracle 12.1.0.2 Grid Installation fails to relink VB5 on Oracle Linux 6.7 guest, OSX host".