Oracle 11.2.0.2 Grid Infrastructure Installation Issues on Solaris

This post discusses some issues encountered while installing Oracle 11.2.0.2 on a two-node Solaris 5.10 SPARC cluster.

The Grid Infrastructure installation failed when running root.sh on the second node with the following error:

root@server21 # /data/app/oragrid/11.2.0/grid/root.sh
Running Oracle 11g root script...
The following environment variables are set as:
  ORACLE_OWNER= grid
  ORACLE_HOME=  /data/app/oragrid/11.2.0/grid

Enter the full pathname of the local bin directory: 
  [/usr/local/bin]:
The contents of "dbhome" have not changed. 
  No need to overwrite.
The contents of "oraenv" have not changed. 
  No need to overwrite.
The contents of "coraenv" have not changed. 
  No need to overwrite.

Entries will be added to the /var/opt/oracle/oratab file as 
needed by Database Configuration Assistant when a database 
is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: 
  /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4402: The CSS daemon was started in exclusive mode but 
found an active CSS daemon on node server21, number 1, and 
is terminating
An active cluster was found during exclusive startup, 
restarting to join the cluster
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered 
mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm 
line 1017.
/u01/app/11.2.0/grid/perl/bin/perl 
-I/u01/app/11.2.0/grid/perl/lib 
-I/u01/app/11.2.0/grid/crs/install 
/u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

We had seen this error before on Linux, so we downloaded the mcasttest utility from MOS Note 1212703.1, "Grid Infrastructure install or upgrade may fail due to Multicasting".

A detailed explanation of the mcasttest utility is available elsewhere. The servers were called server21 and server22 (renamed to protect the customer's identity). The private network used device e1000g2, so we executed mcasttest as follows:

 ./mcasttest.pl -n server21,server22 -i e1000g2
###########  Setup for node server21  ##########
Checking node access 'server21'
Checking node login 'server21'
Checking/Creating Directory /tmp/mcasttest for binary on node 'server21'
Distributing mcast2 binary to node 'server21'
###########  Setup for node server22  ##########
Checking node access 'server22'
Checking node login 'server22'
Checking/Creating Directory /tmp/mcasttest for binary on node 'server22'
Distributing mcast2 binary to node 'server22'
###########  testing Multicast on all nodes  ##########

Test for Multicast address 230.0.1.0

Multicast Failed for e1000g2 
  using address 230.0.1.0:42000

Test for Multicast address 224.0.0.251

Multicast Failed for e1000g2 
  using address 224.0.0.251:42001

In this example multicasting was failing for both the default address (230.0.1.0) and the alternative address (224.0.0.251). There was therefore no point in installing the patch for bug 9974223, "Grid Infrastructure needs multicast communication on 230.0.1.0 address working", as it would not solve the problem: the patch merely allows the cluster to fall back to the 224.0.0.251 address, and multicast was failing on that address as well.

The customer was using Cisco switches on which multicasting had been deliberately disabled for business reasons, so we were unable to explore the possibility of enabling multicasting at switch level.

An alternative approach was required, so we tried replacing the Cisco switches on the private network with a cheap 100Mb switch. If this test proved successful, the customer planned to order two dedicated switches to provide resilience and to use IPMP to configure the private network.
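
For reference, a minimal sketch of the kind of Solaris 10 link-based IPMP configuration the customer had in mind for the private network is shown below. The second interface (e1000g3) and the group name (priv_ipmp) are hypothetical; this is illustrative only, not the customer's actual configuration.

/etc/hostname.e1000g2 (active interface):
  server21-priv group priv_ipmp up

/etc/hostname.e1000g3 (hypothetical standby interface):
  group priv_ipmp standby up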

After the new switch was installed the output of mcasttest was:

$ ./mcasttest.pl -n server21,server22 -i e1000g2
###########  Setup for node server21  ##########
Checking node access 'server21'
Checking node login 'server21'
Checking/Creating Directory /tmp/mcasttest for binary on node 'server21'
Distributing mcast2 binary to node 'server21'
###########  Setup for node server22  ##########
Checking node access 'server22'
Checking node login 'server22'
Checking/Creating Directory /tmp/mcasttest for binary on node 'server22'
Distributing mcast2 binary to node 'server22'
###########  testing Multicast on all nodes  ##########

Test for Multicast address 230.0.1.0

Multicast Succeeded for e1000g2 
  using address 230.0.1.0:42000

Test for Multicast address 224.0.0.251

Multicast Succeeded for e1000g2 
  using address 224.0.0.251:42001

So replacing the switch had solved the multicast issue.

We attempted to reinstall Grid Infrastructure, but again it failed when we ran root.sh on the second node with the following error:

Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered
mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm 
line 1017.
/u01/app/11.2.0/grid/perl/bin/perl 
-I/u01/app/11.2.0/grid/perl/lib 
-I/u01/app/11.2.0/grid/crs/install 
/u01/app/11.2.0/grid/crs/install/rootcrs.pl 
  execution failed

We installed the patch for bug 9974223 - "Grid Infrastructure needs multicast communication on 230.0.1.0 address working".

The README for the Solaris SPARC version of patch 9974223 was incorrect; it appeared to be a generic patch notice rather than instructions for a Grid Infrastructure patch. The Linux x86-64 README was completely different and appeared to be correct, so we followed the Linux x86-64 README while applying the Solaris patch.
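
Whatever the platform-specific details, the core of applying a one-off patch like this is running OPatch from the unzipped patch directory as the grid user. A minimal sketch follows; /tmp/9974223 is just an example location for the unzipped patch, and a Grid Infrastructure home typically needs additional unlock/lock steps around this, as described in the README:

grid@server21$ cd /tmp/9974223
grid@server21$ $GRID_HOME/OPatch/opatch apply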

However, following installation of this patch the error still occurred. We tried a couple more times, then resorted to digging around in the log files to try to identify exactly where root.sh was failing.

The root.sh script calls several other scripts and finally executes:

  $GRID_HOME/crs/install/rootcrs.pl

The rootcrs.pl script creates a log file in:

  $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<server>.log

This log file contains some useful output. The rootcrs.pl script attempts to start CSSD using a root agent. The log file for the root agent is:

  $GRID_HOME/log/<server>/agent/ohasd/orarootagent_root/orarootagent_root.log

The root agent starts the CSSD daemon which logs its activities in:

$GRID_HOME/log/<server>/cssd/ocssd.log
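
While root.sh is running, it can be useful to follow the agent and CSSD logs from another session, for example:

grid@server21$ tail -f $GRID_HOME/log/server21/agent/ohasd/orarootagent_root/orarootagent_root.log
grid@server21$ tail -f $GRID_HOME/log/server21/cssd/ocssd.log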

Investigating these log files, we became convinced that the error we were looking for was not caused by the multicast configuration, which appeared to be working correctly. However, we did notice a strange discrepancy in the ifconfig output for the interconnect on the first node after root.sh had been executed:

e1000g2: 
   flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,
     IPv4,FIXEDMTU> mtu 9000 index 3
   inet 172.16.0.2 netmask ffffff00 
   broadcast 172.16.0.255
   ether 0:c0:dd:14:c:58
e1000g2:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,
      IPv4> mtu 9216 index 3
   inet 169.254.35.5 netmask ffff0000 
   broadcast 169.254.255.255

Oracle 11.2.0.2 introduces a new feature called Redundant Interconnect. Instead of configuring bonding at operating system level, it is theoretically possible to allow Oracle to manage multiple physical networks, providing load balancing and failover capabilities. Within the Grid Infrastructure the new feature is known as HAIP; we will cover it in more detail in another post. We looked at the ifconfig output from another customer where we had successfully installed Oracle 11.2.0.2. Even if bonding has been configured for the private network, the Grid Infrastructure will automatically create a new VIP on the private network device for each node in the cluster. The IP address is automatically allocated from the 169.254.0.0 subnet, apparently using a random algorithm.
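
Once the clusterware stack is up, the interconnect classification and the HAIP resource can be inspected directly. For example (illustrative; in 11.2.0.2 the HAIP resource is normally named ora.cluster_interconnect.haip):

grid@server21$ $GRID_HOME/bin/oifcfg getif
grid@server21$ $GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init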

So in the above example the private network has been configured to use e1000g2. When Clusterware starts, a new virtual IP is created on logical interface e1000g2:1, and this address appears to be used by CSSD for inter-node communications.

However, take another look at the MTU sizes. For e1000g2 the MTU size is 9000; for e1000g2:1 Oracle has used an MTU size of 9216. This anomaly had us scratching our heads, so we took a look at the network device definition file:

grid@server21$ cat /etc/hostname.e1000g2
server21-priv mtu 9000

So we were hard-coding the MTU size in /etc/hostname.e1000g2, but Oracle was ignoring this and using 9216. Which size was correct?
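
As a cross-check, the MTU the driver has actually applied to the data link (as opposed to the value plumbed on the interface) can be queried with dladm, for example:

root@server21 # dladm show-link e1000g2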

Unlike its Linux counterpart, the Solaris ping command does not have an option to specify the packet size, so we tried the traceroute command instead.

We used the following syntax for the traceroute command:

traceroute -s <source> -r -F <target> <MTU Size>
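
For example, to test whether a 1518-byte packet can cross the interconnect from the first node, assuming 172.16.0.1 is the local private address (as in the later ifconfig output) and server22-priv (by analogy with server21-priv) resolves to the second node's private address:

root@server21 # traceroute -s 172.16.0.1 -r -F server22-priv 1518

The -F flag sets the "don't fragment" bit, so the probe fails rather than being fragmented if the path cannot carry a packet of the requested size.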

By trial and error we discovered that the traceroute command was successful at all MTU sizes up to 1518, but failed for MTU sizes of 1519 and above. We immediately suspected our cheap switch, and further experimentation quickly confirmed that it did not support MTU sizes greater than 1518.

On the other interfaces the customer had used the default MTU size of 1500, so we decided to revert to this. As 1500 is the default, we simply removed the MTU setting from /etc/hostname.e1000g2:

grid@server21$ cat /etc/hostname.e1000g2
server21-priv

However, this did not have the desired effect; in fact the MTU size for e1000g2 increased from 9000 to 9216. On Solaris the MTU size is defined in two places: in the interface definition file (e.g. /etc/hostname.e1000g2) and in the kernel driver configuration file (e.g. /kernel/drv/e1000g.conf).

Initially /kernel/drv/e1000g.conf contained the following:

root@server21# cat /kernel/drv/e1000g.conf 
# Driver.conf file for Intel e1000g Gigabit Ethernet Adapter
MaxFrameSize=0,0,3,0,3,0,0,0,0,0,0,0,0,0,0,0;
        # 0 is for normal ethernet frames.
        # 1 is for upto 4k size frames.
        # 2 is for upto 8k size frames.
        # 3 is for upto 16k size frames.
        # These are maximum frame limits, not the actual 
        # ethernet frame size. Your actual ethernet frame 
        # size would be determined by protocol stack 
        # configuration (please refer to ndd command man 
        # pages)
        # For Jumbo Frame Support (9k ethernet packet)
        # use 3 (upto 16k size frames)

In the above file the MaxFrameSize entry defines the maximum frame size for each interface: the first entry is for e1000g0, the next for e1000g1, and so on. In this example the maximum frame size is set to the default (0, normal 1500-byte frames) for e1000g0, e1000g1 and e1000g3, and to 3 (up to 16k frames) for e1000g2 and e1000g4. However, as 9216 is the driver's absolute limit for frame sizes, those devices are restricted to that MTU size.
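
Laid out positionally, the original setting maps to the interfaces as follows:

MaxFrameSize =  0,        0,        3,        0,        3,       ...
                e1000g0   e1000g1   e1000g2   e1000g3   e1000g4
                1500      1500      jumbo     1500      jumbo (capped at 9216)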

Therefore we updated MaxFrameSize so that all entries were set to zero and rebooted the node:

MaxFrameSize=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;

Following this change ifconfig reported the following:

e1000g2: 
  flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,
     IPv4> mtu 1500 index 3
   inet 172.16.0.1 netmask ffffff00 
   broadcast 172.16.0.255
   ether 0:c0:dd:14:7:80

So the MTU size was now correctly set to 1500.

After reinstalling Grid Infrastructure on the first node (including running root.sh), ifconfig reported the following:

e1000g2: 
  flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,
    IPv4> mtu 1500 index 3
  inet 172.16.0.1 netmask ffffff00 
  broadcast 172.16.0.255
  ether 0:c0:dd:14:7:80
e1000g2:1: 
  flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,
    IPv4> mtu 1500 index 3
  inet 169.254.248.231 netmask ffff0000 
  broadcast 169.254.255.255

So the new HAIP VIP was also now using the MTU size of 1500.

We ran root.sh on the second node again, and this time it succeeded.
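
With root.sh complete on both nodes, the health of the stack can be confirmed with, for example:

root@server21 # $GRID_HOME/bin/crsctl check cluster -all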