
Saturday, 14 July 2012

Dependencies of 11g R2 clusterware and ASM


In Oracle 10g RAC and 11g R1 RAC, Oracle clusterware and ASM are installed in different Oracle homes, and the clusterware has to be up before the ASM instance can be started, because the ASM instance uses the clusterware to access the shared storage. Oracle 11g R2 introduced the Grid Infrastructure home, which combines Oracle clusterware and ASM, and the OCR and voting disks of 11g R2 clusterware can be stored in ASM. So it seems that ASM needs the clusterware up first to access the shared storage, while the clusterware needs ASM up first to access its key data structures, the OCR and the voting disks. So which one has to be up first, and which one has to wait for the other? This looks like a chicken-and-egg problem.
Oracle’s solution to this problem is to combine the clusterware and ASM into a single Grid Infrastructure home and to define a more complex startup sequence that interleaves the clusterware components and the ASM instance in a fixed order. Oracle Metalink note 11gR2 Clusterware and Grid Home – What You Need to Know [ID 1053147.1] documents this startup sequence.
Although the clusterware startup command $GI_HOME/bin/crsctl start crs follows this sequence to bring both the clusterware and ASM online, it does not echo each milestone of the startup process, so we cannot see how the startup proceeds. A workaround is to look at part of the output of the root.sh command from the initial Grid Infrastructure installation, as follows:
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-2672: Attempting to start 'ora.gipcd' on 'owirac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'owirac1'
CRS-2676: Start of 'ora.mdnsd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'owirac1'
CRS-2676: Start of 'ora.gipcd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'owirac1'
CRS-2676: Start of 'ora.gpnpd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'owirac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'owirac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'owirac1'
CRS-2676: Start of 'ora.diskmon' on 'owirac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'owirac1'
CRS-2676: Start of 'ora.ctssd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'owirac1'
CRS-2676: Start of 'ora.asm' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'owirac1'
CRS-2676: Start of 'ora.crsd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'owirac1'
CRS-2676: Start of 'ora.evmd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'owirac1'
CRS-2676: Start of 'ora.asm' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.OCRVOTDSK.dg' on 'owirac1'
CRS-2676: Start of 'ora.OCRVOTDSK.dg' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.registry.acfs' on 'owirac1'
CRS-2676: Start of 'ora.registry.acfs' on 'owirac1' succeeded
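To read the order off a long root.sh log without scanning it by eye, something like the following Python sketch can help. It keeps only the CRS-2676 "succeeded" lines and lists each resource the first time it appears. The sample text is a subset of the output above; the function name started_in_order is hypothetical, and the pattern is written only against the message format shown in this post, so treat it as an assumption for other versions.

```python
import re

# A subset of the root.sh output shown above.
SAMPLE = """\
CRS-2672: Attempting to start 'ora.cssd' on 'owirac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'owirac1'
CRS-2676: Start of 'ora.diskmon' on 'owirac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'owirac1'
CRS-2676: Start of 'ora.ctssd' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'owirac1'
CRS-2676: Start of 'ora.asm' on 'owirac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'owirac1'
CRS-2676: Start of 'ora.crsd' on 'owirac1' succeeded
"""

# Accept straight quotes as well as the typographic quotes and primes
# that often replace them when logs are pasted into a web page.
Q = "'\u2018\u2019\u2032"
SUCCESS = re.compile(f"CRS-2676: Start of [{Q}]([^{Q}]+)[{Q}] on ")


def started_in_order(text):
    """Return resource names in the order their start was reported as succeeded."""
    order = []
    for line in text.splitlines():
        match = SUCCESS.search(line)
        if match and match.group(1) not in order:
            order.append(match.group(1))
    return order


if __name__ == "__main__":
    print(" -> ".join(started_in_order(SAMPLE)))
```

On the sample, ora.cssd and ora.ctssd are listed before ora.asm, and ora.crsd after it.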
The root.sh output above shows that the ASM instance startup is just one step in the middle of the entire sequence: some CRS components such as CSSD and CTSSD are started before ASM, while other components such as CRSD, EVMD and ACFS come up after ASM starts. This sequence can also be confirmed by the timestamps and log messages in the clusterware log files (alert<hostname>.log, cssd.log and crsd.log) and the ASM instance log such as alert_+ASM1.log. Here is the sequence of messages and their timestamps during the startup of 11g R2 clusterware and the ASM instance:
OLR service started  : 2012-07-15 16:33:13.678
Starting CSS daemon : 2012-07-15 16:33:18.684
Fetching asmlib disk :ORCL:OCR1 : 2012-07-15 16:33:24.825
Read ASM header off dev:ORCL:OCR3:224:256
Opened hdl:0x1d485110 for dev:ORCL:OCR1: 2012-07-15 16:33:24.829
Successful discovery for disk ORCL:OCR1 : 2012-07-15 16:33:24.837
Successful discovery of 5 disks: 2012-07-15 16:33:24.838
CSSD voting file is online: ORCL:OCR1:  2012-07-15 16:33:50.047
CSSD Reconfiguration complete: 2012-07-15 16:34:07.729
The Cluster Time Synchronization Service started:  2012-07-15 16:34:12.333
Note: ** CSSD and CTSSD came up before ASM. The voting disks were discovered by reading the headers of the ASM disks (ORCL:OCR1) of the voting disk diskgroup, without using the ASM instance **
Starting ASM: Jan 17 16:34:13 2011 
CRS Daemon Starting : 2012-07-15 16:34:30.329
Checking the OCR device : 2012-07-15 16:34:30.331
Initializing OCR : 2012-07-15 16:34:30.337
diskgroup OCRVOTDSK was mounted : Jan 17 16:34:30 2011
OCRVOTDSK was mounted : Mon Jan 17 16:34:30 2011
The OCR service started : 2012-07-15 16:34:30.835
Verified ocr1-5: 2012-07-15 16:33:50.128
CRSD started: 2012-07-15 16:34:31.902
Note: CRSD started only after ASM was up and the diskgroup holding the OCR and voting disks was mounted.
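As a quick sanity check on the ordering, the clusterware timestamps can be sorted mechanically. The sketch below is only an illustration: the event labels are shortened by hand, the timeline helper is a hypothetical name, and the ASM alert log entries are left out because they are written in a different date format.

```python
from datetime import datetime

# Milestones and timestamps copied from the clusterware log excerpts above,
# deliberately listed out of order. Labels are shortened for readability.
EVENTS = [
    ("CRSD started", "2012-07-15 16:34:31.902"),
    ("OLR service started", "2012-07-15 16:33:13.678"),
    ("CSSD voting file online", "2012-07-15 16:33:50.047"),
    ("CTSS started", "2012-07-15 16:34:12.333"),
    ("CSSD reconfiguration complete", "2012-07-15 16:34:07.729"),
    ("OCR service started", "2012-07-15 16:34:30.835"),
]


def timeline(events):
    """Return event names sorted by their parsed timestamps."""
    parsed = [
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), name)
        for name, stamp in events
    ]
    return [name for _, name in sorted(parsed)]


if __name__ == "__main__":
    for name in timeline(EVENTS):
        print(name)
```

Sorted this way, the CSSD and CTSS milestones come first and the OCR service and CRSD come last, which matches the notes above.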
From the log messages and timestamps above, we get some understanding of the startup sequence of the clusterware and the ASM instance:
1)      CSSD and CTSSD are up before ASM
2)      The voting disks used by CSSD are discovered by reading the headers of the disks, not through ASM
3)      Startup of the CRS service has to wait until the ASM instance is up and the diskgroup for the OCR and voting disks is mounted.
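These three observations can be written down as a small "waits for" table and ordered with a topological sort. This is only a sketch of the reasoning: the edges below are the orderings observed in this post, treated as dependencies by assumption, and are not Oracle's actual resource dependency definitions. WAITS_FOR and startup_order are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key waits for every component in its set.
# Assumption: the observed startup order is modelled as dependencies.
WAITS_FOR = {
    "ora.asm": {"ora.cssd", "ora.ctssd"},  # CSSD and CTSSD are up before ASM
    "ora.crsd": {"ora.asm"},               # CRSD waits for ASM and its diskgroup
}


def startup_order(waits_for):
    """Return one startup order that respects every 'waits for' edge."""
    return list(TopologicalSorter(waits_for).static_order())


if __name__ == "__main__":
    print(" -> ".join(startup_order(WAITS_FOR)))
```

Because CSSD reads the voting disks straight from the disk headers, it has no edge pointing back to ASM, so the graph has no cycle and the chicken-and-egg problem does not arise.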
