Friday, February 26, 2016

Impact on Oracle if an ESX Host Goes Down

Here is a collection of scenarios for when ESX hosts go down.

Symptom - the vmkwarning and storagerm logs will spit out tons of the following messages.


Some host is down, need to reset the slot allocation
Number of hosts has changed to 3
Number of hosts has changed to 8
Number of hosts has changed to 7
Some host is down, need to reset the slot allocation
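
To get a quick feel for how often the host count flapped, the messages above can be tallied with a few lines of Python. This is only a rough sketch: the function name summarize_host_events is made up for illustration, and the sample lines are the ones pasted above rather than a full log.

```python
import re

# Matches the host-count message shown in the log excerpt above.
HOST_COUNT = re.compile(r"Number of hosts has changed to (\d+)")
HOST_DOWN = "Some host is down"


def summarize_host_events(lines):
    """Return (host_counts, down_events) found in an iterable of log lines.

    Hypothetical helper, not part of any VMware tooling.
    """
    host_counts = []
    down_events = 0
    for line in lines:
        match = HOST_COUNT.search(line)
        if match:
            host_counts.append(int(match.group(1)))
        elif HOST_DOWN in line:
            down_events += 1
    return host_counts, down_events


# Sample input: the messages quoted in this post.
sample = [
    "Some host is down, need to reset the slot allocation",
    "Number of hosts has changed to 3",
    "Number of hosts has changed to 8",
    "Number of hosts has changed to 7",
    "Some host is down, need to reset the slot allocation",
]

counts, downs = summarize_host_events(sample)
print("host counts seen:", counts)
print("host-down messages:", downs)
```

For a real log, pass an open file object instead of the sample list; the path depends on your environment.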

Occasionally, the following will show up. This is when the storage is impacted.

: NFSLock: 2208: File is being locked by a consumer on host 

As any Oracle DBA can already guess, the Oracle instance can crash; it all depends on whether the underlying disks stay available. If the disks go down, the ASM diskgroup goes down with them.

The alert log might show one or more of the following.


ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 20 (block # 392072)
ORA-01110: data file 20: '+ASMDATA/st
ORA-15080: synchronous I/O operation failed to read block 0 of disk 7 in disk group DATA
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 24576
Additional information: 4294967295
NOTE: cache initiating offline of disk 7 group DATA
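
If the alert log is large, a short script can count the ORA- codes so the I/O-related ones stand out. Treat this as a sketch under assumptions: count_ora_errors and the WATCHED set are names invented here, and the watched codes are simply the ones from the excerpt above, not a complete list of storage errors.

```python
import re
from collections import Counter

# ORA- codes are written as "ORA-" followed by five digits.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")

# Assumption: codes worth flagging, taken only from the excerpt in this post.
WATCHED = {"ORA-63999", "ORA-01114", "ORA-15080", "ORA-27072"}


def count_ora_errors(lines):
    """Count each ORA- code found in an iterable of alert log lines."""
    counter = Counter()
    for line in lines:
        for digits in ORA_CODE.findall(line):
            counter["ORA-" + digits] += 1
    return counter


# Sample input: lines quoted in this post (the truncated path is left as-is).
sample = [
    "ORA-63999: data file suffered media failure",
    "ORA-01114: IO error writing block to file 20 (block # 392072)",
    "ORA-01110: data file 20: '+ASMDATA/st",
    "ORA-15080: synchronous I/O operation failed to read block 0 of disk 7 in disk group DATA",
    "ORA-27072: File I/O error",
    "Linux-x86_64 Error: 5: Input/output error",
]

found = count_ora_errors(sample)
for code in sorted(found):
    flag = " (watched)" if code in WATCHED else ""
    print(code, found[code], flag)
```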

What can you do about it? If it's a single instance, not much. Bringing the ESX host back up should restore the services (most of the time); otherwise, the DBA might need to restore and recover with RMAN. If it's a RAC setup (RAC across multiple hosts), it may be worth checking that the SCSI controller's SCSI Bus Sharing option is set to None; things should then fail over to the node residing on the healthy host.
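
One way to double-check the bus sharing setting without clicking through each VM is to read it from the VM's .vmx file. The sketch below assumes the setting is recorded as a scsiN.sharedBus line, which is how I have seen it written; verify against your own .vmx before relying on it. The function names and the sample text are hypothetical.

```python
import re

# Assumption: the .vmx file stores the setting as, e.g., scsi1.sharedBus = "none"
SHARED_BUS = re.compile(r'^\s*(scsi\d+)\.sharedBus\s*=\s*"([^"]*)"', re.IGNORECASE)


def bus_sharing_by_controller(vmx_text):
    """Map each SCSI controller found in the .vmx text to its sharedBus value."""
    settings = {}
    for line in vmx_text.splitlines():
        match = SHARED_BUS.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2).lower()
    return settings


def controllers_not_none(vmx_text):
    """Return controllers whose bus sharing is set to something other than none."""
    settings = bus_sharing_by_controller(vmx_text)
    return sorted(name for name, mode in settings.items() if mode != "none")


# Hypothetical sample, not taken from a real VM.
sample_vmx = '''
scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi1.present = "TRUE"
scsi1.sharedBus = "virtual"
'''

print("controllers to review:", controllers_not_none(sample_vmx))
```

A controller with no sharedBus line at all is not reported by this sketch, so an empty result only means no controller was explicitly set to something else.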
