Content Comparison

...

Type of Failure	Recovery
Disk Failure	Storage volumes are on RAID 6 redundant aggregates. They can tolerate up to 2 disk failures.
NetApp head failure	The secondary head takes over as primary. The transition happens in seconds and is transparent to the application.
VM host failure	VMware moves the VM to a different host. This is a cold failover (the VM is restarted) but happens in seconds.
VM Host saturation	DRS moves the VM to a different VM host, transparent to the applications
VM network card failure	VM network connectivity is redundant and can deal with a single failure.
Storage network card failure	The connection to the storage subnet is completely redundant. It can deal with a single failure transparently.
Storage subnet	The storage subnet is a completely redundant switched network.
Database instance failure	Recovery is manual: operators notice the failure on the monitoring tool and then call the on-call DBA.
Database failure	Recovery is manual: operators notice the failure on the monitoring tool and then call the on-call DBA.

Options to make Database HA automatic:

a) RAC

RAC (Real Application Clusters) means we run a >3 node cluster of active-active Oracle instances that mount the same database. RAC runs a heartbeat across the cluster and links a virtual IP (VIP) with each node in the cluster. If any host or database instance in the cluster fails, the VIP of the corresponding node fails over to a surviving node. When clients that were connected to the failed node retry the connection, they get an immediate rejection because the VIP has failed over, so they do not wait for a TCP timeout and can get a new connection from a surviving node immediately. This failover happens within seconds. RAC can be implemented on the current infrastructure, on VMs over NFS with NetApp.
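The client-side behaviour described above can be sketched roughly as follows. This is a minimal simulation, not Oracle client code: `connect_with_failover`, `make_fake_connect`, `ConnectionRefused`, and the `nodeN-vip` names are all hypothetical and only illustrate why an immediate rejection from a failed-over VIP lets the client move on to a surviving node without delay.

```python
from typing import Callable, Iterable, Sequence


class ConnectionRefused(Exception):
    """Hypothetical error: the address answered but rejected the connection."""


def connect_with_failover(vips: Sequence[str], connect: Callable[[str], str]) -> str:
    """Try each VIP in order and return the first session that succeeds."""
    last_error = None
    for vip in vips:
        try:
            return connect(vip)
        except ConnectionRefused as exc:
            # The failed-over VIP rejects at once, so there is no TCP
            # timeout to wait for; try the next address immediately.
            last_error = exc
    raise ConnectionRefused("no surviving node in " + ", ".join(vips)) from last_error


def make_fake_connect(failed_vips: Iterable[str]) -> Callable[[str], str]:
    """Build a stand-in for a real driver call (assumption: simulation only)."""
    failed = set(failed_vips)

    def fake_connect(vip: str) -> str:
        if vip in failed:
            raise ConnectionRefused(vip)
        return "session@" + vip

    return fake_connect


if __name__ == "__main__":
    cluster = ["node1-vip", "node2-vip", "node3-vip"]
    # node1 has failed; its VIP now rejects connections immediately.
    print(connect_with_failover(cluster, make_fake_connect(["node1-vip"])))
```

In a real deployment this retry is normally handled by the Oracle client's connect configuration rather than by application code; the sketch only shows the control flow.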

b) Data Guard with FSFO (Fast-Start Failover)

Data Guard is an active-passive setup in which a primary node ships its transaction logs to a standby (secondary) that is in constant recovery. The secondary node can be opened read-only with an Active Data Guard license. A third node, called the observer, keeps a heartbeat with both primary and secondary. If the observer loses its connection to the primary but can still talk to the secondary, it makes the secondary the new primary. The IPs are not switched, however, so a client of the failed primary incurs a TCP timeout when it retries its connection. After the failover, if the old primary can still be contacted, it turns into a standby database for the new primary.
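The observer's decision rule described above can be sketched as a small function. This is a simplified illustration under stated assumptions: `Action` and `observer_decision` are hypothetical names, and a real observer also applies a configurable time threshold and other conditions before it initiates a failover.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    FAILOVER = "make the secondary the new primary"


def observer_decision(primary_reachable: bool, secondary_reachable: bool) -> Action:
    """Simplified rule: fail over only when the primary is lost
    and the secondary is still reachable from the observer."""
    if not primary_reachable and secondary_reachable:
        return Action.FAILOVER
    # Either the primary is healthy, or the observer cannot reach the
    # secondary and therefore has nothing it can safely promote.
    return Action.NONE


if __name__ == "__main__":
    for primary_up in (True, False):
        for secondary_up in (True, False):
            action = observer_decision(primary_up, secondary_up)
            print(primary_up, secondary_up, action.value)
```

Note that the rule never fails over when the observer has lost both nodes, since in that case the observer itself may be the isolated party.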

Version	Old Version 7	New Version Current
Changes made by	Amit Poddar (Unlicensed)	Amit Poddar (Unlicensed)
Saved on	Dec 10, 2013	Dec 10, 2013