The number of private network connections for a 500-node standard cluster is 124,750, as we discussed at the beginning of this chapter. The number of storage network connections is also reduced from 500 to 25.
This significant reduction of network connections allows us to further scale out the cluster. With this hub-and-spoke topology, the Flex Cluster in Oracle 12cR1 is designed to scale up to 64 Hub nodes and many more Leaf nodes.
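You can confirm whether a cluster is running in Flex mode with the crsctl get cluster mode status command. The output below is illustrative of what a 12cR1 Flex Cluster reports:
$ crsctl get cluster mode status
Cluster is running in "flex" mode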
The Flex Cluster architecture helps to maintain the availability and reliability of the cluster even when the cluster is scaled out to a very large number of nodes. This is achieved by having the OCR and voting disks accessible only to Hub nodes and not to Leaf nodes. For example, we will get the following error messages if we query the voting disks or check the OCR from a Leaf node:
$ crsctl query css votedisk
CRS-1668: operation is not allowed on a Leaf node
$ ocrcheck
PROT-605: The 'ocrcheck' command is not supported from a Leaf node.
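By contrast, the same voting disk query succeeds when run from a Hub node. The following output is only a sketch; the file universal ID, disk path, and disk group name are placeholders:
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
 1. ONLINE   a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6 (/dev/asm-disk1) [DATA]
Located 1 voting disk(s).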
As shown in the previous 500-node example, in a Flex Cluster there are only a small number of Hub nodes, and the majority of the cluster nodes are Leaf nodes. Because only the Hub nodes access the OCR and voting disks, scaling out a Flex Cluster does not significantly increase resource contention for the OCR and voting disks. As a result, the chance of node eviction caused by such contention does not increase as the cluster is scaled out.
Like a standard cluster, an Oracle Flex Cluster is built with a high-availability design. If a Hub node fails, that node will be evicted from the cluster in the same way as a node in a standard cluster. The services on the failed node will be failed over to another surviving Hub node in the cluster. The Leaf nodes that were connected to the failed Hub node can be reconnected to another surviving Hub node within a grace period. The private interconnect heartbeat between two Hub nodes is the same as the private interconnect heartbeat in a standard cluster. You can check the heartbeat
misscount setting between Hub nodes using the following crsctl command:
$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
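If a different threshold is required, the misscount value can also be changed with crsctl on a Hub node, given sufficient privileges. This is a sketch; the value 45 is only an example, and the output should look similar to the following:
$ crsctl set css misscount 45
CRS-4684: Successful set of parameter misscount to 45 for Cluster Synchronization Services.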
If a Leaf node fails, that node will be evicted from the cluster. The services running on the failed Leaf node are failed over to other Leaf nodes that are connected to the same Hub node. This failover mechanism keeps the failover within the group of Leaf nodes that are connected to the same Hub node. In this way, the rest of the cluster nodes are not impacted by the Leaf node's failure.
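To see which role each node currently plays in the cluster, you can query the node roles with crsctl. The node names in this sketch are placeholders:
$ crsctl get node role status -all
Node 'racnode1' active role is 'hub'
Node 'racnode2' active role is 'hub'
Node 'racnode3' active role is 'leaf'
Node 'racnode4' active role is 'leaf'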
The network heartbeat is used to maintain network connectivity between a Leaf node and the Hub node to which the Leaf node connects. Similar to the private interconnect heartbeat between the Hub nodes, the maximum time that a heartbeat failure is tolerated is defined by the leafmisscount setting, which is 30 seconds by default. If a heartbeat failure exceeds the leafmisscount setting, the Leaf node will either be reconnected to another Hub node or be evicted from the cluster. You can query this setting by running this command:
$ crsctl get css leafmisscount
CRS-4678: Successful get leafmisscount 30 for Cluster Synchronization Services
You can also manually reset this setting by running this command on a Hub node:
$ crsctl set css leafmisscount 40
CRS-4684: Successful set of parameter leafmisscount to 40 for Cluster Synchronization Services.
You cannot reset this setting from a Leaf node:
$ crsctl set css leafmisscount 40
CRS-1668: operation is not allowed on a Leaf node
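If you are not sure whether the node you are logged in to is a Hub node or a Leaf node, you can check its configured role before running such commands. The node name below is a placeholder:
$ crsctl get node role config
Node 'racnode3' configured role is 'leaf'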