failures decreases. This has an enormous efficiency advantage in maintaining
the resiliency of clusters as their size increases.
11.5.3 Virtual Hot Spare
Most traditional storage systems based on RAID require the provisioning
of one or more "hot spare" drives to allow independent recovery of failed
drives. The hot spare drive replaces the failed drive in a RAID set. If these
hot spares are not themselves replaced before more failures appear, the system
risks a catastrophic data loss. OneFS avoids the use of hot spare drives, and
simply borrows from the available free space in the system in order to recover
from failures; this technique is called virtual hot spare. In doing so, OneFS
allows the cluster to be fully self-healing, without human intervention. The
administrator can create a virtual hot spare reserve, allowing for a guarantee
that the system can self-heal despite ongoing writes by users.
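To make the idea concrete, the following is a minimal Python sketch (not OneFS code; the function name and the sizing policy of reserving the largest drives are illustrative assumptions) of the check behind a virtual hot spare reserve: rather than idling a physical spare drive, the cluster keeps enough free space to re-protect the contents of a failed drive.

def vhs_reserve_ok(free_bytes: int, drive_sizes: list, reserved_drives: int = 1) -> bool:
    """Return True if current free space still covers the reserve needed
    to re-protect the `reserved_drives` largest drives after a failure."""
    reserve = sum(sorted(drive_sizes, reverse=True)[:reserved_drives])
    return free_bytes >= reserve

# Example: a cluster with 12 TB free can absorb the loss of one 4 TB drive
# and rebuild its contents from free space alone.
print(vhs_reserve_ok(free_bytes=12 * 10**12, drive_sizes=[4 * 10**12] * 6))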
11.5.4 N + M Data Protection
The Isilon cluster is designed to tolerate one or more simultaneous com-
ponent failures without preventing the cluster from serving data. The Isilon
system can use either Reed-Solomon error correction (N + M protection) or
mirroring to protect files. Data protection is applied at the file
level, and not the system level, enabling the system to focus on recovering
only those files that are compromised by a failure rather than having to check
and repair the entire file set. Metadata and inodes are protected at least
at the same level as the data they reference, and are always protected by
mirroring rather than Reed-Solomon coding.
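The following is a toy illustration (not OneFS code) of file-level N + M protection for the simplest case M = 1, where the erasure code reduces to XOR parity; OneFS itself uses Reed-Solomon codes, which generalize this so that any M simultaneous failures are recoverable.

from functools import reduce

def xor_blocks(blocks):
    """XOR equal-sized byte blocks together; with M = 1 this is the parity."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                        blocks))

def rebuild_missing(blocks, parity):
    """Recover a single lost block: XOR of parity and all surviving blocks."""
    missing = blocks.index(None)
    survivors = [b for b in blocks if b is not None]
    blocks[missing] = xor_blocks(survivors + [parity])
    return blocks

data = [b"AAAA", b"BBBB", b"CCCC"]    # N = 3 data blocks of one file stripe
parity = xor_blocks(data)             # M = 1 erasure-code block
data[1] = None                        # simulate losing one drive's block
print(rebuild_missing(data, parity))  # -> [b'AAAA', b'BBBB', b'CCCC']

Because protection is per file, only the stripes belonging to files touched by the failure need this rebuild, not the whole drive population.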
Because all data, metadata, and parity information are distributed across
the nodes of the cluster, the Isilon cluster does not require a dedicated parity
node or drive, or a dedicated device or set of devices to manage metadata.
This ensures that no one node can become a single point of failure. All nodes
share equally in the tasks to be performed, providing perfect symmetry and
load balancing in a peer-to-peer architecture.
The Isilon system provides several levels of configurable data protection
settings, which can be modified at any time without needing to take the cluster
or file system offline.
For a file protected with erasure codes, each of its protection groups is
protected at a level of N + M/b, where N > M and M ≥ b. The values N and
M represent, respectively, the number of drives used for data and for erasure
codes within the protection group. The value of b relates to the number of data
stripes used to lay out that protection group, and is covered below. A common
and easily understood case is where b = 1, implying that a protection group
incorporates N drives worth of data; M drives worth of redundancy, stored in
erasure codes; and that the protection group should be laid out over exactly
one stripe across a set of nodes. This implies that M members of the protection
group can fail simultaneously without any loss of data availability.
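To make the arithmetic concrete, here is a small hypothetical sketch of what an N + M/b layout implies, under the assumption (consistent with M ≥ b above, though the layout details are covered below) that the N + M blocks of a protection group are spread over b stripes, so each participating node holds b of its blocks; the helper name and sample numbers are illustrative.

def layout(n: int, m: int, b: int) -> dict:
    """Summarize an N + M/b protection group under the stated assumption."""
    assert n > m >= b, "requires N > M and M >= b"
    assert (n + m) % b == 0, "blocks must divide evenly into b stripes"
    return {
        "nodes_spanned": (n + m) // b,  # width of each of the b stripes
        "drive_failures_ok": m,         # any M lost blocks are recoverable
        "node_failures_ok": m // b,     # one node failure loses b blocks
        "space_overhead": m / (n + m),  # fraction of raw space spent on codes
    }

print(layout(n=9, m=1, b=1))   # classic single-stripe N + 1
print(layout(n=14, m=2, b=2))  # two stripes: survives 2 drives or 1 node

Note how b > 1 narrows each stripe without raising the space overhead, at the cost of tolerating fewer whole-node failures for the same M.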
 