to flexibly control the type of striping as well as the redundancy level of
the storage system at the system, directory, and even file levels. Traditional
storage systems would require that an entire RAID volume be dedicated to
a particular performance type and protection setting. For example, a set of
disks might be arranged in a RAID 1+0 configuration for a database. This makes
it difficult to optimize spindle use over the entire storage estate (since idle
spindles cannot be borrowed) and also leads to inflexible designs that do not
adapt to changing business requirements. OneFS allows for individual tuning and
flexible changes at any time, fully online. In 2013, the most current version of
OneFS was 6.5.4.
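
One way to picture protection settings that can be set at the system, directory, or file level is a lookup in which the most specific setting wins. The sketch below is illustrative only and is not OneFS code: the Policy class, the resolve_policy function, the paths, and the layout and protection labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Policy:
    """Hypothetical protection settings; the labels are illustrative only."""
    layout: str       # assumed label for the striping/access pattern
    protection: str   # assumed label for the redundancy level


def resolve_policy(path, overrides, system_default):
    """Return the most specific policy that applies to `path`.

    Lookup order: the file itself, then each enclosing directory from
    nearest to farthest, then the system-wide default.
    """
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        policy = overrides.get(str(candidate))
        if policy is not None:
            return policy
    return system_default


# Example data (all hypothetical).
system_default = Policy(layout="streaming", protection="parity-1")
overrides = {
    "/data/db": Policy(layout="random", protection="mirror-2x"),
    "/data/db/archive.log": Policy(layout="streaming", protection="parity-2"),
}

db_file = resolve_policy("/data/db/table.dbf", overrides, system_default)
log_file = resolve_policy("/data/db/archive.log", overrides, system_default)
home_file = resolve_policy("/data/home/notes.txt", overrides, system_default)
```

In this toy model, changing the setting for one directory or one file is a single dictionary update, with no volume being rebuilt or rededicated. This mirrors the contrast the text draws with RAID volumes that are tied to one performance type and protection setting.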
11.5 Data Protection
11.5.1 Power Loss
A file system journal, which stores information about changes to the file
system, is designed to enable fast, consistent recoveries after system failures
or crashes, such as power loss. The file system replays the journal entries
after a node or cluster recovers from a power loss or other outage. Without
a journal, a file system would need to examine and review every potential
change individually after a failure (an fsck or chkdsk operation); in a large
file system, this operation can take a long time.
OneFS is a journaled file system in which each node contains a battery-
backed NVRAM card used for journaling. The NVRAM card battery charge
lasts many days without requiring a recharge. When a node boots up, it checks
its journal and selectively replays transactions to disk where the journaling
system deems it necessary. OneFS will mount only if it can guarantee that all
transactions not already in the system have been recorded. For example, if
proper shutdown procedures were not followed and the NVRAM battery discharged,
transactions might have been lost; to prevent any potential problems,
the node will not mount the file system.
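
The boot-time behavior described above can be sketched as a small model: replay only the journaled transactions that have not yet reached disk, and refuse to mount if the journal contents cannot be trusted. This is a minimal sketch under stated assumptions, not the OneFS implementation. The Journal class, the MountRefused exception, the mount function, and the representation of the disk as a dictionary are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy stand-in for a battery-backed journal (hypothetical structure)."""
    intact: bool = True  # False models NVRAM contents lost to a drained battery
    entries: list = field(default_factory=list)  # (txn_id, block, data) tuples


class MountRefused(Exception):
    """Raised when the journal cannot guarantee that no transaction was lost."""


def mount(journal, disk, applied):
    """Replay unapplied transactions in order, or refuse to mount.

    `disk` maps block numbers to data; `applied` is the set of transaction
    ids already written to disk. Returns the ids that were replayed.
    """
    if not journal.intact:
        raise MountRefused("journal contents lost; transactions may be missing")
    replayed = []
    for txn_id, block, data in journal.entries:
        if txn_id in applied:
            continue  # already on disk, so replay is skipped
        disk[block] = data
        applied.add(txn_id)
        replayed.append(txn_id)
    return replayed


# Example: transaction 1 reached disk before the outage; 2 and 3 did not.
disk = {0: "old"}
applied = {1}
journal = Journal(entries=[(1, 0, "old"), (2, 0, "new"), (3, 5, "x")])
first_replay = mount(journal, disk, applied)
second_replay = mount(journal, disk, applied)  # nothing left to replay

try:
    mount(Journal(intact=False), {}, set())
    refused = False
except MountRefused:
    refused = True
```

Replaying only the outstanding entries keeps recovery time proportional to the size of the journal, not the size of the file system, which is the advantage over a full fsck-style scan noted earlier in this section.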
11.5.2 Scalable Rebuild
OneFS does not rely on hardware RAID either for data allocation, or for
reconstruction of data after failures. Instead OneFS manages protection of
file data directly, and when a failure occurs, it rebuilds data in a parallelized
fashion. OneFS is able to determine which files are affected by a failure in
constant time, by reading inode data linearly, directly off disk. The set of
affected files is assigned to a set of worker threads that are distributed among
the cluster nodes by the job engine. The worker nodes repair the files in
parallel. This implies that as cluster size increases, the time to rebuild from
 