OneFS - High Performance Parallel I/O


to flexibly control the type of striping as well as the redundancy level of

the storage system at the system, directory, and even file levels. Traditional

storage systems would require that an entire RAID volume be dedicated to

a particular performance type and protection setting. For example, a set of

disks might be arranged in a RAID 1+0 protection for a database. This makes

it difficult to optimize spindle use over the entire storage estate (since idle

spindles cannot be borrowed) and also leads to inflexible designs that do not

adapt to business requirements. OneFS allows for individual tuning and

flexible changes at any time, fully online. In 2013, the most recent version of

OneFS was 6.5.4.

11.5 Data Protection

11.5.1 Power Loss

A file system journal, which stores information about changes to the file

system, is designed to enable fast, consistent recoveries after system failures

or crashes, such as power loss. The file system replays the journal entries

after a node or cluster recovers from a power loss or other outage. Without

a journal, a file system would need to examine and review every potential

change individually after a failure (an fsck or chkdsk operation); in a large

file system, this operation can take a long time.

OneFS is a journaled file system in which each node contains a

battery-backed NVRAM card used for journaling. The NVRAM card battery charge

lasts many days without requiring a recharge. When a node boots up, it checks

its journal and selectively replays transactions to disk where the journaling

system deems it necessary. OneFS will mount only if it can guarantee that all

transactions not already in the system have been recorded. For example, if

proper shutdown procedures were not followed and the NVRAM battery

discharged, transactions might have been lost; to prevent any potential problems,

the node will not mount the file system.
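
The replay-or-refuse behavior described above can be illustrated with a small sketch. This is a toy model and not the OneFS implementation: the names Journal, Disk, mount, and the intact flag are hypothetical, and the "NVRAM" is an in-memory list. It assumes each transaction carries an identifier so that entries already on disk can be skipped during replay.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy stand-in for a battery-backed NVRAM journal (hypothetical)."""
    entries: list = field(default_factory=list)  # (txn_id, block, data) tuples
    intact: bool = True  # False models a discharged battery: contents lost

    def log(self, txn_id, block, data):
        # Record the change in the journal before it reaches disk.
        self.entries.append((txn_id, block, data))


@dataclass
class Disk:
    """Toy disk that remembers which transactions it has applied."""
    blocks: dict = field(default_factory=dict)
    applied: set = field(default_factory=set)

    def apply(self, txn_id, block, data):
        self.blocks[block] = data
        self.applied.add(txn_id)


def mount(journal, disk):
    """Replay outstanding journal entries, or refuse to mount.

    Returns the number of transactions replayed.
    """
    if not journal.intact:
        # Journal contents cannot be trusted, so transactions may be lost.
        raise RuntimeError("journal lost: refusing to mount")
    replayed = 0
    for txn_id, block, data in journal.entries:
        if txn_id not in disk.applied:  # replay selectively
            disk.apply(txn_id, block, data)
            replayed += 1
    journal.entries.clear()  # everything journaled is now on disk
    return replayed
```

In this sketch, a node that crashed after logging two transactions but applying only one would replay the second at mount time, while a journal marked as not intact causes mount to raise instead of risking an inconsistent file system.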

11.5.2 Scalable Rebuild

OneFS does not rely on hardware RAID either for data allocation or for

reconstruction of data after failures. Instead, OneFS manages protection of

file data directly, and when a failure occurs, it rebuilds data in a parallelized

fashion. OneFS is able to determine which files are affected by a failure in

constant time, by reading inode data linearly, directly off disk. The set of

affected files is assigned to a set of worker threads that are distributed among

the cluster nodes by the job engine. The worker nodes repair the files in

parallel. This implies that as cluster size increases, the time to rebuild from
