[Figure 2.7 diagram labels: Application, MPI-IO, AD_UFS, Syscall Forwarder, Compute Node, Tree Network, I/O Node, CIOD, VFS, PVFS Kernel Module, PVFS Request Handler, PVFS Client API, PVFS Client]
Figure 2.7 I/O accesses from the Blue Gene compute node are serialized
and forwarded to a Blue Gene I/O node, where they are passed through the
VFS layer for service. On the ALCF system, high-performance I/O is provided
via the PVFS file system, which in turn performs file system access through
a user-space process.
for applications. The compute kernel (a custom kernel written by IBM) marshals
arguments from I/O calls and forwards a request for I/O over the BG/P
tree network to the I/O node associated with that compute node. The I/O
node, running Linux, processes this request and performs the I/O on behalf of
the compute node process by issuing the appropriate system calls on the I/O
node. This approach allows I/O to any Linux-supported file system, such
as an NFS-mounted file system, a Lustre file system, a GPFS file system, or
a PVFS file system. The ALCF system provides PVFS as a high-performance
file system.
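The forwarding scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wire format, the opcode value, and the function names `marshal_write` and `serve_request` are hypothetical and do not reproduce the actual protocol used between the compute kernel and the I/O node. The sketch shows the two halves of the idea: the compute side packs the arguments of a write call into a message, and the I/O node side unpacks the message and issues ordinary system calls on the caller's behalf.

```python
import os
import struct
import tempfile

# Hypothetical wire format (NOT the real Blue Gene protocol):
# opcode (1 byte), file offset (8 bytes), path length (2 bytes),
# data length (4 bytes), followed by the UTF-8 path and the payload.
HEADER = struct.Struct("!BQHI")
OP_WRITE = 1  # hypothetical opcode value


def marshal_write(path: str, offset: int, data: bytes) -> bytes:
    """Compute-node side: pack the arguments of a write call into one message."""
    raw_path = path.encode("utf-8")
    header = HEADER.pack(OP_WRITE, offset, len(raw_path), len(data))
    return header + raw_path + data


def serve_request(message: bytes) -> int:
    """I/O-node side: unpack the message and perform the I/O with system calls."""
    opcode, offset, path_len, data_len = HEADER.unpack_from(message)
    if opcode != OP_WRITE:
        raise ValueError(f"unsupported opcode {opcode}")
    body = message[HEADER.size:]
    path = body[:path_len].decode("utf-8")
    data = body[path_len:path_len + data_len]
    # The file may live on any file system the I/O node has mounted.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.write(fd, data)  # number of bytes written
    finally:
        os.close(fd)


if __name__ == "__main__":
    # In this sketch a function call stands in for the tree network.
    target = os.path.join(tempfile.mkdtemp(), "demo.dat")
    written = serve_request(marshal_write(target, 0, b"checkpoint"))
    print(f"I/O node wrote {written} bytes to {target}")
```

Because the I/O node side only uses generic system calls, the same request handler works regardless of which file system backs the path, which is the property the text attributes to this approach.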
The PVFS file system is mounted on I/O nodes using the PVFS kernel
module. The Linux kernel vectors I/O operations on the mounted PVFS
volume to this kernel module. This module, in turn, forwards operations through
a device file to a user-space process. This user process interacts with server
processes on the file servers, which in turn perform block I/O operations on
the DDN storage devices.
2.3.1.3 Tolerating Failures
The combination of PVFS and Linux-HA (high availability) heartbeat software
makes an excellent parallel file system solution for Blue Gene with respect
to tolerating failures. The Blue Gene I/O nodes are the real PVFS clients in
this system. If an I/O node fails, only the compute nodes attached to
that I/O node lose connectivity. Because PVFS clients do not hold file system
state, the failure of an I/O node has no impact on other I/O nodes, allowing
jobs associated with other I/O nodes to continue operation.
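The isolation property described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the block-wise attachment of compute nodes to I/O nodes, the ratio of four compute nodes per I/O node, and the function names are hypothetical and are not taken from the ALCF configuration. The sketch only shows that, when clients hold no shared file system state, the set of affected jobs is determined entirely by which compute nodes were attached to the failed I/O node.

```python
from typing import Dict, List


def build_attachment(num_compute: int, compute_per_io: int) -> Dict[int, List[int]]:
    """Attach compute nodes to I/O nodes in contiguous blocks (assumed layout)."""
    attachment: Dict[int, List[int]] = {}
    for cn in range(num_compute):
        attachment.setdefault(cn // compute_per_io, []).append(cn)
    return attachment


def affected_compute_nodes(attachment: Dict[int, List[int]], failed_io: int) -> List[int]:
    """Only compute nodes attached to the failed I/O node lose connectivity."""
    return attachment.get(failed_io, [])


def surviving_jobs(jobs: Dict[str, List[int]],
                   attachment: Dict[int, List[int]],
                   failed_io: int) -> List[str]:
    """Jobs whose compute nodes all sit behind other I/O nodes keep running."""
    lost = set(affected_compute_nodes(attachment, failed_io))
    return [name for name, nodes in jobs.items() if lost.isdisjoint(nodes)]


if __name__ == "__main__":
    # Hypothetical machine: 8 compute nodes, 4 per I/O node.
    attachment = build_attachment(num_compute=8, compute_per_io=4)
    jobs = {"job_a": [0, 1, 2, 3], "job_b": [4, 5, 6, 7]}
    print("lost compute nodes:", affected_compute_nodes(attachment, failed_io=1))
    print("surviving jobs:", surviving_jobs(jobs, attachment, failed_io=1))
```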
If a server fails, the heartbeat software will detect this failure through the
use of quorum. The failed server will be forcefully shut down using Intelligent
Platform Management Interface (IPMI) power controls, ensuring that