Once completed, this change reduced the failures attributed to GPFS back
to levels similar to other sources of failure. This same strategy was used on
Mira for allocating the number of quorum and manager nodes. ALCF saw a
much more stable and predictable system upon initial testing.
4.4.2 PVFS
On Intrepid, ALCF also deployed PVFS, starting with version 2.6 and
eventually ending up at version 2.8.2. PVFS was set up in high-availability
mode using Linux-ha and shared the same storage hardware as GPFS. The
PVFS file system was deployed about six months after the primary GPFS
file system was made available. This delay hampered user adoption of PVFS,
since all of the initial users already had terabytes of data on GPFS.
The most important finding for ALCF was that both GPFS and PVFS
took a considerable amount of time and effort to tune. Though the two
parallel file systems are quite different, the same basic principles applied
when tuning for performance or resiliency.
Although PVFS was not widely used, it was important for ALCF because
it allowed them to take GPFS offline while Intrepid kept running, with
users switching over to PVFS. This critical success led ALCF to design
multiple distinct file systems for Mira, although both are GPFS.
4.4.3 Libraries
ALCF supports a number of compute and I/O libraries for user codes.
The libraries were selected based on what was most popular among ALCF
users. ALCF builds and supports this small set of libraries and tries to
make them as effective on the Blue Gene architecture as possible.
Table 4.1 lists the I/O libraries ALCF has available for both Mira and
Intrepid.
Although ALCF provided several I/O libraries, POSIX was still the dominant
API of choice for I/O. Figure 4.3 shows the breakdown on interface
TABLE 4.1: Common I/O libraries at ALCF.

Library            Version    Library            Version
ADIOS              1.6.0      HDF5               1.6.6
HDF5               1.8.0      HDF5               1.8.10
Parallel netCDF    1.0.2      Parallel netCDF    1.0.3
Parallel netCDF    1.3.1      Parallel netCDF    1.4.0
netCDF             3.6.2      netCDF             4.0
MOAB               4.1        MOAB               4.5
MOAB               4.6        SILO               4.8
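
To make the POSIX observation concrete, the following is a minimal sketch of what POSIX-level I/O looks like when each writer owns a disjoint block of one shared file. It is an illustration only, not code from ALCF or from any user application. The names write_block and demo are hypothetical, the "ranks" are simulated sequentially in a single process rather than run under MPI, and the calls used (os.open, os.pwrite, os.fsync) are Python's thin wrappers over the corresponding POSIX functions, so the sketch assumes a Unix-like system.

```python
import os
import tempfile


def write_block(path: str, rank: int, block: bytes) -> int:
    """Write `block` at the offset owned by `rank` (hypothetical helper)."""
    # O_CREAT without O_TRUNC, so blocks written by other ranks are preserved.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        offset = rank * len(block)
        written = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while written < len(block):
            written += os.pwrite(fd, block[written:], offset + written)
        os.fsync(fd)  # ask the OS to flush the data to storage
        return written
    finally:
        os.close(fd)


def demo(num_ranks: int = 4, block_size: int = 8) -> bytes:
    """Simulate ranks writing disjoint blocks of one shared file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        # Write in reverse order: the explicit offsets, not the call order,
        # determine where each block lands in the file.
        for rank in reversed(range(num_ranks)):
            write_block(path, rank, bytes([ord("A") + rank]) * block_size)
        with open(path, "rb") as f:
            return f.read()
```

In this pattern the application itself computes offsets and handles partial writes. That bookkeeping is part of what higher-level libraries such as those in Table 4.1 take over when a code describes its data as arrays or variables instead of byte ranges.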
 