7.2.2 Parallel File Systems: A Shared Resource
File system performance for a given system is often quoted from ideal
measurements taken under ideal conditions. From a user perspective, however,
it is important to understand that these peak numbers may not be indicative
of the performance achievable in a multi-user, production system environment.
Indeed, on most modern HPC systems, the file system is truly a shared
resource, one whose instantaneous performance is influenced by how many users
are accessing it and what types of workloads they are running. Consequently,
when estimating application runtime requirements, users are advised to pad
the I/O portion of their estimates to account for the runtime variability
introduced by using a shared I/O resource (a simple worked example appears at
the end of this section). To quantify the level of variability on a system
with thousands of users, daily performance rates were sampled across two
different intervals during the first 8 months of Stampede's production. The first
sampling interval corresponds to measurements taken immediately after the
system went into production in January 2013 (and includes approximately two
months' worth of data). These tests were submitted daily as normal user jobs
into the Simple Linux Utility for Resource Management (SLURM) queuing
system [3]. Although they were submitted at the same time each day, they
ran at different times depending on the availability of sufficient resources to
schedule the job. Each job wrote 4 TB of data from 512 hosts to individual
files, and two types of runtime performance were monitored: aggregate and
throughput. The aggregate performance number corresponds to the total time
required to complete writing all 4 TB of data. Since it is aggregated across
all writes, this value can be reduced when one or more OSTs are slower than
others due to contention from other users of the system. In contrast, the
throughput performance number is based on the time required for each client to
write only its portion of the 4-TB total. This value is generally higher than the
aggregate performance number as the measured timings are not synchronized
across all writes.
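As a concrete illustration of this measurement setup, the following sketch
performs a file-per-process write test and computes both metrics. It is a
hypothetical reconstruction in the spirit of the daily test described above,
not the actual Stampede benchmark code; the file names, block size, and use
of mpi4py are assumptions made for illustration.

    # Minimal file-per-process write test sketch (assumes mpi4py and an
    # MPI launcher; sizes and file names are illustrative, not the real test).
    from mpi4py import MPI
    import os
    import time

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    nprocs = comm.Get_size()

    TOTAL_BYTES = 4 * 1024**4          # 4 TB across all ranks (assumption)
    BLOCK = 4 * 1024**2                # 4 MiB per write call (assumption)
    my_bytes = TOTAL_BYTES // nprocs   # each rank writes its own file

    comm.Barrier()                     # synchronize the start of the test
    t0 = time.time()
    with open(f"testfile.{rank:05d}", "wb", buffering=0) as f:
        buf = b"\0" * BLOCK
        written = 0
        while written < my_bytes:
            written += f.write(buf)
        os.fsync(f.fileno())           # force data out of client caches
    t1 = time.time()

    # "Throughput": each client's own rate over its own (unsynchronized)
    # write interval; summed across clients on the root rank.
    my_rate = my_bytes / (t1 - t0)
    # "Aggregate": total data divided by the time until the slowest client
    # finishes, so one lagging OST drags the whole number down.
    t_end = comm.reduce(t1, op=MPI.MAX, root=0)
    t_start = comm.reduce(t0, op=MPI.MIN, root=0)
    rates = comm.gather(my_rate, root=0)
    if rank == 0:
        agg = TOTAL_BYTES / (t_end - t_start)
        print(f"aggregate:  {agg / 1e9:8.1f} GB/s")
        print(f"throughput: {sum(rates) / 1e9:8.1f} GB/s")

Because the aggregate metric is bounded by the slowest writer while the
throughput metric is not, the throughput number is generally the higher of
the two, exactly as described above.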
Figure 7.5(a) presents a time history of the daily I/O results beginning in
January 2013 and extending through mid-March. These results show the level of
variability possible on a shared I/O resource, particularly in the aggregate
performance numbers, which can vary by more than a factor of two. The average
aggregate performance over the 65 measurements
during this period was 110 GB/s, but a general decline is observed over this
initial period after the system entered production. Although system utilization
was ramping up during this time frame, which explains some of the decline as
more users accessed the shared I/O resource, another contributing factor was
the fragmentation associated with the file system filling up. Figure 7.6 shows
the corresponding growth in usage on both the WORK and SCRATCH file systems
during the first three months of operation and illustrates a rapid increase
in usage to over 1 PB in approximately two months for SCRATCH. This
fragmentation is another common contributor to the difference a typical user
observes between quoted peak rates and the performance achieved in production.
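To make the earlier padding advice concrete, the following back-of-the-envelope
sketch uses only the numbers reported in this section (a 4 TB write, the
110 GB/s average aggregate rate, and the observed factor-of-two swing). The
factor-of-two pad is a hypothetical rule of thumb for illustration, not a
documented Stampede recommendation.

    # Illustrative padding of an I/O runtime estimate using the numbers
    # reported above; the padding factor is a hypothetical rule of thumb.
    data_tb = 4                              # data written per job (TB)
    avg_rate_gbs = 110                       # average aggregate rate (GB/s)
    variability = 2.0                        # observed factor-of-two swing

    ideal_s = data_tb * 1000 / avg_rate_gbs  # ~36 s at the average rate
    padded_s = ideal_s * variability         # budget ~73 s for the I/O phase
    print(f"ideal: {ideal_s:.0f} s, padded estimate: {padded_s:.0f} s")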
 