7.2.2 Parallel File Systems: A Shared Resource
File system performance for a given system is often quoted from ideal
measurements taken under ideal conditions. From a user perspective, however,
it is important to understand that these peak numbers may not be indicative
of the performance achievable in a multi-user, production system environment.
Indeed, on most modern HPC systems, the file system is truly a shared
resource, one whose instantaneous performance is influenced by how many users
are accessing it and what types of workloads they are running. Consequently,
when estimating application runtime requirements, users are advised to pad
the I/O portion of their estimates to account for the runtime variability
introduced by using a shared I/O resource (a simple worked example appears at
the end of this section). To quantify the level of variability on a system
with thousands of users, daily performance rates were sampled across two
different intervals during the first 8 months of Stampede's production. The first
sampling interval corresponds to measurements taken immediately after the
system went into production in January 2013 (and includes approximately two
months' worth of data). These tests were submitted daily as normal user jobs
into the Simple Linux Utility for Resource Management (SLURM) queuing
system [3]. Although they were submitted at the same time each day, they
ran at different times depending on the availability of sufficient resources to
schedule the job. Each job wrote 4 TB of data from 512 hosts to individual
files, and two types of runtime performance were monitored: aggregate and
throughput. The aggregate performance number corresponds to the total time
required to complete writing all 4 TB of data. Since it is aggregated across
all writes, this value can be reduced when one or more OSTs are slower than
others due to contention from other users of the system. In contrast, the
throughput performance number is based on the time required for each client to
write only its portion of the 4-TB total. This value is generally higher than the
aggregate performance number as the measured timings are not synchronized
across all writes.
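As a concrete illustration of this measurement setup, the following sketch
performs a file-per-process write test and computes both metrics. It is a
hypothetical reconstruction in the spirit of the daily test described above,
not the actual Stampede benchmark code; the file names, block size, and use
of mpi4py are assumptions made for illustration.

    # Minimal file-per-process write test sketch (assumes mpi4py and an
    # MPI launcher; sizes and file names are illustrative, not the real test).
    from mpi4py import MPI
    import os
    import time

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    nprocs = comm.Get_size()

    TOTAL_BYTES = 4 * 1024**4          # 4 TB across all ranks (assumption)
    BLOCK = 4 * 1024**2                # 4 MiB per write call (assumption)
    my_bytes = TOTAL_BYTES // nprocs   # each rank writes its own file

    comm.Barrier()                     # synchronize the start of the test
    t0 = time.time()
    with open(f"testfile.{rank:05d}", "wb", buffering=0) as f:
        buf = b"\0" * BLOCK
        written = 0
        while written < my_bytes:
            written += f.write(buf)
        os.fsync(f.fileno())           # force data out of client caches
    t1 = time.time()

    # "Throughput": each client's own rate over its own (unsynchronized)
    # write interval; summed across clients on the root rank.
    my_rate = my_bytes / (t1 - t0)
    # "Aggregate": total data divided by the time until the slowest client
    # finishes, so one lagging OST drags the whole number down.
    t_end = comm.reduce(t1, op=MPI.MAX, root=0)
    t_start = comm.reduce(t0, op=MPI.MIN, root=0)
    rates = comm.gather(my_rate, root=0)
    if rank == 0:
        agg = TOTAL_BYTES / (t_end - t_start)
        print(f"aggregate:  {agg / 1e9:8.1f} GB/s")
        print(f"throughput: {sum(rates) / 1e9:8.1f} GB/s")

Because the aggregate metric is bounded by the slowest writer while the
throughput metric is not, the throughput number is generally the higher of
the two, exactly as described above.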
Figure 7.5(a) presents a time history of the daily I/O results beginning in
January 2013 and extending through mid-March. These results show the level of
variability possible on a shared I/O resource, particularly in the aggregate
performance numbers, which can vary by more than a factor of two. The average
aggregate performance over the 65 measurements
during this period was 110 GB/s, but a general decline is observed over this
initial period after the system entered production. Although system utilization
was ramping up during this time frame, which explains some of the decline as
more users accessed the shared I/O resource, another contributing factor was
the fragmentation associated with the file system filling up. Figure 7.6 shows
the corresponding growth in usage on both the WORK and SCRATCH file systems
during the first three months of operation and illustrates a rapid increase
in usage to over 1 PB in approximately two months for SCRATCH. This
fragmentation is another common contributor to the difference a typical user
observes between quoted peak rates and the performance achieved in production.
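To make the earlier padding advice concrete, the following back-of-the-envelope
sketch uses only the numbers reported in this section (a 4 TB write, the
110 GB/s average aggregate rate, and the observed factor-of-two swing). The
factor-of-two pad is a hypothetical rule of thumb for illustration, not a
documented Stampede recommendation.

    # Illustrative padding of an I/O runtime estimate using the numbers
    # reported above; the padding factor is a hypothetical rule of thumb.
    data_tb = 4                              # data written per job (TB)
    avg_rate_gbs = 110                       # average aggregate rate (GB/s)
    variability = 2.0                        # observed factor-of-two swing

    ideal_s = data_tb * 1000 / avg_rate_gbs  # ~36 s at the average rate
    padded_s = ideal_s * variability         # budget ~73 s for the I/O phase
    print(f"ideal: {ideal_s:.0f} s, padded estimate: {padded_s:.0f} s")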
 