TABLE 2.2: Amount of archived data at NERSC in 2013 for major systems
or categories of systems.

Client System Name        Total PB Moved
Hopper                    6.67 PB
Genepool, Mendel          2.75 PB
DTN                       2.48 PB
External to NERSC         0.57 PB
Carver                    0.48 PB
PDSF                      0.20 PB
NGF                       0.03 PB
2.3 Workflows, Workloads, and Applications
At the highest workflow levels, users tend to utilize the local scratch file
system for job input and output. The local scratch file systems have the least
contention for bandwidth and latency. However, if they are utilizing more than
one computational system at the Center, they may make a trade-off for the
higher capacity or cross-system availability generally offered by the NGF sys-
tems to improve overall time to completion. Generally, the parallel file systems
at NERSC complete hundreds of terabytes of I/O throughout an average day
at the facility. The active archive handles between 50 and 100 TB of parallel I/O
on a typical day, 30% of which are read operations from a combination of local
scratch and NGF file systems. See Table 2.2 for information on the source and
amount of data moving to and from the HPSS archive at NERSC.
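As a rough cross-check of these figures, the per-system totals in Table 2.2 can be tallied and averaged over the year. (This is only a sketch: the 365-day uniform averaging is an assumption, and the 50-100 TB figure in the text describes a typical day rather than the mean.)

```python
# Per-client HPSS archive traffic from Table 2.2, in petabytes (2013).
moved_pb = {
    "Hopper": 6.67,
    "Genepool, Mendel": 2.75,
    "DTN": 2.48,
    "External to NERSC": 0.57,
    "Carver": 0.48,
    "PDSF": 0.20,
    "NGF": 0.03,
}

total_pb = sum(moved_pb.values())        # 13.18 PB moved over the year
avg_tb_per_day = total_pb * 1000 / 365   # average daily traffic in TB

print(f"Total archived traffic: {total_pb:.2f} PB")
print(f"Average daily traffic:  {avg_tb_per_day:.1f} TB/day")
```

The yearly average (roughly 36 TB/day) sits below the 50-100 TB typical-day range quoted above, consistent with archive traffic being bursty rather than uniform.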
Data at NERSC falls into two main categories: simulation and experimental.
For the past decade, simulation data has been the most demanding
on compute and storage system resources at the Center. More recently,
experimental data from high-powered instruments such as the Large Hadron
Collider or genomic sequencers has presented new challenges to NERSC's data systems and
services. Simulation data can be regenerated by rerunning the simulation,
but most experimental data cannot be regenerated due to cost or instrument
changes. JGI, a partner of NERSC since 2010, has been sequencing and
analyzing over 50 terabases[4] of experimental data a year. The scale of
their demands calls for a parallel file system, but their workloads are a mix of
parallel and serial I/O. The light sources and other experimental facilities are
presenting significant challenges for storage as well. Instrument data rates are
high and increasing rapidly, demanding very capable data acquisition systems
that need to be local to the instrument to achieve success. NERSC provides
[4] A terabase is a unit of measure for the amount of data needed to represent a given
amount of genetic material; specifically, it is the genetic sequence data equivalent
of 10^12 base pairs.
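To put the terabase unit in storage terms, a back-of-envelope conversion follows. This is only an illustrative sketch, not a NERSC or JGI figure: the bytes-per-base encodings are assumptions, and real instrument output (with quality scores and read redundancy) is considerably larger than the bare sequence sized here.

```python
# Hypothetical storage estimate for 50 terabases of sequence per year.
# Assumed encodings (for illustration only):
#   - plain ASCII sequence text: ~1 byte per base
#   - 2-bit packed encoding (4 bases per byte): 0.25 bytes per base
TERABASES_PER_YEAR = 50
BASES = TERABASES_PER_YEAR * 10**12   # one terabase = 10^12 base pairs

ascii_tb = BASES * 1.0 / 1e12         # TB as plain ASCII sequence
packed_tb = BASES * 0.25 / 1e12       # TB as 2-bit packed bases

print(f"ASCII: {ascii_tb:.1f} TB, 2-bit packed: {packed_tb:.1f} TB")
```

Even the bare sequence therefore lands in the tens-of-terabytes range per year, before accounting for quality data, replication, or derived analysis products.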