served over 10 GigE in the case of SAN failures. NERSC is moving toward
a purely IB-based SAN where only the systems with proprietary intercon-
nects will have I/O nodes. The intent is for the IB-based compute systems
to utilize hardware routers to bridge the computational IB and storage IB
networks. This simplifies and centralizes the centerwide parallel file systems,
enabling maximum scalability and allowing problems to be isolated to either
the storage systems or the compute systems rather than spanning both at once.
2.2.3 The NERSC Global File Systems
NERSC pioneered the concept of providing a long-term centerwide file
system with its /project file system in the early 2000s. By 2006, the main
goals of the /project file system were realized when users could access the
file system from all computational systems at the facility. Users accepted
contention for aggregate bandwidth in exchange for a storage resource accessible
to all computational systems at the facility. The centerwide file system enabled
users to focus more on their computational work across several systems and
less on moving data between the various storage and compute systems. As a
direct result, NERSC also saw less data moving across its internal network
between systems. 3 Today, the /project file system is a primary storage re-
source for enabling science projects to share data between multiple users or
systems. This file system is used for data transfers on the center's data trans-
fer nodes (DTNs), sharing with collaborators through portals on the Science
Gateway Nodes (SGNs), and for data analysis on the mid-range systems. The
file system is a global parallel file system oriented toward high bandwidth and
capacity, with regular increases planned to sustain demand and growth.
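As a rough illustration of this single-namespace model, the Python sketch below shows how an analysis step on a mid-range system and a later transfer or sharing step on a DTN or SGN can reference the same /project path, with no intermediate staging copy. The directory and file names are hypothetical examples, not a NERSC-documented workflow.

import os

# Hypothetical project directory on the centerwide /project file system;
# the same POSIX path is visible from compute systems, DTNs, and SGNs.
# (Override via PROJECT_DIR to run the sketch elsewhere.)
PROJECT_DIR = os.environ.get("PROJECT_DIR", "/project/projectdirs/myrepo")
RESULTS = os.path.join(PROJECT_DIR, "run042", "results.dat")

def analyze():
    # Runs on a mid-range analysis system: write output directly to /project.
    os.makedirs(os.path.dirname(RESULTS), exist_ok=True)
    with open(RESULTS, "wb") as f:
        f.write(b"placeholder analysis output")

def publish():
    # Runs later on a DTN or Science Gateway Node: read the very same path,
    # so no copy between system-local file systems is required.
    with open(RESULTS, "rb") as f:
        return len(f.read())

if __name__ == "__main__":
    analyze()
    print("bytes ready to transfer or share:", publish())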
Most recently, the centerwide file systems demonstrated that they are also
positioned to scale out in support of individual science team needs nearly on
demand. A clear example was enabling the work of the Daya Bay project,
which faced an unexpected deluge of data in its effort to determine the mixing
angle in its neutrino-oscillation experiment [2]. The team had recently migrated
from a local storage model that was struggling at scale to the centerwide file
system. The move provided easy scale-out (a doubling of the project's allocation)
in less than a week and ultimately enabled rapid scientific results that led
directly to a discovery.
The centerwide file systems are also well suited for meeting the evolving
needs of data-intensive science communities. NERSC initiated a Data Inten-
sive Computing Pilot [8] with a number of science teams, providing 1 PB of
high-bandwidth storage capacity and a compute allocation to enable analysis
of large datasets.
NERSC provides two other main centerwide file systems: global scratch
and global homes. Global scratch provides high bandwidth simultaneously
3 Bulk data movement represents about 50% of the NERSC network traffic, so reducing
bulk data movement on the network has a significant effect on responsiveness of systems.
 