Network switching hardware from Myricom, Inc. was chosen. Myricom had two
characteristics that made it attractive for this application. The first was
its relatively high port density, with 512 10 GigE ports in a 21U enclosure.
The second was a brand-new technology, referred to as its "2Z" technology,
which translated between 10 GigE frames and 10-Gb Myrinet frames at line
rate. Because Myrinet is source routed, meaning that the sender (or the 2Z
chip when the source is Ethernet) determines the route through the network
and prepends the path to the header, the switches can be relatively simple,
fast, and inexpensive compared to 10 GigE switches: each switch simply strips
off the leading hop of the route and forwards the packet out the port it
names, with no routing-table lookup required. This is often referred to as
cut-through routing.
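To make the source-routing idea concrete, the sketch below models it in
Python (a simplified illustration, not Myricom's actual wire format: the
port numbers, packet structure, and function names are all hypothetical).
The sender prepends the complete path as a list of output ports, and each
switch merely pops the leading hop, which is what keeps the switch logic
simple and fast.

    # Simplified sketch of source routing; not the real Myrinet format.
    def build_packet(path, payload):
        # The sender (or the 2Z chip for Ethernet-originated traffic)
        # prepends the full route as a list of output-port numbers.
        return {"route": list(path), "payload": payload}

    def switch_forward(packet):
        # Each switch pops the leading hop and forwards on that port;
        # no routing-table lookup is needed.
        out_port = packet["route"].pop(0)
        return out_port, packet

    # A three-switch path chosen entirely by the sender.
    pkt = build_packet(path=[4, 1, 7], payload=b"10 GigE frame")
    for _ in range(3):
        port, pkt = switch_forward(pkt)
        print("forward out port", port, "remaining route", pkt["route"])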
Two different parallel file systems are used on Intrepid and they share
the same disk and file servers. The primary file system, used for home and
the larger of the two project file systems, is IBM's GPFS. The second file
system is the Parallel Virtual File System (PVFS). Each file server runs both
a GPFS server daemon and a PVFS server daemon, and the storage arrays
are partitioned, such that each disk has one or more GPFS partitions and one
or more PVFS partitions. While this arrangement does force the hard drives
to move their heads back and forth between the GPFS and PVFS partitions
when simultaneous writes occur, PVFS is not heavily used and HPC
workloads tend to be bursty, so the impact was considered acceptable. PVFS is
primarily used by file systems researchers for performance comparisons since
it is uncommon and beneficial to have two parallel file systems running on
identical hardware.
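The arrangement can be pictured with a small sketch (all device and daemon
names here are hypothetical; the text does not specify the actual partition
scheme): every disk carries partitions for both file systems, and every file
server runs one daemon for each.

    # Hypothetical shared-disk layout; names are illustrative only.
    shared_disks = {
        "lun00": {"gpfs": ["lun00p1"], "pvfs": ["lun00p2"]},
        "lun01": {"gpfs": ["lun01p1"], "pvfs": ["lun01p2"]},
    }

    # Each of the 128 file servers runs both server daemons, so
    # simultaneous GPFS and PVFS writes contend for the same spindles.
    daemons_on_each_server = ("GPFS server daemon", "PVFS server daemon")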
There are 16 arrays and each array has 2 couplets, with 4 IB ports and
1 server directly attached to each IB port, for a total of 128 file servers.
The file servers are grouped into what GPFS calls an NSD (network shared
disk) cluster. Each group of 8 file servers is a redundancy group. In
theory, as long as one of those 8 servers was up on each storage array and
at least 5 of them were quorum managers, the file system could still
operate. In practice, however, performance in that state would be too poor
to be useful, and the surviving servers would more likely run short of
resources, particularly RAM, with the Linux Out of Memory (OOM) killer
terminating a critical process and bringing the file system down completely.
This is of course an extreme case, but it is not uncommon to have one or two
of the eight servers down without problems.
The 128 file servers that run GPFS also run the PVFS servers.
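The availability rule just described can be summarized in a short sketch (a
hypothetical helper; only the group size of 8, the 16 groups, and the
5-manager quorum threshold come from the text): the file system can keep
operating only while every redundancy group has at least one live server and
at least five quorum managers remain up.

    # Sketch of the GPFS availability rule described above; server
    # names and the quorum-manager assignment are illustrative.
    def fs_can_operate(groups, quorum_managers, up):
        # Every redundancy group of 8 needs at least one live server.
        every_group_alive = all(any(s in up for s in group) for group in groups)
        # At least 5 quorum managers must still be reachable.
        quorum_ok = sum(1 for q in quorum_managers if q in up) >= 5
        return every_group_alive and quorum_ok

    # 16 redundancy groups of 8 servers = 128 file servers.
    groups = [["fs%d" % (g * 8 + i) for i in range(8)] for g in range(16)]
    quorum_managers = ["fs%d" % i for i in range(9)]  # hypothetical set
    up = {"fs%d" % i for i in range(128)} - {"fs3", "fs42"}  # two down
    print(fs_can_operate(groups, quorum_managers, up))  # True

As the text notes, passing this check only means the file system can limp
along; it says nothing about whether the degraded configuration would
perform acceptably.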
4.3.2 Mira: ALCF Blue Gene/Q System
The design of the Mira storage system is very similar to the Intrepid storage
system. For comparison purposes, specific numbers, speeds, and feeds will
be provided where different. Figure 4.2 shows a system diagram of ALCF's
second-generation HPC center, ALCF2.
Like all members of the Blue Gene family, Mira has dedicated I/O nodes.
Mira has 49,152 compute nodes and 384 I/O nodes, for a ratio of 128:1, twice
that of Intrepid.
 