4.3.1 Intrepid: ALCF Blue Gene/P System
ALCF's first-generation HPC data center, ALCF1, is shown in Figure 4.1.
ALCF1 contains Intrepid, which has 40,960 compute nodes and 640 I/O nodes,
so each I/O node services 64 compute nodes. The I/O nodes are physically
identical to the compute nodes; when a compute node is inserted into a slot
that provides 10 GigE ports, it operates as an I/O node.
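As a quick check, the 64:1 fan-in quoted above follows directly from the node counts:
\[
\frac{40{,}960\ \text{compute nodes}}{640\ \text{I/O nodes}} = 64\ \frac{\text{compute nodes}}{\text{I/O node}}.
\]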
The project space storage system is built from 16 DataDirect Networks
(DDN) Silicon Storage Appliance (S2A) 9900 storage arrays. This system is
a single shared file system, referenced as /intrepid-fs0 in Figure 4.1, which
is used for large parallel writes from the applications on the Blue Gene. As
implied by the name silicon storage appliance, the controllers are based on
custom ASICs. The ALCF configuration has two redundant controllers, referred
to as a couplet, and 10 drawers, each capable of holding 60 disk drives. These
systems were installed in 2007 and were initially configured with forty-eight
1-TB SATA drives per drawer, which was considered sufficient to saturate the
controller bandwidth.
bandwidth. The raw capacity of this system is given by
\[
1\ \frac{\text{TB}}{\text{drive}} \times 48\ \frac{\text{drives}}{\text{drawer}} \times 10\ \frac{\text{drawers}}{\text{couplet}} \times 16\ \text{couplets} = 7{,}680\ \text{TB},
\tag{4.1}
\]
which, after overheads and a small partition kept unallocated for testing,
resulted in a usable capacity of 5 PB. In 2010 a capacity upgrade added three
3-TB hard drives to each drawer, providing an additional usable capacity of
1 PB.
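For comparison with Eq. (4.1), the raw capacity added by the upgrade can be computed the same way, assuming the three new 3-TB drives per drawer went into all 10 drawers of all 16 couplets:
\[
3\ \frac{\text{TB}}{\text{drive}} \times 3\ \frac{\text{drives}}{\text{drawer}} \times 10\ \frac{\text{drawers}}{\text{couplet}} \times 16\ \text{couplets} = 1{,}440\ \text{TB},
\]
which, after comparable parity and file system overheads, is consistent with the roughly 1 PB of additional usable capacity quoted above.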
FIGURE 4.1: The ALCF1 HPC system.
 