Hardware Reference
The interconnect between the controllers and the drive drawers is a pair of redundant 3-Gbps Serial Attached SCSI (SAS) links. ALCF uses DDN's 8+2P data protection scheme, which implements RAID 3 (byte striping) with 2 parity bytes.[1] The parity groups are distributed vertically through the drawers. If you consider the array to be a 3D matrix of hard drives, with the x and y dimensions being the rows and columns inside a drawer and the z dimension running vertically up and down the drawers, then the 10 hard drives that sit at the same x and y coordinates in each drawer form a parity group. This arrangement means that two entire drawers can be lost without the loss of data.
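As a rough illustration of this layout, the following sketch (Python; the per-drawer slot grid dimensions are assumed, not taken from the text) enumerates the drives that make up one vertical parity group:

```python
# Simplified model of the vertical parity-group layout described above.
# The array is treated as a 3D matrix of drives: (drawer, row, column).
DRAWERS = 10          # z dimension: one drive per drawer in each group
ROWS, COLS = 8, 8     # in-drawer slot grid (assumed for illustration)

def parity_group(x, y):
    """The 10 drives sharing slot (x, y) in every drawer form one 8+2P group."""
    return [(z, x, y) for z in range(DRAWERS)]

# Losing two whole drawers removes at most two drives from any group,
# which the 8+2P scheme (two parity drives per group) can survive.
print(parity_group(0, 3))
```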
Another useful data protection feature of these arrays is parity verification on read, which DDN refers to as "SATAsure." Because the ASIC can generate parity at line rate on reads, the parity is read from the parity drives and also recalculated in the controller, and the two values are compared. If they do not match, the data is either corrected if possible or flagged if not, preventing what is referred to as "silent data corruption."
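The verify-on-read idea can be sketched as follows; this uses simple XOR parity purely for illustration and does not model DDN's actual 8+2P encoding or the controller ASIC:

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR across equal-length blocks (stand-in for the real parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def verified_read(data_blocks, stored_parity):
    """Recompute parity on every read and compare it to the stored value,
    flagging a mismatch rather than silently returning suspect data."""
    if xor_parity(data_blocks) != stored_parity:
        raise IOError("parity mismatch: possible silent data corruption")
    return b"".join(data_blocks)

blocks = [bytes([i] * 4) for i in range(8)]   # 8 data blocks, as in an 8+2 stripe
parity = xor_parity(blocks)
assert verified_read(blocks, parity) == b"".join(blocks)
```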
In the ALCF configuration, each controller is the primary for half of the LUNs (a LUN being a group of drives treated as a single storage resource), but it can see all of the LUNs and can serve data from the non-primary LUNs (with a small performance penalty) should the other controller fail. Each controller also has four external-facing Double Data Rate (DDR, 20 Gbps) InfiniBand ports for connection to the file servers. These ports are directly connected to the servers (not via a switch). Any traffic between servers is handled over the converged Myrinet and Ethernet network.
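The active/active ownership described above can be modeled crudely as below; the LUN count and controller names are made up for illustration and do not reflect DDN's actual failover logic:

```python
# Hypothetical model: each controller is primary for half of the LUNs.
LUNS = [f"lun{i}" for i in range(8)]                       # assumed LUN count
PRIMARY = {lun: ("ctrl_A" if i % 2 == 0 else "ctrl_B")
           for i, lun in enumerate(LUNS)}

def serving_controller(lun, failed=None):
    """Return the controller serving a LUN; if its primary has failed,
    the surviving controller takes over (at a small performance penalty)."""
    owner = PRIMARY[lun]
    if owner == failed:
        return "ctrl_B" if owner == "ctrl_A" else "ctrl_A"
    return owner

print(serving_controller("lun0"))                   # ctrl_A (primary path)
print(serving_controller("lun0", failed="ctrl_A"))  # ctrl_B (failover path)
```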
The initial Intrepid design called for four 2U IBM x3650s per couplet, each with dual quad-core processors and 12 GB of RAM, with each server driving two InfiniBand ports. However, a review of the motherboard block diagrams, along with the relevant speeds and feeds, raised the question of whether the servers would be able to sustain the required I/O throughput. For this reason, the design was modified to use eight 1U IBM x3550s, each with a single quad-core processor and 8 GB of RAM, driving a single InfiniBand port. This change doubled the effective motherboard I/O bandwidth and had the additional advantage of increasing the aggregate RAM available for file server cache, without changing the rack space required, though it did double the number of hosts to manage and the number of Gigabit Ethernet ports required.
Physically, the system was designed in three-rack storage cells. Each cell consisted of two DDN storage arrays, with a server rack in between holding the 16 file servers, Gigabit Ethernet switches for management, and, in most cases, a large 512-port Myrinet switch.
Because the Blue Gene/P came with 10 GigE built on-board, there was
a requirement for at least 640 ports of 10 GigE. Given that requirement, a
converged solution was chosen. However, 10G Ethernet was relatively new
in 2007 and the port cost was extremely high. Therefore, a hybrid solution
[1] This is typically referred to as RAID 6 because people have taken to using that term generically for any data protection scheme that can survive two drive failures.