[Figure: floor plan with a legend for I/O and service nodes, switches, and compute nodes. Labeled sections: 28 classified cabinets (2688 Opterons), 52 switchable cabinets (4992 Opterons), 28 unclassified cabinets (2688 Opterons), and two 120-TB storage blocks.]
Figure 8-41. The Red Storm system as viewed from above.
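The cabinet and processor counts labeled in the figure can be tallied with a few lines of arithmetic. The following is only a sketch: the dictionary layout and variable names are illustrative, and the numbers are taken directly from the figure labels.

```python
# Cabinet and processor counts as labeled in Fig. 8-41.
# Each entry is (cabinets, Opterons).
sections = {
    "classified": (28, 2688),
    "switchable": (52, 4992),
    "unclassified": (28, 2688),
}

total_cabinets = sum(cabs for cabs, _ in sections.values())
total_opterons = sum(cpus for _, cpus in sections.values())

# Opterons per cabinet in each section (the division is exact for these figures).
per_cabinet = {name: cpus // cabs for name, (cabs, cpus) in sections.items()}

print(total_cabinets, total_opterons, per_cabinet)
```

With these figures the system comes to 108 cabinets and 10,368 Opterons, with the same density of 96 Opterons per cabinet in all three sections.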
Everything is housed in a new 2000-m² custom building. The building and site
have been designed so that the system can be upgraded to as many as 30,000
Opterons in the future if required. The compute nodes draw 1.6 megawatts of
power; the disks draw another megawatt. Adding in the fans and air conditioning,
the whole thing uses 3.5 MW.
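The power figures above imply how much is left over for fans and air conditioning. As a rough sketch (the variable names are illustrative, and the split assumes that everything other than compute nodes and disks is cooling overhead):

```python
# Power figures quoted in the text, in kilowatts (integers avoid rounding noise).
compute_kw = 1600   # compute nodes
disk_kw = 1000      # disks
total_kw = 3500     # whole installation

# Assumption: whatever is not compute or disk goes to fans and air conditioning.
cooling_kw = total_kw - compute_kw - disk_kw
cooling_share = cooling_kw / total_kw

print(cooling_kw, round(cooling_share, 3))
```

Under that assumption, fans and air conditioning account for about 900 kW, roughly a quarter of the total.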
The computer hardware and software cost $90 million. The building and cooling
cost another $9 million, so the total cost came in at just under $100 million,
although some of that was nonrecurring engineering cost. If you want to order an
exact clone, $60 million would be a good number to keep in mind. Cray intends to
sell smaller versions of the system to other government and commercial customers
under the name XT3.
The compute nodes run a lightweight kernel called Catamount. The I/O and
service nodes run plain vanilla Linux with a small addition to support MPI
(discussed later in this chapter). The RAS nodes run a stripped-down Linux.
Extensive software from ASCI Red is available for use on Red Storm, including CPU
allocators, schedulers, MPI libraries, and math libraries, as well as the
application programs.
With such a large system, achieving high reliability is essential. Each board
has a RAS processor for doing system maintenance and there are special hardware
facilities as well. The goal is an MTBF (Mean Time Between Failures) of 50
hours. ASCI Red had a hardware MTBF of 900 hours but was plagued by an
operating-system crash every 40 hours. Although the new hardware is much more
reliable than the old, the weak point remains the software.
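To see why the software dominates, the two ASCI Red figures can be combined into a single system MTBF. The sketch below assumes that hardware and operating-system failures are independent and occur at constant rates, so that the failure rates simply add. That is a common simplification and not something the text itself states, and the function name `combined_mtbf` is illustrative.

```python
def combined_mtbf(*mtbfs_hours):
    """MTBF of a system that goes down when any one independent source fails.

    Assumes constant failure rates, so the individual rates (1/MTBF) add.
    """
    total_rate = sum(1.0 / m for m in mtbfs_hours)
    return 1.0 / total_rate


hardware_mtbf = 900.0  # hours, ASCI Red hardware (from the text)
os_mtbf = 40.0         # hours, ASCI Red operating-system crashes (from the text)

asci_red_mtbf = combined_mtbf(hardware_mtbf, os_mtbf)
print(round(asci_red_mtbf, 1))
```

Under these assumptions the combined figure is a little over 38 hours, only slightly below the 40-hour software figure: even perfectly reliable hardware would raise it to just 40 hours.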
For more information about Red Storm, see Brightwell et al. (2005, 2010).
A Comparison of BlueGene/P and Red Storm
Red Storm and BlueGene/P are comparable in some ways but different in others,
so it is interesting to put some of the key parameters next to each other, as
presented in Fig. 8-42.
 