18-hour run aborted because one CPU crashed is unacceptable, especially when
one such failure is to be expected every week. Thus large MPPs always have spe-
cial hardware and software for monitoring the system, detecting failures, and
recovering from them smoothly.
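To get a feel for the numbers, the following minimal sketch estimates how likely an 18-hour run is to finish without interruption when one failure is expected per week. It assumes failures arrive independently at a constant rate (an exponential model), which the text does not state; the function names are hypothetical and chosen only for illustration.

```python
import math

HOURS_PER_WEEK = 7 * 24  # one expected failure per week, as in the text


def expected_failures(run_hours: float, mtbf_hours: float) -> float:
    """Average number of failures during a run of the given length."""
    return run_hours / mtbf_hours


def survival_probability(run_hours: float, mtbf_hours: float) -> float:
    """Probability that no failure occurs during the run.

    Assumption: failures follow an exponential distribution with a
    mean time between failures of mtbf_hours.
    """
    return math.exp(-expected_failures(run_hours, mtbf_hours))


if __name__ == "__main__":
    p = survival_probability(18, HOURS_PER_WEEK)
    print(f"Chance an 18-hour run completes: {p:.3f}")
    print(f"Chance it is cut short:          {1 - p:.3f}")
```

Under these assumptions, roughly one 18-hour run in ten would be cut short, which is why detection and recovery cannot be left to chance.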
While it would be nice to study the general principles of MPP design now, in
truth, there are not many principles. When you come right down to it, an MPP is a
collection of more-or-less standard computing nodes connected by a very fast
interconnect of the types we have already examined. So instead, we will now look
at two examples of MPPs: BlueGene/P and Red Storm.
BlueGene
As a first example of a massively parallel processor, we will now examine the
IBM BlueGene system. IBM conceived this project in 1999 as a massively parallel
supercomputer for solving computationally intensive problems in, among other
fields, the life sciences. For example, biologists believe that the three-dimensional
structure of a protein determines its functionality, yet computing the 3D structure
of one small protein from the laws of physics took years on the supercomputers of
that period. The number of proteins found in human beings is over half a million.
Many of them are extremely large and their misfolding is known to be responsible
for certain diseases (e.g., cystic fibrosis). Clearly, determining the 3D structure of
all the human proteins would require increasing the world's computing power by
many orders of magnitude, and modeling protein folding is only one problem that
BlueGene was designed to handle. Equally complex challenges in molecular dy-
namics, climate modeling, astronomy, and even financial modeling also require
orders of magnitude improvement in supercomputing.
IBM felt that there was enough of a market for massive supercomputing that it
invested $100 million to design and build BlueGene. In November 2001,
Livermore National Laboratory, run by the U.S. Department of Energy, signed up as a
partner and first customer for the first version of the BlueGene family, called
BlueGene/L. In 2007, IBM deployed the second generation of the BlueGene
supercomputer, called the BlueGene/P, which we detail here.
The goal of the BlueGene project was not just to produce the world's fastest
MPP, but also to produce the most efficient one in terms of teraflops/dollar,
teraflops/watt, and teraflops/m³. For this reason, IBM rejected the philosophy behind
previous MPPs, which was to use the fastest components money could buy. In-
stead, a decision was made to produce a custom system-on-a-chip component that
was to run at a modest speed and low power in order to produce a very large ma-
chine with a high packing density. The first BlueGene/P was delivered to a Ger-
man university in November 2007. The system contained 65,536 processors, and it
was capable of 167 teraflops. When deployed, it was the fastest computer in
Europe, and the sixth fastest computer in the world. The system was also regarded
as one of the most computationally power-efficient supercomputers in the world,
 