PARALLEL COMPUTER ARCHITECTURES - Structured Computer Organization


18-hour run aborted because one CPU crashed is unacceptable, especially when

one such failure is to be expected every week. Thus large MPPs always have special

hardware and software for monitoring the system, detecting failures, and

recovering from them smoothly.

While it would be nice to study the general principles of MPP design now, in

truth, there are not many principles. When you come right down to it, an MPP is a

collection of more-or-less standard computing nodes connected by a very fast

interconnect of the types we have already examined. So instead, we will now look

at two examples of MPPs: BlueGene/P and Red Storm.

BlueGene

As a first example of a massively parallel processor, we will now examine the

IBM BlueGene system. IBM conceived this project in 1999 as a massively parallel

supercomputer for solving computationally intensive problems in, among other

fields, the life sciences. For example, biologists believe that the three-dimensional

structure of a protein determines its functionality, yet computing the 3D structure

of one small protein from the laws of physics took years on the supercomputers of

that period. The number of proteins found in human beings is over half a million.

Many of them are extremely large and their misfolding is known to be responsible

for certain diseases (e.g., cystic fibrosis). Clearly, determining the 3D structure of

all the human proteins would require increasing the world's computing power by

many orders of magnitude, and modeling protein folding is only one problem that

BlueGene was designed to handle. Equally complex challenges in molecular dynamics,

climate modeling, astronomy, and even financial modeling also require

orders of magnitude improvement in supercomputing.

IBM felt that there was enough of a market for massive supercomputing that it

invested $100 million to design and build BlueGene. In November 2001, Livermore

National Laboratory, run by the U.S. Department of Energy, signed up as a

partner and first customer for the first version of the BlueGene family, called BlueGene/L.

In 2007, IBM deployed the second generation of the BlueGene supercomputer,

called the BlueGene/P, which we detail here.

The goal of the BlueGene project was not just to produce the world's fastest

MPP, but also to produce the most efficient one in terms of teraflops/dollar,

teraflops/watt, and teraflops/m³. For this reason, IBM rejected the philosophy behind

previous MPPs, which was to use the fastest components money could buy. Instead,

a decision was made to produce a custom system-on-a-chip component that

was to run at a modest speed and low power in order to produce a very large machine

with a high packing density. The first BlueGene/P was delivered to a German

university in November 2007. The system contained 65,536 processors, and it

was capable of 167 teraflops. When deployed, it was the fastest computer in

Europe, and the sixth fastest computer in the world. The system was also regarded

as one of the most computationally power-efficient supercomputers in the world,
