the server is no longer accessing storage. Once this shutdown is accomplished, the storage for that file system is mounted on one of the other active servers, and a new PVFS server process is started, filling in for the failed server until the original server is brought back online. With heartbeat clusters of eight nodes, each group of eight servers can tolerate up to three server failures before the PVFS file system becomes unavailable.
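The failover scheme above can be sketched in a few lines. This is a minimal illustrative model, not PVFS code: the class and method names (`HeartbeatGroup`, `fail`, `available`) and the one-volume-per-server assumption are invented for the example.

```python
# Minimal sketch of the heartbeat-failover scheme described above:
# servers are organized into groups of eight; when one fails, its
# file system storage is remounted on an active server in the group,
# which runs a stand-in PVFS server process. Names here are
# illustrative only, not actual PVFS interfaces.

class HeartbeatGroup:
    GROUP_SIZE = 8
    MAX_FAILURES = 3  # failures tolerated per group of eight servers

    def __init__(self, servers):
        assert len(servers) == self.GROUP_SIZE
        # Each server initially serves its own storage volume.
        self.assignment = {s: s for s in servers}  # volume -> serving node
        self.failed = set()

    def fail(self, server):
        """A missed heartbeat: hand the failed server's volumes to a survivor."""
        self.failed.add(server)
        survivors = [s for s in self.assignment if s not in self.failed]
        if not survivors:
            return
        for volume, node in self.assignment.items():
            if node in self.failed:
                # Mount the volume on an active server and start a
                # stand-in PVFS server process there.
                self.assignment[volume] = survivors[0]

    def available(self):
        """The file system stays available while failures are within the limit."""
        return len(self.failed) <= self.MAX_FAILURES
```

For example, a group of eight can absorb three failures, with each failed server's volume remounted on a survivor, but a fourth failure makes the file system unavailable.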
2.3.2 PanFS on Roadrunner
On June 9, 2008, the Department of Energy (DOE) announced that Los Alamos National Laboratory's Roadrunner supercomputer was the first computer to exceed a petaFLOP, or 1,000 trillion floating-point operations per second, of sustained performance according to the rules of the top500.org benchmark. Roadrunner will be used to certify that the U.S. nuclear weapons stockpile is reliable without conducting underground nuclear tests. Roadrunner will be built in three phases and will cost about $100 million. It is the first “hybrid” supercomputer, in that it achieves its performance by combining 6,562 dual-core AMD Opteron (x86 class) processors and 12,240 IBM Cell Broadband Engines, which originated from the designs of Sony PlayStation 3 video game machines. It runs the Linux operating system on both the Opteron and Cell processors, contains 98 TB of memory, is housed in 278 racks occupying 5,200 square feet, and is interconnected node to node with 10 Gbps InfiniBand and node to storage with 10 Gbps Ethernet. One of the most surprising results is that the world's fastest computer (at that time) was also the third most power efficient, according to the green500.org list of supercomputers; it achieves 437 million operations per watt consumed, whereas computers previously topping the speed list were 43rd and 499th on the green list.
Roadrunner is organized as subclusters called compute units, each with 12 I/O nodes routing storage traffic between the clusters and Panasas storage clusters. All Roadrunner compute nodes are diskless; their operating system runs from a RAMdisk, with external storage access using Panasas DirectFlow and NFS. The attached Panasas storage contains over 200 shelves of Panasas storage shared across Roadrunner's phases and the older Los Alamos supercomputers Lightning and Bolt. Each shelf contains 10 StorageBlades and one DirectorBlade, for a total of over 3 PB, 2,000 object servers, 4,000 disks, and 200 metadata managers.
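The per-shelf composition implies the cluster-wide totals. A quick back-of-the-envelope check (the two-disks-per-StorageBlade figure is an assumption inferred from the stated 4,000 disks, not a number given in the text):

```python
# Back-of-the-envelope check of the Panasas totals quoted above.
# Per-shelf composition comes from the text; disks per StorageBlade
# is an assumption chosen to match the stated total.

shelves = 200                    # "over 200 shelves"
storage_blades_per_shelf = 10    # each StorageBlade is an object server
director_blades_per_shelf = 1    # each DirectorBlade is a metadata manager
disks_per_storage_blade = 2      # assumed

object_servers = shelves * storage_blades_per_shelf      # 2,000
metadata_managers = shelves * director_blades_per_shelf  # 200
disks = object_servers * disks_per_storage_blade         # 4,000

print(object_servers, metadata_managers, disks)
```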
Figure 2.8 shows the storage architecture of the Panasas storage connected to Roadrunner and shared with other supercomputers in Los Alamos's “red”-level secure computing facility. All supercomputers have access to all Panasas storage in this facility, with the amount of bandwidth available to each cluster determined by the I/O node resources on the supercomputer. While most supercomputers have dedicated storage directly attached to the cluster's I/O nodes, Los Alamos shares storage resources to reduce copying overhead and reduce the cost of scaling storage bandwidth with each new cluster. Their storage architecture, called PaScalBB (parallel and scalable server I/O backbone