this system was to implement the use of compute and I/O nodes for checkpoint/restart purposes. Storage network traffic is routed to/from the storage systems that are connected to the cluster's internal network and the facility local-area network. This large Linux cluster was deployed with routing through I/O nodes using TCP/IP via Myrinet to Gigabit Ethernet and onto disk storage. That I/O-node routing mechanism was designed in 2001 to support massive parallelism, allowing any number of I/O nodes and Ethernet switches to be used as the system was extended. Further, the ability to route around bad I/O nodes and bad Ethernet switches or links was also contemplated (though this capability was not added until later).
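To make the idea concrete, the following is a minimal sketch of how storage traffic could be spread across many I/O-node gateways; the node names (ionode01, ...) and the function pick_gateway are hypothetical illustrations, not the production routing code, which operated at the Linux IP-routing layer.

```python
# Hypothetical sketch: spread storage traffic from a compute node across
# many I/O-node gateways, so that I/O nodes and Ethernet switches can be
# added without changing the scheme.

import hashlib

# Assumed names for I/O nodes that bridge Myrinet to Gigabit Ethernet.
IO_NODE_GATEWAYS = ["ionode01", "ionode02", "ionode03", "ionode04"]


def pick_gateway(storage_target: str, gateways=IO_NODE_GATEWAYS) -> str:
    """Deterministically map a storage target to one I/O-node gateway.

    Hashing keeps the mapping stable while the gateway list is stable,
    and adding gateways simply widens the spread (extensibility).
    """
    digest = hashlib.md5(storage_target.encode()).hexdigest()
    return gateways[int(digest, 16) % len(gateways)]


if __name__ == "__main__":
    for target in ("ost0001", "ost0002", "ost0003"):
        print(target, "->", pick_gateway(target))
```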
In 2003 LANL began planning its second large Linux cluster. By using the same SAN design, which eventually became known as the Parallel Scalable Backbone (PaScalBB) [4], LANL implemented the first large-scale sharing of a global parallel file system between two supercomputers (see Figure 6.1). This was done using Ethernet technology because, in 2002, Ethernet was the only technology available that was stable and scalable enough to be used for storage access. At this time, Myrinet was not yet usable for storage networking and FC was not scalable enough. It is important to note that PaScalBB was in use before IB, which is now commonly used for large-scale SANs, had been invented. The routing, dead I/O node detection, load balancing, and failover characteristics of PaScalBB have not been surpassed by any design to date. Although IB is not as robust as Ethernet as a SAN interconnect, it is more cost effective and has therefore become the default connection medium for SANs.
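The dead-I/O-node detection and failover behavior can likewise be sketched. The snippet below is only an illustration, assuming a crude ping-based health check (the functions gateway_alive and live_gateways are invented for this example); PaScalBB itself performed detection and failover in the IP-routing layer rather than in application code.

```python
# Hypothetical sketch: drop unresponsive I/O-node gateways from the routing
# set so traffic is rebalanced over the survivors (failover + load balancing).

import subprocess


def gateway_alive(host: str, timeout_s: int = 1) -> bool:
    """Crude liveness probe using a single ping (illustrative only)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def live_gateways(gateways):
    """Return only the gateways that respond.

    Callers re-spread traffic across this reduced set, routing around
    bad I/O nodes, switches, or links.
    """
    return [g for g in gateways if gateway_alive(g)]
```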
6.2.3 Global Parallel Scratch File Systems
In the early 2000s, LANL planned for a very large Linux Myrinet cluster. At this time, there were not many choices for parallel file systems that would work on a large Linux cluster. IBM's GPFS [10] was then supported only on IBM clusters. Lustre [11] was being developed via the DOE Accelerated Strategic Computing Initiative (ASCI) Path Forward program, which was being guided by the Sandia, Lawrence Livermore, and Los Alamos National Laboratories. There was an effort to deliver Lustre Lite, an early version of Lustre with only partial functionality, but Lustre Lite was not quite ready when LANL needed the file system. The ability to use a single instance of Lustre as a site-wide global parallel file system would not come along until much later. Instead, Panasas [9], a new company that had been working with LANL and DOE on parallel file systems starting in the 2000s, offered the most mature option. In 2002, Panasas won an open request for proposals for a site-wide shared global parallel file system, and its file system has remained in use at LANL in that role for a decade.
There were several Panasas features that helped LANL immensely in the
early days. One such advantage was the ability to connect Panasas to PaScalBB in such a way that not every I/O node needed to talk directly to every