this system was to implement the use of compute and I/O nodes for checkpoint/restart purposes. Storage network traffic is routed to/from the storage systems that are connected to the cluster's internal network and the facility local-area network. This large Linux cluster was deployed with routing through I/O nodes using TCP/IP via Myrinet to Gigabit Ethernet and onto disk storage. That I/O-node routing mechanism was designed in 2001 to support massive parallelism, allowing any number of I/O nodes and Ethernet switches to be used as the system was extended. Further, the ability to route around bad I/O nodes and bad Ethernet switches or links was also contemplated (though this capability was not added until later).
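To make the idea concrete, the following is a minimal sketch of how storage traffic could be spread across many I/O-node gateways; the node names (ionode01, ...) and the function pick_gateway are hypothetical illustrations, not the production routing code, which operated at the Linux IP-routing layer.

```python
# Hypothetical sketch: spread storage traffic from a compute node across
# many I/O-node gateways, so that I/O nodes and Ethernet switches can be
# added without changing the scheme.

import hashlib

# Assumed names for I/O nodes that bridge Myrinet to Gigabit Ethernet.
IO_NODE_GATEWAYS = ["ionode01", "ionode02", "ionode03", "ionode04"]


def pick_gateway(storage_target: str, gateways=IO_NODE_GATEWAYS) -> str:
    """Deterministically map a storage target to one I/O-node gateway.

    Hashing keeps the mapping stable while the gateway list is stable,
    and adding gateways simply widens the spread (extensibility).
    """
    digest = hashlib.md5(storage_target.encode()).hexdigest()
    return gateways[int(digest, 16) % len(gateways)]


if __name__ == "__main__":
    for target in ("ost0001", "ost0002", "ost0003"):
        print(target, "->", pick_gateway(target))
```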
In 2003 LANL began planning its second large Linux cluster. By using the same SAN design, which eventually became known as the Parallel Scalable Backbone (PaScalBB) [4], LANL implemented the first large-scale sharing of a global parallel file system between two supercomputers (see Figure 6.1). This was done using Ethernet technology because, in 2002, Ethernet was the only technology available that was stable and scalable enough to be used for storage access. At this time, Myrinet was not yet usable for storage networking and FC was not scalable enough. It is important to note that PaScalBB was in use before IB, which is now commonly used for large-scale SANs, had been invented. The routing, dead I/O node detection, load balancing, and failover characteristics of PaScalBB have not been surpassed by any design to date. Although IB is not as robust as Ethernet as a SAN interconnect, it is more cost effective and has therefore become the default connection medium for SANs.
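The dead-I/O-node detection and failover behavior can likewise be sketched. The snippet below is only an illustration, assuming a crude ping-based health check (the functions gateway_alive and live_gateways are invented for this example); PaScalBB itself performed detection and failover in the IP-routing layer rather than in application code.

```python
# Hypothetical sketch: drop unresponsive I/O-node gateways from the routing
# set so traffic is rebalanced over the survivors (failover + load balancing).

import subprocess


def gateway_alive(host: str, timeout_s: int = 1) -> bool:
    """Crude liveness probe using a single ping (illustrative only)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def live_gateways(gateways):
    """Return only the gateways that respond.

    Callers re-spread traffic across this reduced set, routing around
    bad I/O nodes, switches, or links.
    """
    return [g for g in gateways if gateway_alive(g)]
```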
6.2.3 Global Parallel Scratch File Systems
In the early 2000s, LANL planned for a very large Linux Myrinet cluster. At this time, there were not many choices for parallel file systems that would work on a large Linux cluster. IBM's GPFS [10] was then supported only on IBM clusters. Lustre [11] was being developed via the DOE Accelerated Strategic Computing Initiative (ASCI) Path Forward program, which was being guided by the Sandia, Lawrence Livermore, and Los Alamos National Laboratories. There was an effort to deliver Lustre Lite, an early version of Lustre with only partial functionality, but Lustre Lite was not quite ready when LANL needed the file system. The ability to use a single instance of Lustre as a site-wide global parallel file system would not come along until much later. Instead, Panasas [9], a new company that had been working with LANL and DOE on parallel file systems starting in the 2000s, offered the most mature option. In 2002, Panasas won an open request for proposals for a site-wide shared global parallel file system, and its file system has remained in use at LANL in that role for a decade.
There were several Panasas features that helped LANL immensely in the
early days. One such advantage was the ability to connect Panasas to PaScalBB in such a way that not every I/O node needed to talk directly to every