6.1.1 Facilities and Environments
The current computing environment at LANL consists of three computing
facilities with a total of 104,000 ft² of raised computing floor and over 100,000 ft² of indoor mechanical space. The site currently has 28 megawatts (MW)
available for the HPC computing environment with active plans for about
50 MW total. The computational capability is split into three environments:
secure, open, and compartmented at 85%, 13%, and 2%, respectively.
The computing environment at LANL contains three primary types of
machines: capacity, capability, and advanced architecture machines. Capac-
ity machines are commercial, off-the-shelf Linux clusters ranging from 10,000 to 30,000 standard AMD or Intel cores, typically with 2 GB of dynamic random-access memory (DRAM) per core, interconnected by IB; a subset of these machines has accelerators attached. Capacity machines are used for jobs in the 100 to 10,000 core range. Typical applications include 1D and 2D physics problems and code validation runs. Capability machines have 100,000 to 200,000 standard AMD or Intel cores, typically with 2 GB of DRAM per core,
interconnected by a proprietary interconnect. These machines are reserved for
jobs that use nearly the entire machine for long periods of time (weeks to
months) doing 3D physics calculations. Advanced architecture machines incorporate novel hardware features; an example is Roadrunner, the first machine to reach petaflop-class computing, built from about 14,000 AMD cores and 14,000 Cell processors, each with eight specialized processing cores, interconnected by IB. All supercomputers contain compute nodes and I/O nodes, with a ratio of compute to I/O nodes ranging from 20:1 to 100:1. The I/O nodes serve as routers between the machine interconnect and the SAN.
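As a rough illustration of the arithmetic behind these configurations, the sketch below derives compute node count, I/O (router) node count, and aggregate DRAM from a core count, a per-core memory size, and a compute-to-I/O ratio. The specific values (16 cores per node, a 50:1 ratio) are assumptions chosen from the ranges quoted above, not the specification of any particular LANL machine.

```c
#include <stdio.h>

int main(void) {
    /* Illustrative values drawn from the ranges in the text; not the
       configuration of any specific LANL system. */
    long cores          = 100000; /* capability-class core count            */
    long gb_per_core    = 2;      /* DRAM per core                          */
    long cores_per_node = 16;     /* assumed cores per compute node         */
    long compute_per_io = 50;     /* compute-to-I/O-node ratio (20 to 100)  */

    long compute_nodes = cores / cores_per_node;
    /* Round up so every compute node has an I/O router to reach the SAN. */
    long io_nodes      = (compute_nodes + compute_per_io - 1) / compute_per_io;
    long total_dram_tb = cores * gb_per_core / 1024;

    printf("compute nodes: %ld\n", compute_nodes);
    printf("I/O (router) nodes at %ld:1: %ld\n", compute_per_io, io_nodes);
    printf("aggregate DRAM: ~%ld TB\n", total_dram_tb);
    return 0;
}
```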
In addition to the capacity, capability, and advanced architecture machines, LANL also has visualization clusters of a few thousand cores with GPUs, used for parallel hardware rendering and compositing to drive the many large power-wall theaters, immersive cave environments, and end-user 3D workstations.
6.2 I/O Hardware
The following sections describe the LANL storage environment in detail. Many of them cover early-adopter methods and deployments, including large-scale use of I/O nodes for data storage routing and the first use of a scalable SAN to share a globally visible parallel file system among multiple large Linux supercomputers. They also examine how struggles with unaligned data patterns created issues and opportunities for more innovative implementations.
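To make concrete what an unaligned data pattern looks like, the sketch below rounds an arbitrary request to a file-system stripe boundary. The stripe size, offset, and length are hypothetical values chosen only for illustration; the point is that a request starting or ending mid-stripe spans extra stripes and can force read-modify-write work that aligned requests avoid.

```c
#include <stdio.h>

/* Round a byte offset down or up to a stripe boundary. */
static long align_down(long offset, long stripe) { return (offset / stripe) * stripe; }
static long align_up(long offset, long stripe)   { return ((offset + stripe - 1) / stripe) * stripe; }

int main(void) {
    const long stripe = 1 << 20;                 /* hypothetical 1 MiB stripe   */
    long offset = 3500000, length = 900000;      /* an unaligned write request  */

    long start = align_down(offset, stripe);
    long end   = align_up(offset + length, stripe);

    printf("unaligned request: offset=%ld length=%ld\n", offset, length);
    printf("stripe-aligned span: [%ld, %ld) covering %ld stripes\n",
           start, end, (end - start) / stripe);
    return 0;
}
```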
 