earthquakes, social networking, and weather.¹ The Blue Waters system
consists of six primary subsystems: the Cray XE6/XK7 computation and analysis
subsystem, the external server complex, the Cray Sonexion™ on-line storage,
the HPSS near-line storage, wide-area networking, and the cyber-protection
subsystems. Four of the six subsystems are discussed below.
3.1 The Blue Waters Computational and Analysis Subsystems
The key feature of Blue Waters is its balanced system design, which begins
with 22,640 XE6 compute nodes, 4,224 XK7 compute nodes, 582 Lustre Network
(or Lnet) I/O router nodes, and 202 other service nodes [2]. Each XE6 node
has two AMD Interlagos processor modules,² fast access to 4 GB of memory per
processor core, and the high-speed 3D torus interconnect. The configuration
is based upon the tightly coupled Cray XE6/XK7 system,³,⁴ complete with
powerful parallel storage and file system capabilities. The Cray XE6/XK7 is
a hybrid system featuring AMD socket G34 Interlagos™ processors for x86
compute performance and NVIDIA Kepler K20X GPUs with powerful acceleration
capabilities. The balance between XE6 and XK7 nodes was determined by the
mission of Blue Waters, in consultation with the major science teams, and by
an in-depth analysis of the known SETs that assessed their current and
planned experimental and production use of accelerated computing.⁵ Of the
compute nodes, 84% are XE6 all-CPU AMD Interlagos processor nodes, and 16%
are XK7 nodes with one AMD Interlagos processor module and one NVIDIA Kepler
GPU in a single node using the Cray-designed Gemini router.
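The 84%/16% split can be checked directly from the node counts quoted above. The following is a minimal Python sketch, assuming the percentages are taken over the XE6 and XK7 compute nodes only (the I/O router and service nodes are excluded); the names NODE_COUNTS and compute_node_shares are illustrative and do not come from the original text.

```python
# Node counts as quoted in the text.
NODE_COUNTS = {
    "XE6 compute": 22_640,
    "XK7 compute": 4_224,
    "Lnet I/O router": 582,
    "other service": 202,
}


def compute_node_shares(counts):
    """Return each compute node type's percentage of all compute nodes.

    Assumption: only entries whose name ends in "compute" are counted,
    so router and service nodes do not enter the denominator.
    """
    compute = {name: n for name, n in counts.items() if name.endswith("compute")}
    total = sum(compute.values())
    return {name: 100.0 * n / total for name, n in compute.items()}


shares = compute_node_shares(NODE_COUNTS)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Under this assumption the XE6 share rounds to 84% and the XK7 share to 16%, which agrees with the figures in the text. Including the router and service nodes in the denominator would lower the XE6 share to roughly 82%, so the compute-only reading appears to be the intended one.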
The Cray XE6/XK7 system uses the latest AMD Opteron Series 6000
model 6276 "Interlagos" in both the Cray XE6 compute nodes and the Cray
XK7 accelerator nodes. The Interlagos socket is a dual-die multi-chip module.
The Interlagos processor has a peak performance of 156.8 GFLOPS per socket
at 2.45 GHz. The Interlagos processor can be used as two 4-core modules, with
each core supporting two threads, giving a total of eight results per clock per
fused multiply-add. Each processor socket has four HyperTransport™ links,
1 See the Blue Waters portal (http://bluewaters.ncsa.illinois.edu) for complete
descriptions of the projects.
2 http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf
3 http://www.cray.com/Products/XE/Specifications.aspx
4 http://www.cray.com/Products/XK6/Specifications.aspx
5 If the Blue Waters funding had been invested to target the highest possible peak
performance by populating as many compute racks as possible with NVIDIA GPUs, the
peak performance of Blue Waters would approach 50 PFLOPS, but with only half the
memory, which would have made many of the SET goals infeasible.
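The per-socket peak of 156.8 GFLOPS at 2.45 GHz quoted in the main text implies 64 floating-point operations per clock per socket. The sketch below reproduces that figure in Python under one assumed breakdown: eight floating-point units per socket, each completing eight double-precision operations per clock. That breakdown is an assumption chosen to be consistent with the quoted numbers; it is not stated in this form in the text. The constant and function names are likewise illustrative.

```python
CLOCK_GHZ = 2.45                 # from the text
SOCKETS_PER_XE6_NODE = 2         # from the text: two processor modules per XE6 node
XE6_NODES = 22_640               # from the text

# Assumptions (not stated in this form in the text):
FP_UNITS_PER_SOCKET = 8          # assumed floating-point units per socket
FLOPS_PER_CLOCK_PER_UNIT = 8     # assumed double-precision operations per clock per unit


def socket_peak_gflops(clock_ghz=CLOCK_GHZ):
    """Peak GFLOPS of one socket: clock rate times operations per clock."""
    return clock_ghz * FP_UNITS_PER_SOCKET * FLOPS_PER_CLOCK_PER_UNIT


def xe6_node_peak_gflops():
    """Peak GFLOPS of one XE6 node (CPU sockets only)."""
    return SOCKETS_PER_XE6_NODE * socket_peak_gflops()


def xe6_partition_peak_pflops():
    """Aggregate CPU peak of all XE6 nodes, in PFLOPS (XK7 nodes excluded)."""
    return XE6_NODES * xe6_node_peak_gflops() / 1.0e6


print(f"per socket:    {socket_peak_gflops():.1f} GFLOPS")
print(f"per XE6 node:  {xe6_node_peak_gflops():.1f} GFLOPS")
print(f"all XE6 nodes: {xe6_partition_peak_pflops():.2f} PFLOPS")
```

With these inputs the per-socket value matches the 156.8 GFLOPS given in the text, and an XE6 node comes to 313.6 GFLOPS. The aggregate covers only the XE6 CPU nodes; the XK7 nodes and their GPUs are left out because the text gives no per-GPU peak.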