Hardware Reference
In-Depth Information
up to 5.2 GB/s. The core module has shared fetch/decode units and a shared
instruction cache for each floating-point unit. Each Interlagos socket features
eight "Bulldozer" core modules, each capable of eight floating-point
operations per clock.
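The per-socket arithmetic implied by these figures can be sketched as follows. This is a minimal illustration, not a vendor specification: the function names are hypothetical, and the clock frequency is a caller-supplied assumption because it is not stated in this passage.

```python
# Figures taken from the text above.
MODULES_PER_SOCKET = 8            # "Bulldozer" core modules per Interlagos socket
FLOPS_PER_MODULE_PER_CLOCK = 8    # floating-point operations per module per clock


def flops_per_clock_per_socket() -> int:
    """Floating-point operations one socket can issue per clock cycle."""
    return MODULES_PER_SOCKET * FLOPS_PER_MODULE_PER_CLOCK


def peak_gflops_per_socket(clock_ghz: float) -> float:
    """Peak GFLOP/s for one socket.

    clock_ghz is an assumption supplied by the caller; the passage
    does not give the processor clock frequency.
    """
    return flops_per_clock_per_socket() * clock_ghz


if __name__ == "__main__":
    print(flops_per_clock_per_socket())  # 8 modules * 8 flops = 64
```

Keeping the clock rate as a parameter makes it possible to plug in a measured or documented frequency without changing the per-clock figures quoted in the text.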
In addition, the processor core can run in a mode that shares the
floating-point unit between two integer units, which we call integer cores.
In this mode, the AMD processors include an L1 cache for each integer unit
and an L2 cache for each Bulldozer core module, as well as two 6 MB L3 caches,
each shared by all the cores of one of the two dies. The processors also share
the DDR3 internal memory controllers and an HT3 HyperTransport interface that
increases the injection bandwidth between the interconnect network and the
compute processor. The XE6 compute nodes consist of two sockets, each with
associated memory and a HyperTransport™ interconnection.
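As a rough tally of what this description implies for one XE6 node, the sketch below counts integer cores and caches from the figures in the text (eight modules per socket, two integer units per module, one L1 per integer unit, one L2 per module, two 6 MB L3 caches per socket, two sockets per node). The class and function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterlagosSocket:
    """Per-socket counts as described in the text (names are illustrative)."""
    modules: int = 8                   # Bulldozer core modules
    integer_units_per_module: int = 2  # integer cores sharing one FP unit
    l3_caches: int = 2                 # one per die, two dies per socket
    l3_mb_each: int = 6

    @property
    def integer_cores(self) -> int:
        return self.modules * self.integer_units_per_module

    @property
    def l1_caches(self) -> int:
        # One L1 cache per integer unit.
        return self.integer_cores

    @property
    def l2_caches(self) -> int:
        # One L2 cache per Bulldozer core module.
        return self.modules

    @property
    def l3_total_mb(self) -> int:
        return self.l3_caches * self.l3_mb_each


def xe6_node_totals(sockets_per_node: int = 2) -> dict:
    """Aggregate the per-socket counts over the sockets of one XE6 node."""
    s = InterlagosSocket()
    return {
        "fp_units": s.modules * sockets_per_node,
        "integer_cores": s.integer_cores * sockets_per_node,
        "l1_caches": s.l1_caches * sockets_per_node,
        "l2_caches": s.l2_caches * sockets_per_node,
        "l3_caches": s.l3_caches * sockets_per_node,
        "l3_total_mb": s.l3_total_mb * sockets_per_node,
    }


if __name__ == "__main__":
    for name, value in xe6_node_totals().items():
        print(f"{name}: {value}")
```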
The XK7 compute nodes have an Interlagos socket with the associated
memory and one NVIDIA Kepler K20X GPU. The Cray XK7 accelerator
blade is similar to the Cray XE6 compute blade in form factor and placement
of the Gemini network cards. It differs in that each of the four compute nodes
on the blade consists of an AMD processor socket and a connection for a Kepler
GPU (NVIDIA K20X) card. These GPU cards each contain the accelerator
chip and 6 GB of GDDR5 memory and connect to the motherboard with a
high-reliability connector. These cards sit up off the motherboard to allow
the necessary cooling on both sides of the card. The blade will be managed,
monitored, powered, and cooled within the Cray XE6/XK7 infrastructure.
The Blue Waters system consists of 288 computational cabinets arranged in
12 rows, each with 24 cabinets.
The Cray Gemini High Speed Network (HSN) is based on a custom Gemini router
that connects two XE or XK nodes to the rest of the system. The HSN
interconnect topology is a 3D torus of dimension X = 24, Y = 24, and Z = 24
with a total injection bandwidth of over 276 TB/s. All torus links run at a
minimum bit toggle rate of 3.125 GHz. Figure 3.1 shows the configuration of
the XE6 and XK7 nodes. The resulting peak bidirectional bandwidth in each
dimension is listed in Table 3.1. The peak global bandwidth value is twice
as large because, in all-to-all communication patterns, only half of
the total traffic crosses the bisection in a 3D torus topology. All I/O requests
and traffic go across the interconnect from the compute nodes to the Lnet
nodes. Key system performance and configuration information is provided in
Table 3.2.
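The torus description above can be made concrete with a short sketch. It assumes that each vertex of the 24 x 24 x 24 torus is one Gemini router serving two node slots, and that all-to-all traffic is uniform over distinct source and destination pairs; both are simplifying assumptions for illustration, and all function names are hypothetical. The sketch shows wraparound neighbours, minimal hop distance, and why roughly half of all-to-all traffic crosses a bisection.

```python
DIMS = (24, 24, 24)     # torus extent in X, Y, Z (from the text)
NODES_PER_GEMINI = 2    # each Gemini router serves two XE or XK nodes


def router_count(dims=DIMS) -> int:
    """Number of torus vertices, assuming one Gemini router per vertex."""
    x, y, z = dims
    return x * y * z


def neighbors(coord, dims=DIMS):
    """The six nearest neighbours of a vertex, with wraparound on each axis."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            result.append(tuple(n))
    return result


def hop_distance(a, b, dims=DIMS) -> int:
    """Minimal hop count between two vertices, taking the shorter way around."""
    total = 0
    for ai, bi, d in zip(a, b, dims):
        delta = abs(ai - bi)
        total += min(delta, d - delta)
    return total


def all_to_all_crossing_fraction(n_endpoints: int) -> float:
    """Fraction of ordered (src, dst) pairs, src != dst, that lie in
    opposite halves when the endpoints are split into two halves."""
    half = n_endpoints // 2
    crossing = 2 * half * (n_endpoints - half)
    return crossing / (n_endpoints * (n_endpoints - 1))


if __name__ == "__main__":
    routers = router_count()
    print(routers, routers * NODES_PER_GEMINI)
    print(neighbors((0, 0, 0)))
    print(hop_distance((0, 0, 0), (23, 12, 1)))
    print(all_to_all_crossing_fraction(routers))
```

Because only about half of the uniformly distributed pairs straddle the cut, a bisection of a given bandwidth can sustain roughly twice that figure in aggregate all-to-all traffic, which matches the factor of two quoted in the text.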
3.2 Blue Waters On-line Storage Subsystem
Blue Waters provides one of the most intense storage systems in the world
using a combination of on-line and near-line storage devices and media. The
 