The example described next, and the driver for the research described in
this chapter, is the gyrokinetic toroidal code (GTC) fusion simulation that
scientists ran on the 250
Tflop computer at Oak Ridge National Laboratory
(ORNL) during the first quarter of 2008. GTC is a state-of-the-art global
fusion code that has been optimized to achieve high efficiency on a single com-
puting node and nearly perfect scalability on massively parallel computers. It
uses the particle-in-cell (PIC) technique to model the behavior of particles
and electromagnetic waves in a toroidal plasma in which ions and electrons
are confined by intense magnetic fields. One of the goals of GTC simulations is
to resolve the critical question of whether or not scaling in large tokamaks will
impact ignition for ITER.
In order to understand these effects and validate the simulations against
experiments, the scientists will need to record enormous amounts of data.
The particle data in the PIC simulations is five-dimensional, containing three
spatial dimensions and two velocity dimensions. The best estimates are that
the essential information amounts to about 55 GB of data written out every
60 seconds. However, since each simulation takes 1.5 days and produces roughly
150 TB of data (including extra information not included in our previous cal-
culation), it is obvious that there will not be enough disk space for the next
simulation scheduled on the supercomputer unless the data is archived on the
high-performance storage system, HPSS, while the simulation is running.
Even with HPSS running at 300 MB/s, moving the data requires staging
simulations at roughly one per week. This means that each run will first need
to move its data from the supercomputer over to a large disk; from this disk,
the data can then be transferred to HPSS at 300 MB/s.
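As a rough sanity check of the staging rate quoted above, the short calculation below reproduces the back-of-the-envelope arithmetic. It assumes decimal units (1 TB = 10^6 MB), and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the I/O and archiving rates quoted above.
# Assumes decimal units (1 GB = 1e3 MB, 1 TB = 1e6 MB); figures are from the text.

write_rate_gb_per_s = 55 / 60          # 55 GB written every 60 s -> ~0.92 GB/s
total_output_tb = 150                  # data produced by one 1.5-day run
hpss_rate_mb_per_s = 300               # sustained archive rate to HPSS

archive_seconds = total_output_tb * 1e6 / hpss_rate_mb_per_s
archive_days = archive_seconds / 86_400

print(f"sustained write rate:  {write_rate_gb_per_s:.2f} GB/s")
print(f"time to drain to HPSS: {archive_days:.1f} days")
# ~5.8 days to archive one 150 TB run at 300 MB/s, which is why only
# about one simulation per week can be staged through disk to tape.
```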
Finally, since human and system errors can occur, it is critical that scien-
tists monitor the simulation during its execution. On a system with 100,000
processors, every wasted hour costs 100,000 CPU hours, so simulations must be
monitored closely to conserve both the precious resources on the supercomputer
and the application scientist's time after a long run. The general analysis that one would do
during a simulation can include taking multidimensional FFTs (fast Fourier
transforms), looking at correlation functions over a specified time range, and
computing simple statistics; a minimal sketch of such analysis appears below.
Adding these routines directly to the simulation not only complicates the code,
but also makes it difficult to scale all of the extra routines as part of the
simulation. To summarize, effectively running
the large simulations to enable cutting-edge science, such as the GTC fusion
simulations described above, requires that the large volumes of data generated
be (a) moved from the compute nodes to disk, (b) moved from
disk to tape, (c) analyzed during the movement, and finally (d) visualized,
all while the simulation is running. Workflow management tools can be used
very effectively for this purpose, as described in some detail in Chapter 13.
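As a rough illustration of the kind of in-flight analysis mentioned above (simple statistics, multidimensional FFTs, and correlation functions), the sketch below assumes that a three-dimensional field slice has already been staged out of the running simulation as a NumPy array; the file name and array layout are hypothetical and do not reflect the actual GTC output format.

```python
import numpy as np

# Hypothetical snapshot staged out of the running simulation
# (assumed shape (nx, ny, nz); not the actual GTC file format).
field = np.load("potential_slice_t0420.npy")

# Simple statistics for quick sanity monitoring.
print("mean:", field.mean(), "std:", field.std(),
      "min:", field.min(), "max:", field.max())

# Multidimensional FFT of the fluctuations to find the dominant spatial mode.
fluct = field - field.mean()
power = np.abs(np.fft.fftn(fluct)) ** 2
print("dominant mode:", np.unravel_index(power.argmax(), power.shape))

# Two-point correlation along the first axis via the Wiener-Khinchin theorem.
fk = np.fft.fft(fluct, axis=0)
corr = np.fft.ifft(fk * np.conj(fk), axis=0).real
corr = corr / corr[0]    # normalize by the zero-lag value
```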
In the future, codes like GTC, which models the behavior of the plasma in
the center of the device, will be coupled with other codes, such as X-point
gyrokinetic guiding center (XGC1), which models the edge of the plasma.