High Throughput Data Movement - Scientific Data Management

The early version of this code, called XGC0 [7], is already producing highly informative results that fusion experimentalists are beginning to use for validation against experiments such as DIII-D and NSTX. This requires loose coupling of the kinetic code, XGC0 [7], with GTC and other simulation codes. It is critical that we monitor the XGC0 simulation results and generate simple images that can be selected and displayed while the simulation is running. Further, this coupling is tight, that is, subject to strict space and time constraints, and the data movement technologies must be able to support such coupling of these codes while minimizing programming effort. Automating the end-to-end process of configuring, executing, and monitoring such coupled-code simulations, using high-level programming interfaces and high-throughput data movement, is necessary so that scientists can concentrate on their science rather than on all of the technologies underneath.

Clearly, a paradigm shift must occur for researchers to dynamically and effectively find the needle in the haystack of data and to perform complex code coupling. Enabling technologies must make it simple to monitor and couple codes and to move data from one location to another. They must empower scientists to ask "what if" questions and provide software and hardware infrastructure capable of answering these questions in a timely fashion. Furthermore, effective data management is not merely becoming important; it is becoming absolutely essential as we move beyond current systems into the age of exascale computing. We can already see the impact of such a shift in other domains; for example, Google Desktop revolutionized desktop computing by allowing users to find information that might otherwise have gone undetected. These types of technologies are now moving into leadership-class computing and must be made to work on the largest analysis machines.

High-throughput, end-to-end data movement is an essential part of the solution as we move toward exascale computing.

The rest of this chapter focuses on the techniques that the authors have developed over the last few years for high-performance, high-throughput data movement and processing. We begin in the next section with a discussion of the Adaptable IO System (ADIOS) and show how it can be extremely valuable to application scientists and how it lends itself to both synchronous and asynchronous data movement. Next, we describe the Georgia Tech DataTap method underlying ADIOS, which supports high-performance data movement. This is followed by a description of the Rutgers DART (decoupled and asynchronous remote data transfers) method, another method that uses remote direct memory access (RDMA) for high-throughput asynchronous data transport and that has been used effectively by application codes including XGC1 and GTC. Finally, we describe mechanisms, such as autonomic management techniques and in-transit data manipulation methods, that support complex operations over the LAN and WAN.
