High Throughput Data Movement - Scientific Data Management


extracted data from live simulations to remote services. In the previous sections we talked about services that worked on the local area network, and in this section we discuss services that must work over the wide area network. For example, in the context of the DOE SciDAC CPES fusion simulation project [29], a typical workflow consists of coupled simulation codes, the edge turbulence particle-in-cell (PIC) code (GTC) and the microscopic MHD code (M3D), running simultaneously on thousands of processors at various supercomputing centers. The data produced by these simulations must be streamed to remote sites and transformed along the way, for online simulation monitoring and control, simulation coupling, data analysis and visualization, online validation, and archiving. Wide-area data streaming and in-transit processing for such a workflow must satisfy the following constraints:

(1) Enable high-throughput, low-latency data transfer to support near real-time access to the data.

(2) Minimize related overhead on the executing simulation. Since the simulation is long running and executes in batch for days, the overhead due to data streaming on the simulation should be less than 10% of the simulation execution time.

(3) Adapt to network conditions to maintain desired quality of service (QoS). The network is a shared resource and the usage patterns vary constantly.

(4) Handle network failures while eliminating data loss. Network failures can lead to buffer overflows, and data has to be written to local disks to avoid loss. However, this increases overhead on the simulation and the data is not available for real-time remote analysis and visualization.

(5) Effectively manage in-transit processing while satisfying the above requirements. This is particularly challenging due to the heterogeneous capabilities and dynamic capacities of the in-transit processing nodes.
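Constraints (2) and (4) can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the system described in this chapter: the class name StreamBuffer, the function within_overhead_budget, and the file-naming scheme are all hypothetical. It shows a bounded in-memory buffer that writes blocks to local disk once it is full rather than dropping them, plus a check of streaming overhead against the 10% budget.

```python
import os
import tempfile
from collections import deque


def within_overhead_budget(streaming_seconds, simulation_seconds, budget=0.10):
    """Return True if streaming overhead stays below the given fraction
    of simulation execution time (10% by default, as in constraint 2)."""
    return streaming_seconds / simulation_seconds < budget


class StreamBuffer:
    """Hypothetical bounded buffer that spills to local disk when full,
    so that a network stall does not cause data loss (constraint 4)."""

    def __init__(self, capacity_blocks, spill_dir):
        self.capacity = capacity_blocks
        self.spill_dir = spill_dir
        self.queue = deque()   # blocks waiting to be streamed
        self.spilled = []      # paths of blocks written to local disk

    def put(self, block):
        """Buffer a block in memory, or write it to disk if the buffer is full."""
        if len(self.queue) < self.capacity:
            self.queue.append(block)
            return "buffered"
        path = os.path.join(self.spill_dir, f"block_{len(self.spilled):06d}.bin")
        with open(path, "wb") as f:
            f.write(block)
        self.spilled.append(path)
        return "spilled"

    def drain(self, send, network_up):
        """Send buffered blocks in order while the network is up.
        Returns the number of blocks sent."""
        sent = 0
        while self.queue and network_up():
            send(self.queue.popleft())
            sent += 1
        return sent
```

In this sketch, spilled blocks stay on disk and would have to be shipped later by some other path, which reflects the trade-off noted in constraint (4): the data is preserved but is not available for real-time remote analysis.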

5.3.1 An Infrastructure for Autonomic Data Streaming

The data streaming service described in this section is constructed using the Accord programming infrastructure [30-32], which provides the core models and mechanisms for realizing self-managing Grid services. These include autonomic management using rules as well as model-based online control. Accord extends the service-based Grid programming paradigm to relax static (defined at the time of instantiation) application requirements and system/application behaviors and allows them to be dynamically specified using high-level rules. Further, it enables the behaviors of services and applications to be sensitive to the dynamic state of the system and the changing requirements of the application, and to adapt to these changes at runtime. This is achieved by extending Grid services to include the specifications of policies (in the form of high-level rules) and mechanisms for self-management, and providing a decentralized runtime infrastructure for consistently and efficiently enforcing these policies to enable autonomic self-managing functional interaction and composition behaviors based on current requirements, state, and execution context.
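To give a feel for what "policies in the form of high-level rules" might look like, here is a minimal Python sketch of condition/action rules evaluated against the current system state. It is an assumption-laden illustration, not Accord's actual rule language or API: the Rule class, the apply_rules function, the state keys (bandwidth_mbps, target_mbps, block_kb, buffer_fill, write_to_disk), and the thresholds are all invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical IF-condition-THEN-action rule (not Accord's real syntax)."""
    name: str
    condition: Callable[[State], bool]
    action: Callable[[State], None]


def apply_rules(rules: List[Rule], state: State) -> List[str]:
    """Evaluate rules in order against the current state, run the action of
    each rule whose condition holds, and return the names of fired rules."""
    fired = []
    for rule in rules:
        if rule.condition(state):
            rule.action(state)
            fired.append(rule.name)
    return fired


def make_streaming_rules() -> List[Rule]:
    """Two illustrative policies; thresholds are assumptions for the sketch."""
    return [
        # If measured bandwidth falls below target, halve the block size
        # (never going below 64 KB).
        Rule(
            "shrink_block_on_congestion",
            lambda s: s["bandwidth_mbps"] < s["target_mbps"],
            lambda s: s.update(block_kb=max(64.0, s["block_kb"] / 2)),
        ),
        # If the send buffer is at least 90% full, switch on local-disk writes.
        Rule(
            "spill_on_full_buffer",
            lambda s: s["buffer_fill"] >= 0.9,
            lambda s: s.update(write_to_disk=1.0),
        ),
    ]
```

Because the rules are plain data, they could be replaced or extended at runtime without touching the streaming code itself, which is the kind of flexibility the text attributes to dynamically specified policies.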
