extracted data from live simulations to remote services. The previous sections
covered services that operate on the local area network; this section
discusses services that must operate over the wide area network.
For example, in the context of the DOE SciDAC CPES fusion simulation
project [29], a typical workflow consists of coupled simulation codes, the edge
turbulence particle-in-cell (PIC) code (GTC) and the microscopic MHD code
(M3D), running simultaneously on thousands of processors at various
supercomputing centers. The data produced by these simulations must be streamed
to remote sites and transformed along the way, for online simulation monitor-
ing and control, simulation coupling, data analysis and visualization, online
validation, and archiving. Wide-area data streaming and in-transit processing
for such a workflow must satisfy the following constraints:
(1) Enable high-throughput, low-latency data transfer to support near
real-time access to the data.
(2) Minimize related overhead on the executing simulation. Since the
simulation is long running and executes in batch for days, the overhead due
to data streaming on the simulation should be less than 10% of the simulation
execution time.
(3) Adapt to network conditions to maintain desired quality of service (QoS).
The network is a shared resource and the usage patterns vary constantly.
(4) Handle network failures while eliminating data loss. Network failures can
lead to buffer overflows, and data has to be written to local disks to avoid
loss. However, this increases overhead on the simulation and the data is not
available for real-time remote analysis and visualization.
(5) Effectively manage in-transit processing while satisfying the above
requirements. This is particularly challenging due to the heterogeneous
capabilities and dynamic capacities of the in-transit processing nodes.
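Constraints (2) and (4) can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the streaming service described in this chapter: the names SpillBuffer, overhead_within_budget, and OVERHEAD_BUDGET are hypothetical, and the buffer capacity is counted in blocks for simplicity. It shows a bounded in-memory buffer that writes blocks to local disk when it is full (for example, during a network failure) rather than dropping them, together with a check of the 10% overhead budget.

```python
import os
from collections import deque

# Constraint (2): streaming overhead should stay below 10% of execution time.
OVERHEAD_BUDGET = 0.10


def overhead_within_budget(streaming_seconds, simulation_seconds):
    """Return True if the time spent streaming is under the 10% budget."""
    return streaming_seconds < OVERHEAD_BUDGET * simulation_seconds


class SpillBuffer:
    """Hypothetical bounded buffer that spills to local disk instead of losing data."""

    def __init__(self, capacity, spill_dir):
        self.capacity = capacity   # maximum number of blocks held in memory
        self.spill_dir = spill_dir
        self.memory = deque()      # blocks waiting to be streamed
        self.spilled = []          # paths of blocks written to local disk

    def put(self, block):
        """Store a block; return where it was placed ("memory" or "disk")."""
        if len(self.memory) < self.capacity:
            self.memory.append(block)
            return "memory"
        # Constraint (4): the buffer is full, so write to disk to avoid loss.
        path = os.path.join(self.spill_dir, f"block_{len(self.spilled):06d}.bin")
        with open(path, "wb") as f:
            f.write(block)
        self.spilled.append(path)
        return "disk"

    def drain(self, send):
        """Send in-memory blocks first, then spilled blocks; return the count sent."""
        sent = 0
        while self.memory:
            send(self.memory.popleft())
            sent += 1
        for path in self.spilled:
            with open(path, "rb") as f:
                send(f.read())
            os.remove(path)
            sent += 1
        self.spilled.clear()
        return sent
```

The sketch also reflects the trade-off noted in constraint (4): every block that takes the disk path adds I/O cost on the simulation side and is unavailable to remote consumers until drain is called after the network recovers.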
5.3.1 An Infrastructure for Autonomic Data Streaming
The data streaming service described in this section is constructed using the
Accord programming infrastructure [30-32], which provides the core models and
mechanisms for realizing self-managing Grid services. These include auto-
nomic management using rules as well as model-based online control. Accord
extends the service-based Grid programming paradigm to relax static (defined
at the time of instantiation) application requirements and system/application
behaviors and allows them to be dynamically specified using high-level rules.
Further, it enables the behaviors of services and applications to be sensitive
to the dynamic state of the system and the changing requirements of the
application, and to adapt to these changes at runtime. This is achieved by
extending Grid services to include the specifications of policies (in the form
of high-level rules) and mechanisms for self-management, and providing a
decentralized runtime infrastructure for consistently and efficiently enforcing
these policies to enable autonomic self-managing functional interaction and
composition behaviors based on current requirements, state, and execution
context.
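The idea of attaching high-level rules to a service and evaluating them against the current state at runtime can be sketched as follows. This is a minimal illustration, not Accord's actual API: the names Rule and ManagedService, and the state keys used in the usage note, are assumptions introduced here. Each rule pairs a condition on the observed state with an action that adapts the service, and rules can be added while the service is running.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical high-level rule: IF condition(state) THEN action(state)."""
    name: str
    condition: Callable[[State], bool]
    action: Callable[[State], None]


class ManagedService:
    """Hypothetical service whose behavior is adapted by rules at runtime."""

    def __init__(self, state: State, rules: List[Rule]):
        self.state = dict(state)   # current system and application state
        self.rules = list(rules)   # policies, which may change while running

    def add_rule(self, rule: Rule) -> None:
        """Specify a new policy dynamically, after instantiation."""
        self.rules.append(rule)

    def step(self) -> List[str]:
        """Evaluate every rule once; return the names of the rules that fired."""
        fired = []
        for rule in self.rules:
            if rule.condition(self.state):
                rule.action(self.state)
                fired.append(rule.name)
        return fired
```

For example, a rule named "shrink_blocks" with the condition state["bandwidth_mbps"] < 50 and an action that halves state["block_mb"] could be added through add_rule while streaming is in progress. It would then fire on the next call to step whenever the observed bandwidth is below that threshold.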