perform different types of data filtering and data transformation, as well as
transmission of data between processes over network links.
EVPath is designed to support a flexible and dynamic computational
environment in which stones might be created on remote nodes and possibly
relocated during the course of the computation. To support such an
environment, we use a sandboxed version of C, coupled with a dynamic
code-generation facility, to allow native binary transformation functions to
be deployed anywhere in the system at runtime [24]. The interface allows for
the specification of data gateways (pass/no-pass) and data transformations
(sum aggregation trees), and it can call out to more specialized code (for
example, invocation of a signed, shared library for performing FFTs). From
these elements, the application user can specify in much greater detail how
the interaction between the output of the running code and the data stored
for later use should look.
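To make the two element types concrete, the following is a minimal Python
sketch of a pass/no-pass gateway and a sum aggregation tree. It is not the
EVPath interface, which relies on a sandboxed C dialect; the function names,
the fan_in parameter, and the record layout are all assumptions chosen for
illustration.

```python
from typing import Callable, Iterable, List, Optional


def gateway(record: dict, passes: Callable[[dict], bool]) -> Optional[dict]:
    """Pass/no-pass filter: forward the record unchanged, or drop it (None)."""
    return record if passes(record) else None


def sum_aggregate(values: Iterable[float], fan_in: int = 2) -> float:
    """Sum values level by level, as a tree whose nodes each combine
    up to `fan_in` children (hypothetical parameter)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level: List[float] = list(values)
    while len(level) > 1:
        # Each slice plays the role of one interior node of the tree.
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else 0.0


# Hypothetical per-process records; only those at or above a threshold pass.
records = [{"rank": r, "energy": e} for r, e in enumerate([1.0, 2.0, 3.0, 4.0, 5.0])]
kept = [r for r in records if gateway(r, lambda rec: rec["energy"] >= 2.0) is not None]
total = sum_aggregate(r["energy"] for r in kept)
```

In a real deployment each tree level would run on a different node, so the
list slices here stand in for messages arriving from child processes.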
5.2.4.2 Data Workspaces and Augmentation of Storage Services
As a concrete example of the user-driven interfaces that can be provided for
application scientists, it is useful to consider the concept of data workspaces.
In a data workspace, users are provided with an execution model (i.e., a
semitransparent way of creating and submitting batch MPI jobs), along with
a way of specifying the data control networks that determine how this data
should move and be interpreted while in transit from the computing resource
to the storage.
Note that this concept interacts cleanly with the concept of a workflow—it
is a part of a rich transport specification that then feeds the manipulation of
the data once it has reached disk.
As an example of this concept, a team at Georgia Institute of Technol-
ogy has built a data workspace for molecular dynamics applications that can
make synchronous tests of the quality of the data and use that to modify the
priority and even the desirability of moving that data into the next stage of
its workflow pipeline [17]. Specifically, this workspace example modifies a
storage service (ADIOS) that the molecular dynamics program invokes. As an
example scenario, consider an application scientist who runs the parallel data
output through an aggregation tree so that there is a single unified dataset
(rather than a set of partially overlapping atomic descriptors), which then
undergoes data quality and timeliness evaluation. Raw atomic coordinate data
is compared to a previous graph of nearest neighbors through the evaluation
of a central symmetry function to determine whether any dislocations (the
seeds of crack formation) have occurred in the simulated dataset. In this
particular case, the frequency of data storage is then changed depending on
whether the data is from before, during, or after the formation of a crack,
since the data from the crack formation itself is of the highest scientific
value.
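The scenario above can be sketched roughly as follows. The text does not
specify the exact form of the central symmetry function, so this example
assumes a simplified, greedy centrosymmetry-style score (the sum of
|r_i + r_j|^2 over nearly opposite neighbor pairs); the threshold, the phase
names, and the output intervals are likewise hypothetical values, not those
of the workspace described in Reference 17.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_norm_of_sum(a: Vector, b: Vector) -> float:
    """Squared length of a + b; zero when b is exactly opposite to a."""
    return sum((x + y) ** 2 for x, y in zip(a, b))


def central_symmetry(neighbors: Sequence[Vector]) -> float:
    """Greedy score: pair each neighbor vector with its most nearly opposite
    partner and sum |r_i + r_j|^2. Zero for a perfectly symmetric site."""
    remaining: List[Vector] = list(neighbors)
    score = 0.0
    while len(remaining) >= 2:
        first = remaining.pop(0)
        best = min(range(len(remaining)),
                   key=lambda k: _sq_norm_of_sum(first, remaining[k]))
        score += _sq_norm_of_sum(first, remaining.pop(best))
    return score


def next_phase(phase: str, max_score: float, threshold: float) -> str:
    """Track whether the run is before, during, or after crack formation."""
    if max_score > threshold:
        return "during"
    return "before" if phase == "before" else "after"


# Hypothetical number of simulation steps between writes for each phase.
OUTPUT_INTERVAL = {"before": 100, "during": 10, "after": 50}

# Neighbor vectors of one atom on a 2-D square lattice (illustrative data).
perfect = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
distorted = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.5, -1.0)]

phase = next_phase("before", central_symmetry(distorted), threshold=0.1)
interval = OUTPUT_INTERVAL[phase]
```

Writing most often during the "during" phase reflects the point made above
that data from the crack formation itself carries the highest scientific
value.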
Similarly, in Reference 17, data quality can be adapted based on a re-
quirement for timeliness of data delivery—if a particular piece of data is too
large to be delivered within the deadline, user-defined functions can be chosen
autonomically to change data quality so as to satisfy the delivery timeline.
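One way to picture this deadline-driven choice is the sketch below. It
assumes a fixed link bandwidth and a list of user-defined reducers ordered
from highest to lowest quality, each described only by the fraction of the
original size it produces; the reducer names and ratios are hypothetical and
are not taken from Reference 17.

```python
from typing import Optional, Sequence, Tuple


def choose_reduction(size_bytes: float,
                     bandwidth_bytes_per_s: float,
                     deadline_s: float,
                     reducers: Sequence[Tuple[str, float]]) -> Optional[str]:
    """Return the first (highest-quality) reducer whose output can be
    delivered within the deadline, or None if none can."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    for name, size_ratio in reducers:
        transfer_time = size_bytes * size_ratio / bandwidth_bytes_per_s
        if transfer_time <= deadline_s:
            return name
    return None


# Hypothetical reducers, ordered from best to worst quality.
REDUCERS = [("full", 1.0), ("single_precision", 0.5), ("subsample_4x", 0.25)]

# 1000 bytes over a 100 B/s link with a 6 s deadline: full output would take
# 10 s, so the first reducer that fits is the half-size one.
choice = choose_reduction(1000, 100.0, 6.0, REDUCERS)
```

Ordering the candidates by quality keeps the selection a single pass and
ensures that quality is reduced only as far as the deadline requires.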