from 10 years of experience by their respective authors. Simple and easy to use,
it has only recently been completed with a standardized data extension. This
extension to the native API lets expert users handle remote data easily and
optimize distributed applications through prefetching, migration, or replication
of possibly distant data, using multiple asynchronous transfers together with
remote procedure calls on the available distributed computing resources.
Based on preliminary experiments [1,5], applications also benefit from resources
spread over multiple administrative sites and managed by multiple middleware
systems (an interoperability inherent to the implementation of the API data
extension), and can target not only traditional Grids but any distributed
platform, possibly composed of resources from the Cloud [6].
In an attempt to simplify and develop interoperability, and to unify previous
work, we propose here a library that manages both the GridRPC and the GridRPC
Data Management APIs. We present an overview of the project architecture, which
is designed to be highly modular: it relies on middleware and data manager
modules, but also provides its own internal data management capabilities and
transfer protocols. Without going into too much detail, we highlight some of its
features, such as the management of asynchronous requests and of transfers, the
latter involving mapping and scheduling aspects. There is interesting potential
for optimization at two levels: at the data operation level, where scheduling
can reduce the completion time of a data operation when several sources and
several destinations are provided but are not necessarily interconnected; and at
the workflow/dataflow level, where it can reduce the completion time of an
application graph or of any part of it. At the moment, the library provides
modules for the grid middleware Diet and Ninf, and data manager modules for
projects and protocols such as Dagda, iRods, webdav (used for web-based
repositories like dropbox and owncloud), ftp, and rsync.
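To make the data-operation-level optimization concrete, the following is a minimal sketch of one possible greedy mapping, not the scheduler used by the library. It assumes a hypothetical link table giving the bandwidth between each connected (source, destination) pair, and it assumes that a source serves one transfer at a time. The names `map_transfers` and `links` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical link table: (source, destination) -> bandwidth in MB/s.
# A missing pair means the two endpoints are not interconnected.
Links = Dict[Tuple[str, str], float]


def map_transfers(size_mb: float, sources: List[str],
                  destinations: List[str], links: Links):
    """Greedily assign each destination to the source that finishes earliest.

    Assumption: a source handles its transfers one after another.
    Returns (plan, completion_time, unreachable).
    """
    busy_until = {s: 0.0 for s in sources}  # time at which each source is free
    plan: Dict[str, str] = {}
    unreachable: List[str] = []
    for dst in destinations:
        best = None
        for src in sources:
            bandwidth = links.get((src, dst))
            if not bandwidth:
                continue  # no link between this source and this destination
            finish = busy_until[src] + size_mb / bandwidth
            if best is None or finish < best[0]:
                best = (finish, src)
        if best is None:
            unreachable.append(dst)
            continue
        finish, src = best
        busy_until[src] = finish
        plan[dst] = src
    completion_time = max(busy_until.values(), default=0.0)
    return plan, completion_time, unreachable
```

A greedy pass like this is not optimal in general; it only illustrates why the choice of source matters when some endpoints cannot reach each other.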
The rest of the paper is organized as follows. The next section explains the
motivations behind the GridRPC DM API and discusses some related work. Section 3
presents the global design of the implementation, the issues raised by the API,
and their solutions. Section 4 presents some validation experiments and, after
outlining some directions for future work, we conclude in Section 6.
2 State of the Art
2.1 The GridRPC Data Management API, Summary
The GridRPC DM API [2] introduces the concept of a data handle and, with it,
several GridRPC data types that provide standardized information (for example,
lists of input and output URIs giving the locations of the source and
destination [remote] data, respectively, together with the protocols used to
access the data at each location). It also defines management modes with which a
client characterizes the persistence of the data in the system, etc. All actions
(initializing, transferring, waiting for completion of asynchronous transfers,
etc.) are provided by only 12 functions.
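As an illustration of the information a data handle carries, here is a minimal sketch in Python. It is not the GridRPC DM API itself, which is a C interface: the class name `DataHandle`, the `Persistence` values, and the example URIs are all hypothetical, chosen only to show input and output URI lists, the protocol implied by each URI, and a persistence mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List
from urllib.parse import urlparse


class Persistence(Enum):
    # Hypothetical mode names, not the constants defined by the standard.
    VOLATILE = "volatile"
    PERSISTENT = "persistent"


@dataclass
class DataHandle:
    """Illustrative stand-in for a data handle (assumed structure)."""
    input_uris: List[str]    # where the source data can be read from
    output_uris: List[str]   # where the data must be written to
    mode: Persistence = Persistence.VOLATILE

    @staticmethod
    def _schemes(uris: List[str]) -> List[str]:
        # The URI scheme names the protocol used to reach each location.
        return [urlparse(uri).scheme for uri in uris]

    def input_protocols(self) -> List[str]:
        return self._schemes(self.input_uris)

    def output_protocols(self) -> List[str]:
        return self._schemes(self.output_uris)


handle = DataHandle(
    input_uris=["ftp://a.example.org/in.dat", "rsync://b.example.org/in.dat"],
    output_uris=["irods://c.example.org/out.dat"],
    mode=Persistence.PERSISTENT,
)
```

Keeping locations and protocols inside the handle means the client code that uses the data does not need to know where the data lives or how it is fetched.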
This standard addresses, at the API level, issues related to the feasibility of
the computation by decoupling the data from its locations and from protocols
 