to access it; to performance, by using different sources and protocols to access
remote data, by providing explicit data management with the possibility to
prefetch and migrate data, and by allowing the user to rely on smart middleware
to handle data management transparently; and to extensibility, by providing
containers of data. It also addresses portability, making GridRPC codes portable
from one middleware to another.
2.2 Related Work
Existing work addresses some data management issues in GridRPC, but only
separately and without integration into the remote procedure call itself. One
can store data on a distributed file system such as GlusterFS 1 or GFarm [9] to
obtain automatic replication. OmniRPC introduced omniStorage [7] as a data
management layer relying on several data managers such as GFarm and BitTorrent.
It aims to provide data sharing patterns (worker to worker, broadcast and
all-exchange) to optimize communications between a set of resources, but it
requires knowledge of the topology and of the middleware deployment to be
useful. Diet also introduced its own data managers (DTM and Dagda [3,4]), which
focus on both explicit user data management and persistence of data across
resources, with transparent migrations and replications.
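The three sharing patterns named above can be pictured as sets of point-to-point transfers. The sketch below is only an illustration under our own assumptions: the function names and the reading of "all-exchange" as every worker sending to every other worker are hypothetical and are not taken from the omniStorage API.

```python
from itertools import permutations


def worker_to_worker(src, dst):
    """A single point-to-point transfer between two workers."""
    return [(src, dst)]


def broadcast(root, workers):
    """One worker sends the same data to every other worker."""
    return [(root, w) for w in workers if w != root]


def all_exchange(workers):
    """Assumed semantics: every worker sends to every other worker."""
    return list(permutations(workers, 2))


if __name__ == "__main__":
    workers = ["w0", "w1", "w2"]  # hypothetical worker names
    print(worker_to_worker("w0", "w1"))
    print(broadcast("w0", workers))
    print(all_exchange(workers))
```

Listing the transfers this way shows why such patterns depend on topology knowledge: the cost of each (source, destination) pair depends on where the two workers are deployed.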
At a higher level, Stork [8] is a batch scheduler specialized in data placement
and data movement. If the transfer protocol specified in the job description
file fails for some reason, Stork can automatically switch to any alternative
protocol available between the same source and destination hosts and complete
the transfer. Galaxy 2 is a web interface written in Python that allows on-line
design of task workflows. Galaxy focuses mainly on bioinformatics but could be
used for all types of applications relying on workflow execution. By default,
Galaxy is configured to execute applications on its host server, but it can use
the OGF DRMAA API to distribute computations to remote servers. Data can only be
transferred as files: unlike classical RPC, there is no simple way to upload
data directly into the application's memory address space. Moreover, the
modularity of the GridRPC API makes it possible to combine the simplicity of
such data management systems with tunability, by choosing where and when data
are transferred.
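The protocol fallback described for Stork can be sketched as a simple retry loop over candidate protocols. This is a minimal illustration and not Stork's implementation: the function `transfer_with_fallback`, the handler table, the protocol names and the host names are all hypothetical and chosen only for the example.

```python
def transfer_with_fallback(src, dst, preferred, alternatives, handlers):
    """Try the preferred protocol, then each alternative in order.

    `handlers` maps a protocol name to a callable(src, dst) that raises
    OSError (or a subclass) on failure. Returns the name of the protocol
    that completed the transfer.
    """
    errors = {}
    for protocol in [preferred, *alternatives]:
        try:
            handlers[protocol](src, dst)
            return protocol
        except OSError as exc:
            # Remember why this protocol failed and move on to the next one.
            errors[protocol] = exc
    raise RuntimeError(f"all protocols failed for {src} -> {dst}: {errors}")


def failing_gridftp(src, dst):
    # Stand-in for a transfer whose endpoint is unreachable.
    raise ConnectionError("gridftp endpoint unreachable")


def working_http(src, dst):
    # Stand-in for a transfer that succeeds; a real handler would copy data.
    return None


if __name__ == "__main__":
    handlers = {"gridftp": failing_gridftp, "http": working_http}
    used = transfer_with_fallback(
        "hostA:/data/in.dat", "hostB:/data/in.dat",
        preferred="gridftp", alternatives=["http"], handlers=handlers,
    )
    print(used)
```

In this sketch the caller only learns which protocol finally succeeded; a scheduler would more likely also log the intermediate failures collected in `errors`.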
By using standardized GridRPC code with our implementation and its corresponding
modules, it should be possible to benefit at an upper layer from previous work,
gaining in portability and interoperability with middleware and data managers,
which in turn provides access to a potentially larger set of resources and
architectures.
3 Implementation: Architecture and Features
We present in this section the system underlying our implementation of the
GridRPC and GridRPC Data Management standards. We highlight the features
1 http://www.gluster.org/
2 http://galaxyproject.org/