Standardized Multi-protocol Data Management for Grid and Cloud GridRPC Frameworks - Data Management in Cloud, Grid and P2P Systems


to access it; to performance, by using different sources and protocols to access remote data and by providing explicit data management with the possibility to prefetch and to migrate data, as well as the possibility to rely on some smart middleware to transparently handle data management; and to extensibility, by providing containers of data. It also addresses portability, making GridRPC codes portable from one middleware to another.
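The combination of multiple sources, explicit prefetch and pluggable protocols described above can be illustrated with a minimal Python sketch. It does not reproduce the C API of the GridRPC Data Management standard. `DataHandle`, `register_protocol`, the `mem://` scheme and the in-memory store are hypothetical stand-ins. The sketch assumes that each piece of data is described by an ordered list of input URIs and that each protocol module exposes a single read function.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical protocol registry: URI scheme -> function reading the data.
# Adding a protocol means registering a new reader; client code is unchanged.
Reader = Callable[[str], bytes]
READERS: Dict[str, Reader] = {}


def register_protocol(scheme: str, reader: Reader) -> None:
    READERS[scheme] = reader


class DataHandle:
    """Hypothetical handle on one piece of data stored at several locations."""

    def __init__(self, input_uris: List[str]) -> None:
        self.input_uris = input_uris
        self._pending: Optional[Future] = None

    def fetch(self) -> bytes:
        # Try each source in order and fall back to the next one on failure.
        errors: List[str] = []
        for uri in self.input_uris:
            reader = READERS.get(urlparse(uri).scheme)
            if reader is None:
                errors.append(f"{uri}: no module for this protocol")
                continue
            try:
                return reader(uri)
            except OSError as exc:
                errors.append(f"{uri}: {exc}")
        raise OSError("all sources failed: " + "; ".join(errors))

    def prefetch(self, pool: ThreadPoolExecutor) -> None:
        # Explicit prefetch: start the transfer in the background...
        self._pending = pool.submit(self.fetch)

    def wait(self) -> bytes:
        # ...and block only when the data are actually needed.
        if self._pending is None:
            return self.fetch()
        return self._pending.result()


# Usage: an in-memory "protocol" stands in for real remote transfers.
_STORE = {"mem://node2/matrix": b"\x01\x02\x03"}


def _mem_reader(uri: str) -> bytes:
    if uri not in _STORE:
        raise FileNotFoundError(uri)  # FileNotFoundError is an OSError
    return _STORE[uri]


register_protocol("mem", _mem_reader)
handle = DataHandle(["ftp://node1/matrix", "mem://node1/matrix", "mem://node2/matrix"])
with ThreadPoolExecutor(max_workers=2) as pool:
    handle.prefetch(pool)
    # ... other client work could run here while the data are fetched ...
    data = handle.wait()
```

The `ftp://` source is skipped because no module is registered for it. The first `mem://` source then fails, and the data are finally read from the second one. Registering a reader for `ftp` would make that source usable without changing the handle or the client code.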

2.2 Related Work

Similar works address some data management issues in GridRPC, but only separately and without integration into the remote procedure call model. One can store data on a distributed file system such as GlusterFS¹ or GFarm [9] to obtain automatic replication. OmniRPC introduced omniStorage [7] as a data management layer relying on several data managers such as GFarm and BitTorrent. It aims to provide data sharing patterns (worker to worker, broadcast and all-exchange) to optimize communications between a set of resources, but it requires knowledge of the topology and of the middleware deployment to be useful. Diet also introduced its own data managers (DTM and Dagda [3,4]), which focus on both explicit user data management and persistence of data across the resources, with transparent migrations and replications.
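The sketch below makes the three sharing patterns concrete by enumerating the point-to-point transfers each one implies in a naive, topology-unaware model. The function names are hypothetical, and this is not omniStorage's implementation. It only shows why the patterns differ in cost. Among n workers, a broadcast implies n-1 transfers and an all-exchange implies n(n-1). A layer that knows the topology could schedule these transfers more efficiently.

```python
from itertools import permutations
from typing import List, Tuple

# (source worker, destination worker); a hypothetical, topology-unaware model.
Transfer = Tuple[str, str]


def worker_to_worker(src: str, dst: str) -> List[Transfer]:
    # One point-to-point transfer between two workers.
    return [(src, dst)]


def broadcast(root: str, workers: List[str]) -> List[Transfer]:
    # The root sends the same data to every other worker: n-1 transfers.
    return [(root, w) for w in workers if w != root]


def all_exchange(workers: List[str]) -> List[Transfer]:
    # Every worker sends its data to every other worker: n(n-1) transfers.
    return list(permutations(workers, 2))


workers = ["w0", "w1", "w2", "w3"]
print(len(broadcast("w0", workers)))  # 3
print(len(all_exchange(workers)))     # 12
```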

At a higher level, Stork [8] is a batch scheduler specialized in data placement and data movement. If the transfer protocol specified in the job description file fails for some reason, Stork can automatically switch to any alternative protocol available between the same source and destination hosts and complete the transfer. Galaxy² is a web interface written in Python that allows online design of task workflows. Galaxy focuses mainly on bioinformatics but could be used for any type of application relying on workflow execution. By default, Galaxy is configured to execute applications on its host server, but it can use the OGF DRMAA API to distribute computations on remote servers. Data can only be transferred as files: unlike classical RPC, there is no simple way to upload data directly into the application's memory address space. Moreover, the modularity of the GridRPC API makes it possible to combine the simplicity of such data management systems with tunability, by choosing where and when data are transferred.

By using standardized GridRPC code with our implementation and its corresponding modules, it should be possible to benefit, at an upper layer, from previous work, gaining portability and interoperability with middleware and data managers, which in turn provides access to a potentially larger set of resources and architectures.

3 Implementation: Architecture and Features

In this section, we present the system underlying our implementation of the GridRPC and GridRPC Data Management standards. We highlight the features

¹ http://www.gluster.org/

² http://galaxyproject.org/
