One of the earliest examples of dedicated data schedulers is the Stork data scheduler [53]. Stork implements techniques specific to queuing, scheduling, and optimization of data placement jobs and provides a level of abstraction between the user applications and the underlying data transfer and storage resources. Stork introduced the concept that data placement activities in a distributed computing environment need to be treated as first-class entities, just like computational jobs. Key features of Stork are presented in the next section.
4.3 Scheduling Data Movement
Stork is specifically designed to understand the semantics and characteristics of data placement tasks, which can include data transfer, storage allocation and deallocation, data removal, metadata registration and unregistration, and replica location.
Stork uses the ClassAd [71] job description language to represent data placement jobs. The ClassAd language provides a very flexible and extensible data model that can be used to represent arbitrary services and constraints. This flexibility allows Stork to specify job-level policies as well as global ones. Global policies apply to all jobs scheduled by the same Stork server; users can override them by specifying job-level policies in their job description ClassAds.
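For illustration, a minimal data placement job submitted to Stork could be described with a ClassAd like the one below; the attribute names (dap_type, src_url, dest_url) follow the examples published for Stork, but the exact attribute set should be treated as an assumption that may vary across Stork versions.

    [
      dap_type = "transfer";
      src_url  = "file:///home/user/dataset.tar.gz";
      dest_url = "gsiftp://storage.example.edu/data/dataset.tar.gz";
    ]

Other dap_type values would correspond to the remaining task classes listed above, such as storage allocation, data removal, or replica registration.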
Stork can interact with higher-level planners and workflow managers, which allows users to schedule CPU resources and storage resources together. We have introduced a new workflow language that captures the data placement jobs in the workflow as well. The enhancements made to the workflow manager (i.e., DAGMan) allow it to differentiate between computational jobs and data placement jobs. The workflow manager can then submit computational jobs to a computational job scheduler, such as Condor or Condor-G, and the data placement jobs to Stork.
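As a rough sketch of how such a workflow might be expressed (assuming the DATA node type that DAGMan historically provided for Stork jobs, and hypothetical submit-file names), a mixed DAG could look like:

    # data placement jobs, routed to Stork
    DATA stage_in  stage_in.stork
    DATA stage_out stage_out.stork
    # computational job, routed to Condor or Condor-G
    JOB  analyze   analyze.condor
    PARENT stage_in CHILD analyze
    PARENT analyze  CHILD stage_out

Here DAGMan enforces the ordering, while the actual data movement in the stage_in and stage_out nodes is queued, scheduled, and retried by Stork rather than by the computational scheduler.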
Stork also acts like an I/O control system (IOCS) between the user applications and the underlying protocols and data storage servers. It provides complete modularity and extensibility: users can easily add support for their favorite storage system, data transport protocol, or middleware. This is a crucial feature in a system designed to work in a heterogeneous distributed environment. Users and applications cannot expect all storage systems to support the same interfaces, nor can every application be expected to understand all the different storage systems, protocols, and middleware. There needs to be a negotiating layer between the applications and the data storage systems that can interact with all of these systems and even translate between different data transfer protocols. Stork has been developed to fill this role, and its modularity allows users to easily insert plug-ins to support any storage system, protocol, or middleware.
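As a concrete illustration of this pluggability, the published Stork design describes stand-alone transfer modules, one per protocol pair, that the scheduler selects by name; the module names below follow that convention but are listed here only as assumed examples.

    # hypothetical transfer modules, one per protocol pair
    stork.transfer.file-gsiftp     (local file <-> GridFTP)
    stork.transfer.file-http       (local file <-> HTTP)
    stork.transfer.gsiftp-srb      (GridFTP    <-> SRB)

Supporting a new storage system or protocol then amounts to dropping in a new module for the corresponding protocol pair, without changing Stork itself or the user applications.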