One of the earliest examples of dedicated data schedulers is the Stork data scheduler [53]. Stork implements techniques specific to queuing, scheduling, and optimization of data placement jobs and provides a level of abstraction between the user applications and the underlying data transfer and storage resources. Stork introduced the concept that data placement activities in a distributed computing environment need to be treated as first-class entities, just like computational jobs. Key features of Stork are presented in the next section.
4.3 Scheduling Data Movement
Stork is specifically designed to understand the semantics and characteristics of data placement tasks, which can include data transfer, storage allocation and deallocation, data removal, metadata registration and unregistration, and replica location.
Stork uses the ClassAd [71] job description language to represent data placement jobs. The ClassAd language provides a very flexible and extensible data model that can be used to represent arbitrary services and constraints. This flexibility allows Stork to specify job-level policies as well as global ones. Global policies apply to all jobs scheduled by the same Stork server; users can override them by specifying job-level policies in their job description ClassAds.
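For illustration, a minimal data placement job submitted to Stork could be described with a ClassAd like the one below; the attribute names (dap_type, src_url, dest_url) follow the examples published for Stork, but the exact attribute set should be treated as an assumption that may vary across Stork versions.

    [
      dap_type = "transfer";
      src_url  = "file:///home/user/dataset.tar.gz";
      dest_url = "gsiftp://storage.example.edu/data/dataset.tar.gz";
    ]

Other dap_type values would correspond to the remaining task classes listed above, such as storage allocation, data removal, or replica registration.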
Stork can interact with higher-level planners and workflow managers, which allows users to schedule CPU resources and storage resources together. We have introduced a new workflow language that captures the data placement jobs in the workflow as well. The enhancements made to the workflow manager (i.e., DAGMan) allow it to differentiate between computational jobs and data placement jobs. The workflow manager can then submit computational jobs to a computational job scheduler, such as Condor or Condor-G, and the data placement jobs to Stork.
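As a rough sketch of how such a workflow might be expressed (assuming the DATA node type that DAGMan historically provided for Stork jobs, and hypothetical submit-file names), a mixed DAG could look like:

    # data placement jobs, routed to Stork
    DATA stage_in  stage_in.stork
    DATA stage_out stage_out.stork
    # computational job, routed to Condor or Condor-G
    JOB  analyze   analyze.condor
    PARENT stage_in CHILD analyze
    PARENT analyze  CHILD stage_out

Here DAGMan enforces the ordering, while the actual data movement in the stage_in and stage_out nodes is queued, scheduled, and retried by Stork rather than by the computational scheduler.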
Stork also acts like an I/O control system (IOCS) between the user applications and the underlying protocols and data storage servers. It provides complete modularity and extensibility: users can easily add support for their favorite storage system, data transport protocol, or middleware. This is a crucial feature in a system designed to work in a heterogeneous distributed environment. Users and applications cannot expect all storage systems to support the same interfaces, nor can every application be expected to understand all the different storage systems, protocols, and middleware. There needs to be a negotiating layer between the applications and the data storage systems that can interact with all of these systems and even translate between different data transfer protocols. Stork has been developed to fill this role, and its modularity allows users to easily insert plug-ins to support any storage system, protocol, or middleware.
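As a concrete illustration of this pluggability, the published Stork design describes stand-alone transfer modules, one per protocol pair, that the scheduler selects by name; the module names below follow that convention but are listed here only as assumed examples.

    # hypothetical transfer modules, one per protocol pair
    stork.transfer.file-gsiftp     (local file <-> GridFTP)
    stork.transfer.file-http       (local file <-> HTTP)
    stork.transfer.gsiftp-srb      (GridFTP    <-> SRB)

Supporting a new storage system or protocol then amounts to dropping in a new module for the corresponding protocol pair, without changing Stork itself or the user applications.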