contingencies such as performance degradation due to unplanned data transfers and intermittent network outages.
Efficient and reliable access to large-scale data sources and archiving destinations in a widely distributed computing environment brings new challenges:
Scheduling Data Movement. Traditional distributed computing systems closely couple data handling and computation. They treat data resources as second-class entities and access to data as a side effect of computation. Data placement (i.e., the access, retrieval, and/or movement of data) is either embedded in the computation, delaying it, or performed by simple scripts that do not enjoy the privileges of a job. The inadequacy of traditional systems and existing CPU-oriented schedulers in dealing with this complex data-handling problem has given rise to a new class of schedulers: data-aware schedulers. One of the first examples is the Stork data placement scheduler that we have developed. Section 4.3 presents Stork and data-aware scheduling.
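The idea of promoting data placement to a first-class job can be sketched as follows. This is an illustrative toy scheduler, not Stork's actual interface: the `Job` class and `run_dag` function are assumptions made for this example. The point is that stage-in and stage-out transfers are scheduled entities with explicit dependencies, rather than side effects buried inside the computation.

```python
# Toy sketch: data placement jobs are first-class, scheduled alongside
# compute jobs via explicit dependencies (illustrative, not Stork's API).
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    kind: str                              # "transfer" or "compute"
    deps: list = field(default_factory=list)
    done: bool = False


def run_dag(jobs):
    """Run jobs in dependency order; transfers are queued, retried, and
    tracked just like compute jobs instead of being embedded in them."""
    order, remaining = [], list(jobs)
    while remaining:
        ready = [j for j in remaining if all(d.done for d in j.deps)]
        if not ready:
            raise RuntimeError("dependency cycle")
        for j in ready:
            j.done = True                  # stand-in for actually running it
            order.append(j.name)
            remaining.remove(j)
    return order


stage_in = Job("stage_in", "transfer")
compute = Job("analyze", "compute", deps=[stage_in])
stage_out = Job("stage_out", "transfer", deps=[compute])

order = run_dag([stage_out, compute, stage_in])
print(order)   # transfers bracket the computation, in dependency order
```

Because the scheduler sees the transfers as jobs, it can independently queue, throttle, or retry them when a link fails, which is exactly what a computation-embedded transfer cannot do.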
Efficient Data Transfers. Another important challenge when dealing with data transfers over wide area networks is efficiently utilizing the available network bandwidth. Wide-area transfers, particularly over high-capacity network links, expose the performance limitations of the TCP protocol. In Section 4.4, we discuss these limitations and various alternatives. Much of the work on wide area data transfer focuses on file or disk-to-disk transfer. We will also present other types of scenarios, how they differ, what the challenges are, and possible solutions.
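The TCP limitation mentioned above can be made concrete with a back-of-the-envelope calculation: a single TCP stream can deliver at most one window of data per round trip, so its throughput is bounded by window size divided by round-trip time. The link capacity, window size, and RTT below are assumed figures for illustration, not measurements.

```python
# Why a single TCP stream underutilizes a fat, long pipe:
# throughput <= window / RTT, independent of link capacity.
import math


def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


window = 64 * 1024     # classic 64 KiB window (no window scaling)
rtt = 0.1              # assumed 100 ms round-trip time on a wide-area path
link = 10_000          # assumed 10 Gb/s link capacity, in Mb/s

single = max_throughput_mbps(window, rtt)
streams = math.ceil(link / single)   # parallel streams, one window each
print(f"one stream: {single:.2f} Mb/s of {link} Mb/s available")
print(f"streams needed to fill the link: {streams}")
```

This is why the alternatives discussed in Section 4.4, such as larger (scaled) windows or multiple parallel streams, matter on high bandwidth-delay-product paths: the default window caps a single stream at a few Mb/s regardless of link capacity.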
Remote Access to Data. In some cases, transferring complete datasets to remote destinations for computation may be very inefficient. An alternative solution is to perform remote I/O, where the files of interest stay in one place and the programs issue network operations to read or write the small amounts of data that are of immediate interest. In this model, transfer protocols must be optimized for small operations, and the processing site may need no storage at all. In Section 4.5, we discuss the advantages and challenges of remote I/O and present the Parrot and Chirp technologies as a case study.
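The contrast between whole-file transfer and remote I/O can be sketched with a toy partial-read interface. `RemoteFile` below is a stand-in invented for this example, not the actual Chirp protocol; it only illustrates how reading a small byte range moves far less data than staging the whole file.

```python
# Toy remote I/O model: the file stays put; the client reads only the
# byte ranges it needs (illustrative stand-in, not the Chirp protocol).

class RemoteFile:
    def __init__(self, data: bytes):
        self._data = data          # lives on the "server" side
        self.bytes_moved = 0       # how many bytes crossed the network

    def pread(self, offset: int, length: int) -> bytes:
        """Read `length` bytes at `offset`, in the style of POSIX pread."""
        chunk = self._data[offset:offset + length]
        self.bytes_moved += len(chunk)
        return chunk


dataset = bytes(range(256)) * 4096         # a 1 MiB "remote" file
f = RemoteFile(dataset)

# The application only needs one 4 KiB record from the middle:
record = f.pread(512 * 1024, 4096)
print(len(record), "bytes read;", f.bytes_moved, "bytes moved over the network")
print(f"versus {len(dataset)} bytes for a full transfer")
```

The trade-off noted above also shows up here: each small read pays a network round trip, so the protocol must keep per-operation overhead low for this model to win.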
Coscheduling of Resources. Distributed applications often require guaranteed levels of storage space at the destination sites, as well as guaranteed bandwidth between compute nodes, or between compute nodes and a visualization resource. Developing a booking system for storage or network resources is not a complete solution, as the user is still left with the complexity of coordinating separate booking requests for multiple computational resources with their storage and network bookings. We present a technique to coallocate computational and network resources in Section 4.6. This coscheduling technique can easily be extended to storage resources as well.
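One way to sketch coallocation is an all-or-nothing, two-phase reserve/release pattern: each resource is reserved in turn, and if any reservation fails, the earlier holds are rolled back so nothing is left half-booked. The `Resource` class and `coallocate` function below are illustrative assumptions, not the mechanism of Section 4.6.

```python
# Toy all-or-nothing coallocation of compute and network reservations
# with rollback on failure (illustrative sketch only).

class Resource:
    def __init__(self, name: str, capacity: int):
        self.name, self.free = name, capacity

    def reserve(self, amount: int):
        """Return a hold token, or None if the reservation is refused."""
        if amount > self.free:
            return None
        self.free -= amount
        return (self, amount)


def coallocate(requests):
    """requests: list of (resource, amount). Succeeds only if every
    reservation succeeds; otherwise all earlier holds are released."""
    holds = []
    for res, amount in requests:
        hold = res.reserve(amount)
        if hold is None:
            for r, a in holds:         # roll back earlier holds
                r.free += a
            return False
        holds.append(hold)
    return True


cpu = Resource("compute slots", 100)
net = Resource("bandwidth", 10)

ok1 = coallocate([(cpu, 40), (net, 8)])   # both fit
ok2 = coallocate([(cpu, 40), (net, 8)])   # bandwidth now exhausted
print(ok1, ok2)
print(cpu.free, net.free)   # the failed attempt left cpu untouched
```

The rollback step is what spares the user the coordination burden described above: a partial success never strands a compute reservation without its matching network booking.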