In a traditional batch scheduler system, the user can intervene and manually impose a limit on the level of concurrency the scheduler may achieve during these transfers. For example, a safe concurrency limit is the total amount of storage space available at the staging area divided by the size of the largest file in the request queue. This ensures that the scheduler does not overcommit the remote storage. Any higher concurrency level risks running out of disk space at any time, which may cause at least some of the jobs to fail. The performance of the traditional scheduler in the same experiment, with the concurrency level manually set to 10 by the user, is shown on the right side of Figure 4.2.
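The safe limit described above can be sketched in a few lines. This is an illustrative computation only; the function name and units are assumptions, not part of any scheduler's actual interface.

```python
# Hypothetical sketch: the "safe" concurrency limit is the staging area's
# free space divided by the largest file waiting in the request queue, so
# that even in the worst case the active transfers cannot overcommit storage.

def safe_concurrency(available_space: int, queued_file_sizes: list[int]) -> int:
    """Largest number of simultaneous transfers that cannot exceed the
    staging area, even if every active transfer moves the largest file."""
    if not queued_file_sizes:
        return 0
    largest = max(queued_file_sizes)
    return available_space // largest

# Example: 1 TB of free staging space, files up to 100 GB in the queue.
limit = safe_concurrency(1_000_000_000_000, [40_000_000_000, 100_000_000_000])
print(limit)  # 10 concurrent transfers
```

With 1 TB free and a 100 GB largest file, the cap works out to the concurrency level of 10 used in the experiment above.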
Manually setting the concurrency level in a traditional batch scheduling system has three main disadvantages. First, it is not automatic; it requires user intervention and depends on a decision made by the user. Second, the concurrency level, once set, remains constant and does not fully utilize the available storage unless all the files in the request queue are the same size. Finally, if the available storage increases or decreases during the transfers, the traditional scheduler cannot readjust the concurrency level, either to prevent overcommitment of the shrunken storage space or to fully utilize the added storage space.
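The second and third disadvantages suggest what an adaptive policy would do instead: admit transfers against the current free space rather than a fixed cap derived from the largest file. The sketch below is a hypothetical illustration of that idea, not the scheduler's actual algorithm.

```python
# Hypothetical sketch: admit queued transfers greedily while their combined
# size still fits in the *currently* available staging space. Unlike a fixed
# cap, this adapts to changing free space and fully uses storage when the
# queued files have unequal sizes.

def admissible_transfers(free_space: int, queued_sizes: list[int]) -> int:
    committed = 0
    admitted = 0
    for size in queued_sizes:
        if committed + size > free_space:
            break
        committed += size
        admitted += 1
    return admitted

# 1 TB free; with mixed sizes, far more than 1 TB / 100 GB = 10 transfers fit:
sizes = [10_000_000_000] * 50 + [100_000_000_000] * 10
print(admissible_transfers(1_000_000_000_000, sizes))  # 55
```

Rerunning this check whenever free space or the queue changes is exactly the readjustment a statically configured traditional scheduler cannot perform.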
Storage Server Connection Management. Another important resource that must be managed carefully is the number of concurrent connections made to specific storage servers. Storage servers being thrashed, or even crashing, under too many concurrent file transfer connections has been a common problem in data-intensive distributed computing.
In our framework, data storage resources are considered first-class citizens, just like computational resources. Just as computational resources advertise themselves, their attributes, and their access policies, so do the data storage resources. The advertisement sent by a storage resource includes the maximum number of concurrent connections it is willing to accept at any time. It can also include a detailed breakdown of how many connections will be accepted from which client, such as "maximum n GridFTP connections" and "maximum m HTTP connections."
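Such an advertisement, and the admission check a scheduler could run against it, might look like the following. The structure and field names here are assumptions made for illustration; the framework's actual advertisement format is not specified in this passage.

```python
# Hypothetical sketch of a storage-server advertisement carrying a global
# connection cap plus a per-protocol breakdown, and an admission check
# against both limits.

from dataclasses import dataclass, field

@dataclass
class StorageAd:
    max_connections: int  # total concurrent connections accepted at any time
    # per-protocol breakdown, e.g. {"gridftp": n, "http": m}
    per_protocol: dict[str, int] = field(default_factory=dict)

def may_connect(ad: StorageAd, active: dict[str, int], protocol: str) -> bool:
    """True if one more `protocol` connection stays within both the global
    cap and the per-protocol cap (defaulting to the global cap if absent)."""
    if sum(active.values()) >= ad.max_connections:
        return False
    cap = ad.per_protocol.get(protocol, ad.max_connections)
    return active.get(protocol, 0) < cap

ad = StorageAd(max_connections=30, per_protocol={"gridftp": 20, "http": 10})
active = {"gridftp": 20, "http": 5}
print(may_connect(ad, active, "gridftp"))  # False: GridFTP cap reached
print(may_connect(ad, active, "http"))     # True: 5 of 10 in use
```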
This throttling is in addition to the global throttling performed by the scheduler. The scheduler will not execute more than, say, x data placement requests at any time, but it will also not send more than y requests to server a or more than z requests to server b, with y + z less than or equal to x.
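The two-level throttle just described can be sketched as follows. The class and method names are illustrative assumptions; only the invariant (a global cap x over per-server caps y and z, with y + z ≤ x) comes from the text.

```python
# Hypothetical sketch of two-level throttling: a global cap on in-flight
# data placement requests, plus a per-server cap, both checked on dispatch.

class Throttle:
    def __init__(self, global_cap: int, server_caps: dict[str, int]):
        # Per the text, the per-server caps must not exceed the global cap.
        assert sum(server_caps.values()) <= global_cap
        self.global_cap = global_cap
        self.server_caps = server_caps
        self.in_flight = {server: 0 for server in server_caps}

    def try_dispatch(self, server: str) -> bool:
        """Admit one request to `server` only if both limits allow it."""
        if sum(self.in_flight.values()) >= self.global_cap:
            return False
        if self.in_flight[server] >= self.server_caps[server]:
            return False
        self.in_flight[server] += 1
        return True

    def complete(self, server: str) -> None:
        self.in_flight[server] -= 1

# x = 10 overall; y = 6 to server a, z = 4 to server b (y + z <= x).
t = Throttle(global_cap=10, server_caps={"a": 6, "b": 4})
print(sum(t.try_dispatch("a") for _ in range(8)))  # 6 admitted to server a
print(sum(t.try_dispatch("b") for _ in range(8)))  # 4 admitted to server b
```

Completing a request (`complete`) frees a slot at both levels, so a subsequent dispatch to the same server succeeds again.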
Other Scheduling Optimizations. In some cases, two different jobs request the transfer of the same file to the same destination. All of these requests except one are redundant and waste computational and network resources. The data placement scheduler catches such requests in its queue, performs only one of them, but returns success (or failure, depending on the return code) to all of them. We want to highlight that the
redundant jobs are not canceled or simply removed from the queue. They still