1.6.6 Scheduling
The effectiveness of a distributed program hinges on the manner in which its constituent tasks are scheduled over distributed machines. Scheduling in distributed programs is usually categorized into two main classes, task scheduling and job scheduling. To start with, as defined in Section 1.2, a job can encompass one or many tasks. Tasks are the finest unit of granularity for execution. Many jobs from many users can be submitted simultaneously for execution on a cluster. Job schedulers decide which job should go next. For instance, Hadoop MapReduce adopts a first in, first out (FIFO) job scheduler, whereby jobs are run in the order in which they were received. With FIFO, running jobs cannot be preempted to allow waiting jobs to proceed and, consequently, certain objectives (e.g., avoiding job starvation and/or sharing resources effectively) cannot be achieved. As a result, simultaneous sharing of cluster resources is greatly limited. In particular, as long as a job occupies all the cluster resources, no other job is allowed to carry on. That is, the next job in the FIFO work queue cannot start until some resources become free and the currently running job has no more tasks to execute. Clearly, this might lead to a fairness issue wherein a very long job can block the whole cluster for a very long time, starving all small jobs. Hadoop MapReduce, however, offers multiple other job schedulers besides FIFO (e.g., the Capacity [28] and Fair [29] schedulers). A minimal sketch of the FIFO policy is given below.
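To make the FIFO policy concrete, the following is a minimal, non-preemptive scheduler sketch. It is not Hadoop's actual implementation; the Job interface, the single-slot cluster model, and all names are illustrative assumptions.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Minimal, non-preemptive FIFO job scheduler sketch (not Hadoop's actual code).
    public class FifoScheduler {

        // Hypothetical job abstraction; assumed for illustration only.
        interface Job {
            boolean hasRemainingTasks();
            void launchNextTask();   // assumed to consume one free cluster slot
        }

        private final Queue<Job> waiting = new ArrayDeque<>();
        private Job running;         // at most one job holds the cluster at a time

        // Jobs run strictly in arrival order.
        public void submit(Job job) {
            waiting.add(job);
        }

        // Called whenever a cluster slot frees up.
        public void onFreeSlot() {
            // The running job keeps the cluster as long as it has tasks left;
            // waiting jobs cannot preempt it.
            if (running != null && running.hasRemainingTasks()) {
                running.launchNextTask();
                return;
            }
            running = waiting.poll();  // head of the FIFO queue goes next
            if (running != null) {
                running.launchNextTask();
            }
        }
    }

Because onFreeSlot() always favors the currently running job, a long job monopolizes the cluster exactly as described above.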
After a job is granted the cluster, the decision morphs into how the job's tasks should be scheduled. Tasks can be scheduled either close to the data that they are supposed to process or anywhere. When tasks are scheduled near their data, locality is said to be exploited. For example, Hadoop MapReduce schedules map tasks in the vicinity of their uniform-sized input HDFS blocks and reduce tasks at any cluster nodes, irrespective of the locations of their input data. In contrast to MapReduce, Pregel and GraphLab do not exploit any locality when scheduling tasks/vertices. A sketch of locality-aware map-task placement follows.
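The locality preference that MapReduce applies to map tasks can be sketched as follows. The method, its parameters, and the string-based node naming are assumptions for illustration, not HDFS or MapReduce APIs.

    import java.util.List;
    import java.util.Set;

    // Sketch of locality-aware map-task placement; names and types are illustrative.
    public class LocalityAwarePlacer {

        // Prefer a node that has a free slot and hosts a replica of the task's
        // input block (data-local); otherwise fall back to any node with a free
        // slot, in which case the block must be fetched over the network.
        public static String placeMapTask(Set<String> replicaHosts,
                                          List<String> nodesWithFreeSlots) {
            for (String node : nodesWithFreeSlots) {
                if (replicaHosts.contains(node)) {
                    return node;  // data-local: read the HDFS block from local disk
                }
            }
            // No data-local slot available: schedule anywhere
            // (which is how reduce tasks are always placed).
            return nodesWithFreeSlots.isEmpty() ? null : nodesWithFreeSlots.get(0);
        }
    }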
Task schedulers must account for the heterogeneity of the underlying cloud; otherwise, performance might degrade significantly. To elaborate, similar tasks that belong to the same job can be scheduled at nodes of varying speeds in a heterogeneous cloud. As reiterated in Sections 1.4.3 and 1.5.1, this can create load imbalance and make jobs move at the pace of their slowest tasks. Strategies such as speculative execution in Hadoop MapReduce can (minimally) address such a challenge (see Section 1.5.5); a simplified sketch of a speculative check is shown below.
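The following sketch shows one plausible form of such a speculative check: it flags a task whose progress lags far behind the average so that a backup copy can be launched elsewhere. The progress map, the threshold, and the method name are assumptions; Hadoop's real heuristic differs in its details.

    import java.util.Map;

    // Sketch of a speculative-execution check; threshold and names are assumptions.
    public class SpeculationSketch {

        // Returns the id of a straggler task whose progress (in [0, 1]) lags far
        // behind the average, so a backup copy can be launched on another node;
        // returns null if no task qualifies.
        public static String pickStraggler(Map<String, Double> progressById,
                                           double lagThreshold) {
            double avg = progressById.values().stream()
                                     .mapToDouble(Double::doubleValue)
                                     .average().orElse(0.0);
            for (Map.Entry<String, Double> e : progressById.entrySet()) {
                if (avg - e.getValue() > lagThreshold) {
                    return e.getKey();  // candidate for a speculative backup copy
                }
            }
            return null;
        }
    }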
In addition to considering heterogeneity, task schedulers must seek to enhance system utilization and improve task parallelism. Specifically, tasks should be uniformly distributed across cluster machines in a way that fairly utilizes the available cluster resources and effectively increases parallelism. Obviously, these objectives can contradict one another. To begin with, by evenly distributing tasks across cluster machines, locality might be affected. For instance, machines in a Hadoop cluster can contain different numbers of HDFS blocks. If one machine holds a larger number of HDFS blocks than the others, locality would entail scheduling all the respective map tasks at that machine, leaving other machines less loaded and underutilized. In addition, this can reduce task parallelism as a consequence of accumulating many tasks on the same machine. If locality is relaxed slightly, however, utilization can be enhanced, loads across machines can be balanced, and task parallelism can be increased. Nonetheless, this would necessitate moving data toward tasks, which imposes extra communication overhead on the cluster's network. One way to visualize this trade-off is sketched below.
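One way to think about this trade-off is a placement score that rewards locality but penalizes long per-node queues. The sketch below is purely illustrative, with an assumed weight knob and node model rather than any real scheduler's policy.

    import java.util.List;

    // Sketch of trading locality against load balance via a simple score;
    // the weight knob and node model are assumptions.
    public class TradeoffPlacer {

        // Hypothetical node view: queue length and whether it hosts the input block.
        record Node(String name, int queuedTasks, boolean hostsReplica) {}

        // Hosting a replica raises the score (locality), while a long local
        // queue lowers it (utilization and parallelism).
        public static Node pick(List<Node> nodes, double localityWeight) {
            Node best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Node n : nodes) {
                double score = (n.hostsReplica() ? localityWeight : 0.0) - n.queuedTasks();
                if (score > bestScore) {
                    bestScore = score;
                    best = n;
                }
            }
            return best;  // may be a remote node if local queues grow too long
        }
    }

With a small localityWeight, the placer acts like a pure load balancer; with a very large one, it degenerates to strict locality, reproducing the imbalance described above.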