grids and grid-enabled databases. Then, we discuss
QoS-oriented scheduling and placement strategies
for the Grid-based warehouse. After that,
we present some experimental results and
draw conclusions. At the end of the chapter, we
define some key terms.
acteristics when scheduling job execution may
become very time-consuming. In the hierarchical
architecture, a Community Scheduler (or Resource
Broker) is responsible for assigning job execution to
sites. Each site has its own job scheduler (Local
Scheduler), which is responsible for scheduling
job execution locally. The Community Scheduler and
Local Schedulers may negotiate job execution,
and each Local Scheduler may implement local
resource utilization policies. Moreover, because the
Community Scheduler does not need to know
the exact workload and characteristics of each
available node, this model scales better
than the centralized scheduling model.
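
The division of work in the hierarchical model can be illustrated with a minimal Python sketch. It is not taken from any GRM system: the class names, the `max_queued_jobs` policy and the "fewest queued jobs" selection rule are assumptions made only for illustration. The point it shows is that the Community Scheduler sees a per-site summary, while per-node loads stay inside each Local Scheduler.

```python
from dataclasses import dataclass


@dataclass
class LocalScheduler:
    """Per-site scheduler; only it knows the load of each local node."""
    site: str
    node_queues: dict            # node name -> number of waiting jobs
    max_queued_jobs: int = 10    # hypothetical local resource-usage policy

    def queued_jobs(self) -> int:
        # Coarse summary that is exposed to the Community Scheduler.
        return sum(self.node_queues.values())

    def accepts(self) -> bool:
        # Local policy: refuse further work once the site is saturated.
        return self.queued_jobs() < self.max_queued_jobs

    def schedule(self, job: str) -> str:
        # Pick the local node with the fewest waiting jobs.
        node = min(self.node_queues, key=self.node_queues.get)
        self.node_queues[node] += 1
        return node


@dataclass
class CommunityScheduler:
    """Assigns jobs to sites without inspecting individual nodes."""
    sites: list

    def submit(self, job: str) -> tuple:
        # "Negotiation" is reduced here to asking each site whether it accepts.
        candidates = [s for s in self.sites if s.accepts()]
        if not candidates:
            raise RuntimeError("no site accepted the job")
        site = min(candidates, key=LocalScheduler.queued_jobs)
        return site.site, site.schedule(job)
```

A call such as `CommunityScheduler([site_a, site_b]).submit("q1")` returns a `(site, node)` pair; the node is chosen entirely by the site's own Local Scheduler.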
In the decentralized model, there is no Central
Scheduler. Each site has its own scheduler, which
is responsible for scheduling local job execution.
Schedulers must interact with each other in order to
negotiate remote job execution. Several messages
may have to be exchanged during the negotiation
to reach a good scheduling decision, which may degrade the
system's performance.
Some of the GRM systems have built-in
scheduling policies, but almost all enable
users to implement their own scheduling policy or to
use application-level schedulers. In this context,
some general-purpose application-level schedulers
were designed [e.g. Condor-G (Frey et al.,
2001) and Nimrod-G (Buyya et al., 2000)]. These
general-purpose schedulers generally consider some kind
of user-specified requirement or QoS parameter
(e.g. the job's deadline), but may fail to efficiently
schedule data-bound jobs.
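
To make the difficulty with data-bound jobs concrete, the following Python sketch illustrates two node-selection rules that ignore data location: Random Scheduling (RS) and Least Loaded Scheduling (LLS), as they are described in the next paragraph. The function names and the dictionary-based inputs are hypothetical and chosen only for illustration; this is not the implementation evaluated by Ranganathan & Foster (2004).

```python
import random


def random_scheduling(queue_lengths, rng=random):
    """RS: pick any available node uniformly at random."""
    return rng.choice(sorted(queue_lengths))


def least_loaded_scheduling(queue_lengths):
    """LLS: pick the node with the lowest number of waiting jobs."""
    return min(queue_lengths, key=queue_lengths.get)


def needs_remote_data(node, dataset, stored_data):
    """True when the chosen node does not store the data the job requires."""
    return dataset not in stored_data.get(node, set())


# Hypothetical state: waiting jobs per node, and the datasets each node stores.
queues = {"n1": 4, "n2": 0, "n3": 2}
stored = {"n1": {"sales"}, "n3": {"inventory"}}

chosen = least_loaded_scheduling(queues)
if needs_remote_data(chosen, "sales", stored):
    print(f"{chosen} has the shortest queue but does not store 'sales'")
```

Neither rule looks at `stored`, so both may select a node that lacks the data the job needs.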
Query scheduling strategies for data-bound
jobs were evaluated by Ranganathan & Foster
(2004). Data Present (DP), Least Loaded Scheduling
(LLS) and Random Scheduling (RS) were
compared. In RS, job execution is randomly
scheduled to available nodes. In LLS, each job
is scheduled to be executed by the node that has
the lowest number of waiting jobs. In both RS
and LLS, a data-centric job may be scheduled
to be executed by a node that does not store the
data required to execute the job. In this case,
DATA GRIDS AND GRID-ENABLED DATABASES
The Grid is an infrastructure that provides
transparent access to distributed heterogeneous
shared resources, which belong to distinct sites
(which may in turn belong to distinct real organizations).
Each site has some degree of autonomy and may
impose resource usage restrictions on remote
users (Foster, 2001).
In the last decade, some Grid Resource Management
(GRM) systems [for example, Legion
(Grimshaw et al., 1997) and Globus Toolkit (Foster
& Kesselman, 1997)] were developed in order
to provide basic functionality that is commonly
needed to run grid-based applications.
Authorization and remote job execution management
are among the most common features of
GRM systems. Some of them also provide data
management-related mechanisms, such as efficient
data movement [e.g. GridFTP (Allcock et al.,
2005)] and data replica location [e.g. Globus
Replica Location Service, RLS (Chervenak et
al., 2004)].
In terms of grid job scheduling, there are three
basic architectures (Krauter et al., 2002): centralized,
hierarchical and decentralized. In the first
one, a single Central Scheduler is used to schedule
the execution of all the incoming jobs, assigning
them directly to the existing resources. Such an architecture
may lead to good scheduling decisions,
as the scheduler may consider the characteristics
and loads of all available resources, but suffers
from a scalability problem: if a wide variety of
distributed heterogeneous resources is available,
considering all the resources' individual char-