site run at the same speed and are linked with a
fast interconnection network that does not favor
any specific communication pattern (Feitelson
& Rudolph, 1995). This means a parallel job can
be allocated on any subset of nodes in a site. The
parallel computer system uses space-sharing and
runs the jobs in an exclusive fashion.
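As a minimal sketch of the space-sharing model described above, the following Python fragment treats a site as a pool of identical nodes and gives each job exclusive use of any free subset of them. The class and method names (Site, allocate, release) are illustrative assumptions and are not taken from the simulator used in this paper.

```python
class Site:
    """A site modeled as a pool of identical nodes (hypothetical helper)."""

    def __init__(self, num_nodes):
        self.free_nodes = set(range(num_nodes))
        self.running = {}  # job id -> set of nodes held exclusively

    def allocate(self, job_id, num_procs):
        """Give the job any num_procs free nodes, or None if too few are free."""
        if num_procs > len(self.free_nodes):
            return None
        # Any subset is acceptable: the interconnect favors no pattern.
        nodes = {self.free_nodes.pop() for _ in range(num_procs)}
        self.running[job_id] = nodes
        return nodes

    def release(self, job_id):
        """Return the job's nodes to the free pool when it finishes."""
        self.free_nodes |= self.running.pop(job_id)
```

Because nodes are never shared between running jobs, a job that does not fit in the currently free nodes simply has to wait or be placed elsewhere.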
The system deals with an on-line scheduling
problem without any knowledge of future job
submissions. The jobs under consideration are
restricted to batch jobs because this job type is
dominant on most parallel computer systems running
scientific and engineering applications. For
the sake of simplicity, in this paper we assume
a global Grid scheduler which handles all job
scheduling and resource allocation activities. The
local schedulers are only responsible for starting
the jobs after their allocation by the global
scheduler. In theory, a single central scheduler
could critically limit efficiency and reliability.
However, practical distributed implementations
are possible in which site autonomy is maintained
while the resulting schedule is the same as one
created by a central scheduler (C.
Ernemann, Hamscher, & Yahyapour, 2004).
For simplicity and efficient load sharing,
all computing nodes in the computational Grid
are assumed to be binary compatible. The Grid is
heterogeneous in the sense that nodes on different
sites may differ in computing speed and different
sites may have different numbers of nodes.
When load sharing occurs, a job may
have to migrate to a remote site for execution.
In this case the input data for that job must be
transferred to the target site before execution,
and the output data are transferred
back afterwards. This network communication is
neglected in our simulation studies because its
latency can usually be hidden in pre- and
post-fetching phases, independently of the actual
job execution phase (C. Ernemann et al., 2004).
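To illustrate how heterogeneity in computing speed could enter a simulation, the sketch below converts a job's home-site runtime into a runtime on a remote site. It assumes that runtime scales inversely with relative node speed, which this section does not state explicitly, and it adds no transfer time, in line with the neglected network communication. The function name and its parameters are hypothetical.

```python
def remote_runtime(home_runtime, home_speed, target_speed):
    """Runtime on the target site for a job measured on its home site.

    Assumptions (not from the paper): runtime is inversely proportional
    to node speed, and data transfer time is fully hidden (zero cost).
    """
    if home_speed <= 0 or target_speed <= 0:
        raise ValueError("speeds must be positive")
    return home_runtime * home_speed / target_speed
```

Under these assumptions, a job that runs for 100 time units at home would need half as long on a site whose nodes are twice as fast.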
In this paper we focus on high-throughput
computing, that is, on improving the system's
overall throughput with appropriate job scheduling
and allocation methods. Therefore, in our studies
the requested number of processors for each job is
bounded by the total number of processors on the
local site from which the job is submitted. The
local site from which a job is submitted is
called the home site of the job in the rest of
this paper. We assume that all jobs are moldable.
That is, the programs are written so that at
runtime they can execute with different degrees of
parallelism according to specific needs or
available resources. Parallelism here
means the number of processors a job uses for
its execution. In our model we associate each
job with several attributes. The following five
attributes are provided before a simulation starts.
The first four attributes are taken directly from
the SDSC SP2 workload log. The estimated
runtime attribute is generated by the simulation
program according to the specified range of
estimation errors and their corresponding
statistical distributions.
Site number. The home site to which the job
belongs.
Number of processors. The number of
processors the job uses according to the data
recorded in the workload log.
Submission time. The time at which the job is
submitted to its home site.
Runtime. The execution time the job requires
when using the specified number of processors
on its home site. The simulation needs this
value in order to advance.
Estimated runtime. An estimate of the runtime
provided by the user upon job submission. The
job scheduler uses this information to guide
its scheduling and allocation decisions.
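The five attributes above can be sketched as a small Python record, together with one possible way of generating the estimated runtime. This is only an illustration: the field names, the make_job helper, and in particular the choice of a uniform relative error in [-max_error, +max_error] are assumptions, since the text says only that a range of estimation errors and a statistical distribution are specified.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    site: int            # home site number
    num_procs: int       # processors used, as recorded in the workload log
    submit_time: float   # submission time at the home site
    runtime: float       # actual runtime on the home site; drives the simulation
    est_runtime: float   # user estimate; what the scheduler bases decisions on


def make_job(site, num_procs, submit_time, runtime, max_error=0.5, rng=random):
    """Build a job whose estimate deviates from the true runtime.

    Assumption: the relative error is drawn uniformly from
    [-max_error, +max_error]; other distributions could be substituted.
    """
    error = rng.uniform(-max_error, max_error)
    return Job(site, num_procs, submit_time, runtime, runtime * (1.0 + error))
```

Passing a seeded random.Random instance as rng makes a simulation run repeatable, which is convenient when comparing scheduling policies on the same workload.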
The following job attributes are collected and
calculated during the simulation for performance
evaluation.