Moldable Job Allocation for Handling Resource Fragmentation in Computational Grid - Cloud, Grid and High Performance Computing: Emerging Applications


site run at the same speed and are linked with a fast interconnection network that does not favor any specific communication pattern (Feitelson & Rudolph, 1995). This means a parallel job can be allocated on any subset of nodes in a site. The parallel computer system uses space-sharing and runs the jobs in an exclusive fashion.
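The site model just described (identical nodes, any subset usable, space-sharing with exclusive use) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the `Site` class and its `allocate`/`release` methods are hypothetical names, and choosing the lowest-numbered free nodes is an arbitrary choice that the model permits because every subset is equivalent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one site: all nodes are identical and the
    network favors no communication pattern, so any subset of free nodes
    can host a parallel job."""

    name: str
    num_nodes: int
    free: set[int] = field(init=False)
    running: dict[str, set[int]] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        self.free = set(range(self.num_nodes))

    def allocate(self, job_id: str, nprocs: int) -> set[int] | None:
        """Space-sharing: reserve nprocs nodes for this job alone, or return
        None if too few nodes are free (the job must then wait)."""
        if nprocs > len(self.free):
            return None
        # Any subset is equally good; taking the lowest ids is arbitrary.
        nodes = set(sorted(self.free)[:nprocs])
        self.free -= nodes
        self.running[job_id] = nodes
        return nodes

    def release(self, job_id: str) -> None:
        """Return a finished job's nodes to the free pool."""
        self.free |= self.running.pop(job_id)
```

For example, on `Site("A", 8)` a 5-processor job succeeds, after which a 4-processor job is refused until the first job releases its nodes, because nodes are never shared between running jobs.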

The system deals with an on-line scheduling problem without any knowledge of future job submissions. The jobs under consideration are restricted to batch jobs because this job type is dominant on most parallel computer systems running scientific and engineering applications. For the sake of simplicity, in this paper we assume a global Grid scheduler which handles all job scheduling and resource allocation activities. The local schedulers are only responsible for starting the jobs after their allocation by the global scheduler. In theory, a single central scheduler could be a critical limitation for both efficiency and reliability. However, practical distributed implementations are possible in which site autonomy is still maintained, yet the resulting schedule is the same as one created by a central scheduler (C. Ernemann, Hamscher, & Yahyapour, 2004).
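The division of labor described above, an on-line global scheduler that decides placement and local schedulers that merely start jobs, might look like the sketch below. It is illustrative only: the class names are hypothetical, and the placement rule (try the home site first, then any other site with enough free nodes, otherwise queue) is an assumed policy chosen for brevity, not the allocation method studied in this paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalScheduler:
    """Per-site scheduler: it only starts jobs the global scheduler placed here."""

    site: str
    free_nodes: int
    started: list[str] = field(default_factory=list)

    def start(self, job_id: str, nprocs: int) -> None:
        if nprocs > self.free_nodes:
            raise ValueError("global scheduler placed a job that does not fit")
        self.free_nodes -= nprocs
        self.started.append(job_id)


@dataclass
class GlobalScheduler:
    """On-line: each job is decided at submission time using only the current
    state, with no knowledge of future submissions (hypothetical policy)."""

    sites: dict[str, LocalScheduler]
    waiting: list[tuple[str, str, int]] = field(default_factory=list)

    def submit(self, job_id: str, home: str, nprocs: int) -> str | None:
        # Prefer the home site, then any other site with enough free nodes.
        candidates = [home] + [name for name in self.sites if name != home]
        for name in candidates:
            if self.sites[name].free_nodes >= nprocs:
                self.sites[name].start(job_id, nprocs)
                return name
        # No site fits now; a full simulator would retry when nodes are released.
        self.waiting.append((job_id, home, nprocs))
        return None
```

With sites `A` (4 nodes) and `B` (8 nodes), two 3-processor jobs from `A` land on `A` and then `B`, while a subsequent 6-processor job is queued because neither site has 6 free nodes at that moment.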

For simplification and efficient load sharing, all computing nodes in the computational Grid are assumed to be binary compatible. The Grid is heterogeneous in the sense that nodes on different sites may differ in computing speed, and different sites may have different numbers of nodes.
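The two dimensions of heterogeneity named above can be captured in a small site description. The sketch assumes, as a labeled simplification not stated in this section, that a job's runtime scales inversely with per-node speed when it moves between sites; `SiteSpec`, `eligible_sites`, and `runtime_on` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class SiteSpec(NamedTuple):
    """Hypothetical site description for a heterogeneous Grid."""

    nodes: int    # sites may have different numbers of nodes
    speed: float  # relative per-node speed, uniform within one site


def eligible_sites(nprocs: int, sites: dict[str, SiteSpec]) -> list[str]:
    """Binary compatibility means any site can run any job binary, so the
    only hard constraint here is having enough nodes."""
    return [name for name, spec in sites.items() if spec.nodes >= nprocs]


def runtime_on(target: str, home: str, runtime_home: float,
               sites: dict[str, SiteSpec]) -> float:
    """Assumed model: runtime scales inversely with per-node speed."""
    return runtime_home * sites[home].speed / sites[target].speed
```

Under this assumption, a job that takes 100 s on a site of speed 1.0 would take 50 s on a site of speed 2.0, provided that site has enough nodes for it.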

When load sharing activities occur, a job may have to migrate to a remote site for execution. In this case the input data for that job have to be transferred to the target site before the job executes, and the output data of the job are transferred back afterwards. This network communication is neglected in our simulation studies, as its latency can usually be hidden in pre- and post-fetching phases that are independent of the actual job execution phase (C. Ernemann et al., 2004).

In this paper we focus on high-throughput computing, improving the system's overall throughput with appropriate job scheduling and allocation methods. Therefore, in our studies the number of processors requested by each job is bounded by the total number of processors on the local site from which the job is submitted. Henceforth, the local site from which a job is submitted is called the home site of the job. We assume all jobs have the moldable property: the programs are written so that at runtime they can exploit different degrees of parallelism according to specific needs or available resources. Parallelism here means the number of processors a job uses for its execution.

In our model we associate each job with several attributes. The following five attributes are provided before a simulation starts. The first four are taken directly from the SDSC SP2 workload log. The estimated runtime attribute is generated by the simulation program according to the specified range of estimation errors and their corresponding statistical distributions.
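Moldability means the scheduler may run a job with a processor count different from the one recorded in the log, which requires some way to predict the resulting runtime. This section does not specify that model, so the sketch below swaps in Amdahl's law with a hypothetical serial fraction purely for illustration, anchored on the recorded (runtime, processors) pair. The clamp reflects the stated rule that a request never exceeds the processors on the home site. `moldable_runtime`, `clamp_request`, and `serial_fraction` are assumed names.

```python
def moldable_runtime(runtime: float, procs: int, new_procs: int,
                     serial_fraction: float = 0.05) -> float:
    """Predict the runtime of a moldable job re-shaped from `procs` to
    `new_procs` processors.

    Assumption: Amdahl's law with a hypothetical serial fraction; this is not
    necessarily the speedup model used in the paper.
    """
    if procs < 1 or new_procs < 1:
        raise ValueError("processor counts must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    f = serial_fraction
    # Implied single-processor runtime from the recorded (runtime, procs) pair.
    sequential = runtime / (f + (1.0 - f) / procs)
    return sequential * (f + (1.0 - f) / new_procs)


def clamp_request(nprocs: int, home_site_procs: int) -> int:
    """A job never requests more processors than its home site has."""
    return max(1, min(nprocs, home_site_procs))
```

With `serial_fraction=0.0` the model reduces to perfect linear speedup (a 100 s run on 4 processors becomes 50 s on 8); with a nonzero serial fraction, halving the processor count less than doubles the runtime.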

• Site number. This indicates the home site to which a job belongs.

• Number of processors. The number of processors a job uses, according to the data recorded in the workload log.

• Submission time. The time at which a job is submitted to its home site.

• Runtime. The execution time a job requires using the specified number of processors on its home site. This runtime information is needed to drive the simulation.

• Estimated runtime. An estimated runtime is provided by the user upon job submission. The job scheduler uses this information to guide its job scheduling and allocation decisions.
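The five attributes listed above map naturally onto a record type. The sketch below also shows one possible way a simulator could synthesize the estimated runtime from the actual runtime. The error model (estimate = runtime × (1 + e), with e drawn uniformly or from a clipped normal distribution within ±`max_error`), the 1-second floor, and the names `Job`, `estimate_runtime`, and `make_job` are all assumptions for illustration; the section does not give the actual ranges or distributions used.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    site: int                 # home site of the job
    num_procs: int            # processors used, from the workload log
    submit_time: float        # when the job reaches its home site (seconds)
    runtime: float            # actual runtime on the home site; drives the simulation
    estimated_runtime: float  # user-style estimate; guides the scheduler


def estimate_runtime(runtime: float, max_error: float, rng: random.Random,
                     distribution: str = "uniform") -> float:
    """Hypothetical error model: estimate = runtime * (1 + e), where e lies in
    [-max_error, max_error], drawn uniformly or from a normal distribution
    (sigma = max_error / 3) clipped to that range."""
    if distribution == "uniform":
        e = rng.uniform(-max_error, max_error)
    elif distribution == "normal":
        e = max(-max_error, min(max_error, rng.gauss(0.0, max_error / 3)))
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return max(1.0, runtime * (1.0 + e))  # assumed floor keeps estimates positive


def make_job(site: int, num_procs: int, submit_time: float, runtime: float,
             max_error: float, rng: random.Random) -> Job:
    """The first four fields come from the log; the estimate is synthesized."""
    return Job(site, num_procs, submit_time, runtime,
               estimate_runtime(runtime, max_error, rng))
```

Passing a seeded `random.Random` keeps the generated estimates reproducible across simulation runs, which matters when comparing scheduling policies on the same workload.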

The following job attributes are collected and calculated during the simulation for performance evaluation.
