Speculative Scheduling of Parameter Sweep Applications Using Job Behaviour Descriptions - Cloud, Grid and High Performance Computing: Emerging Applications


Algorithm 2.

void schedule(Job j, JobDescription d) {
    // estimated finish time for every cluster that is able to run the job
    Map<CE, Long> m = new HashMap<CE, Long>();
    for (ClusterProfile c : g) {
        if (c.canRun(d)) {
            m.put(c, estimate(c, d)); // calculate the est. finish time
        }
    }
    CE best = getOptimalCE(m); // get the optimal CE
    executeJob(j, best); // run job j on the chosen CE
}

3. The scheduler applies the proposed scheduling algorithm, which, using the behaviour description and the information available on the current state of the Grid, calculates the estimated job finishing time for each Grid component, and schedules the job to the component where the job would be finished the earliest.

4. The job is executed on a computer belonging to the chosen Grid component. The resource consumption of the job is monitored, and after the job is terminated, the collected information is used by the description repository service to update the description repository with a refined description.

5. The output of the job (and the behaviour description of the job) is copied to the specified target node.
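The selection rule in step 3 (the getOptimalCE call in Algorithm 2) can be sketched as a minimum search over the estimated finish times. This is only an illustrative sketch: the class name OptimalCeSketch, the generic key type, and the tie-breaking rule (the first candidate encountered wins) are assumptions, not details taken from the source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: picks the candidate with the earliest estimated finish time.
final class OptimalCeSketch {

    /** Returns the key with the smallest estimate, or null if there is no candidate. */
    static <K> K getOptimalCE(Map<K, Long> finishTimes) {
        K best = null;
        long bestTime = Long.MAX_VALUE;
        for (Map.Entry<K, Long> e : finishTimes.entrySet()) {
            // strict '<' keeps the first candidate when two estimates are equal
            if (best == null || e.getValue() < bestTime) {
                best = e.getKey();
                bestTime = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // made-up estimates, for illustration only
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("ce-a", 420L);
        m.put("ce-b", 300L);
        m.put("ce-c", 300L);
        System.out.println(getOptimalCE(m)); // prints "ce-b"
    }
}
```

Returning null for an empty map leaves it to the caller to decide what to do when no component can run the job; the source does not say how that case is handled.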

Static Data Feeder Strategy

The static data feeder strategy ranks each Computing Element (CE) by estimating the termination time of the submitted job on the given component. After ranking the CEs, the scheduler runs the job on the CE with the highest rank, i.e. the earliest completion time. The estimated job completion time depends on the job description and on the information collected from the GIS and the Replica Manager. The simplified code snippet in Algorithm 2 presents the static data feeder algorithm.

The estimated execution time of a job described by d on cluster c is calculated as follows.

$$C(c, d) = \sum_{i=1}^{m} l(d, i) \cdot C(c, d_i)$$

The actual state of cluster c is obtained from the GIS. The estimated termination time of the given job on cluster c, estimate(c, d), is the sum of the estimated job execution time C(c, d), the "length" of the job queue on that cluster, Q(c), expressed as a time, and the time necessary for preparing the input files (before running the job) and delivering the result/output files (after the job is terminated):

$$\mathrm{estimate}(c, d) = C(c, d) + Q(c) + \mathrm{fileTransferTime}(c, d)$$

Please note that before running the job on the chosen cluster the necessary files are replicated by the Replica Manager (The DataGrid Project).
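As a rough illustration of the static strategy's estimate, the sketch below adds up a weighted execution time, a queue time, and a file transfer time. This excerpt does not define l(d, i), C(c, d_i), or m, so the sketch simply assumes they arrive as two arrays of equal length (weights and per-part costs); the class name EstimateSketch, the method names, and the numbers are hypothetical.

```java
// Hypothetical cost model: all inputs are assumed to be times in the same unit.
final class EstimateSketch {

    /** C(c, d): the sum over i of l[i] * partCost[i]. */
    static double executionTime(double[] l, double[] partCost) {
        if (l.length != partCost.length) {
            throw new IllegalArgumentException("weights and costs must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < l.length; i++) {
            sum += l[i] * partCost[i];
        }
        return sum;
    }

    /** estimate(c, d) = C(c, d) + Q(c) + fileTransferTime(c, d). */
    static double estimate(double executionTime, double queueTime, double fileTransferTime) {
        return executionTime + queueTime + fileTransferTime;
    }

    public static void main(String[] args) {
        // made-up values, for illustration only
        double c = executionTime(new double[] {2.0, 3.0}, new double[] {10.0, 4.0}); // 20 + 12
        System.out.println(estimate(c, 15.0, 8.0)); // prints "55.0"
    }
}
```

Keeping the three terms as separate arguments mirrors the formula: the queue time would come from the GIS and the transfer time from replica information, while the execution time depends only on the job description and the cluster.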

Dynamic Data Feeder Strategy

The basic idea behind the dynamic data feeder strategy is to download relevant parts of the input files (those parts that the job will presumably access) and to upload the output of the job to the
