Adaptive Control of Redundant Task Execution for Dependable Volunteer Computing - Cloud, Grid and High Performance Computing: Emerging Applications

Information Technology Reference

In-Depth Information

•

The overall failure probability: FP i .

the task finish can reduce the overhead due to

redundant task dispatch.

When a worker j is found to be offline by the

periodical available status check, RR i of this task

is decreased by one. The worker ID is removed

from workerID i [ RR i ]. Finally, the overall failure

probability of the task is updated.

The overall failure probability is calculated as:

RR i



EFPs k

[

(2)

Similar to the original workflow management

mechanism, the enhanced workflow management

mechanism uses the status information to analyze

whether an ``undispatched'' task can be dispatched

or not. An ``undispatched'' task can only be dis-

patched when all the tasks that it depends on are

``finished.'' A workflow has two kinds of status:

``blocked'' and ``unblocked.'' While there is no

such ``undispatched'' task, the workflow manage-

ment mechanism uses the redundant task dispatch

to reduce the performance degradation. Such status

of a workflow is defined as ``blocked.''

The initial workflow information of each task

i is as follows:

The Task Selection and

Dispatch Policies

While the workflow management mechanism

controls the process of a job workflow, it requires

policies to select the tasks for dispatch, and find

the task-to-worker assignment when the task

enquiries come.

Task Selection Policy

In our previous work (Wang, 2007), the least-

RR-selected policy has been proposed to equally

reduce the failure rate of all the ``dispatched''

tasks. It selects a task with the least redundancy

rate and dispatches the task to an idle worker. As

the least-RR-selected policy assumes a constant

task failure rate, it cannot be applied directly to the

LFPD. In this paper, therefore, a highest-failure-

probability-selected policy is proposed to provide

the similar function for LFPD. It selects the task

with the highest overall failure probability.

Furthermore, the failure probabilities of a task

on different workers are different in a heteroge-

neous environment. By considering the task as-

signment of multiple tasks to multiple workers, a

lower overall failure probability can be achieved.

Thus, the idea of dispatch window is introduced.

The dispatch window is the number of tasks that

will be dispatched together. Given a window

size w , the dispatcher waits for task enquiries

from workers until the dispatch window is full,

then it selects w tasks with the highest-failure-

probability-selected policy.

•

Status: ``undispatched.''

•

Redundancy rate: RR i = 0 .

•

The overall failure probability: FP i = 1

which means that a task will never finish

before it is dispatched

When the dispatcher dispatches a task i to a

worker j , it provides the required input values from

the preceding tasks, and then changes the task i 's

status to ``dispatched'' if it was ``undispatched,''

and increases RR i by one. The worker j is stored

in the workerID i [ RR i ]. The failure probability of

this assignment {task i → worker j } is estimated

and stored in EFPs i [ RR i ]. The overall failure prob-

ability is calculated again.

When the dispatcher receives the result of a

task i from a worker j , it changes the status of task

i to ``finished'', and sends a ``cancel'' message

to the workers in workerID i [ RR i ], except worker

j . The function to cancel duplicate copies after

Cloud, Grid and High Performance Computing: Emerging Applications

Search WWH ::

Custom Search

Home