Adaptive Control of Redundant Task Execution for Dependable Volunteer Computing - Cloud, Grid and High Performance Computing: Emerging Applications


Failure Probability Estimation

In the Figure 3 example, the TTF of worker 4's last online session is 680 minutes (from 2010/11/23 3:10 to 2010/11/23 14:30).
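The 680-minute value is simply the difference between the two timestamps:

$$\mathrm{TTF} = 14{:}30 - 3{:}10 = 11\ \text{h}\ 20\ \text{min} = (11 \times 60 + 20)\ \text{min} = 680\ \text{min}$$

Because the end time is the time of the status check that finds the worker offline, a TTF gathered this way can overstate the actual online session by up to one checking period.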

With this simple periodic availability-checking mechanism, runtime TTF data are gathered at the dispatcher, so the TTF distribution can be estimated at runtime. Suppose the gathered TTFs are $\{ttf_1, ttf_2, ttf_3, \ldots, ttf_n\}$, where $n$ is the number of gathered TTFs. The failure probability $F(x)$ of a worker, where $x$ is the time elapsed since the worker went online, can be estimated as shown in Equation (1).
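Equation (1) is the empirical distribution function of the gathered TTFs: the fraction of samples no larger than $x$. A minimal Python sketch, assuming TTFs are stored as minutes in a plain list; the function name `failure_probability` and the sample values are hypothetical and for illustration only:

```python
from bisect import bisect_right


def failure_probability(ttfs: list[float], x: float) -> float:
    """Estimate F(x) = n_x / n from gathered TTF samples.

    n is the number of gathered TTFs and n_x is how many of them are
    less than or equal to x (the time since the worker went online).
    """
    if not ttfs:
        raise ValueError("no TTF samples gathered yet")
    ordered = sorted(ttfs)
    n_x = bisect_right(ordered, x)  # count of samples <= x
    return n_x / len(ordered)


samples = [120, 300, 680, 45, 900]  # made-up TTFs in minutes
print(failure_probability(samples, 300))  # 0.6
```

Sorting on every call costs O(n log n); a dispatcher that queries often could instead keep the samples sorted as they arrive (for example with `bisect.insort`) so each query is a single binary search.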

Volunteer computing platforms have two kinds of peers: dispatchers and workers. A task dispatcher is a dedicated server that controls a volunteer computing platform. Workers are volatile peers that compute tasks and send the task results back to the dispatcher. To estimate the failure probability of each worker, runtime time-to-failure (TTF) data are required. To gather such data, the dispatcher maintains a worker availability status list, which stores the start time of each worker; a worker that is currently unavailable is marked as offline in the list. The list is maintained as follows:

$$F(x) = \frac{n_x}{n} \tag{1}$$

where $n_x$ is the number of TTFs that are less than or equal to $x$.

A Worker Goes Online

As shown in Figure 2(a), when a worker goes online, it sends an online notification message to the dispatcher. Once the notification is received, the dispatcher updates the worker availability status list as shown in Figure 2(b): the current time is stored as the start time of this worker.
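The availability status list and the online-notification step can be sketched as a mapping from worker ID to the start time of its current session, with `None` standing for "offline". This is a minimal sketch under that assumed representation; the class `AvailabilityList` and its method names are hypothetical, and message transport is left out:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class AvailabilityList:
    """Hypothetical worker availability status list kept by the dispatcher.

    Maps a worker ID to the start time of its current online session,
    or to None when the worker is marked offline.
    """
    start_times: dict[str, Optional[datetime]] = field(default_factory=dict)

    def on_online_notification(self, worker_id: str, now: datetime) -> None:
        # The current time becomes the start time of the new online session.
        self.start_times[worker_id] = now

    def is_online(self, worker_id: str) -> bool:
        # Unknown workers and workers marked offline both count as offline.
        return self.start_times.get(worker_id) is not None


status = AvailabilityList()
status.on_online_notification("worker4", datetime(2010, 11, 23, 3, 10))
print(status.is_online("worker4"))  # True
```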

LEAST FAILURE PROBABILITY DISPATCH POLICY

Building on this failure probability estimation, this paper proposes a performance-oriented task dispatch policy, Least Failure Probability Dispatch (LFPD), for volunteer computing platforms. Its assumptions differ slightly from those of our previous work (Wang, 2007): whereas the previous work assumes a homogeneous environment, this paper assumes that the volunteer computing platform is heterogeneous, with workers that differ in performance and in bandwidth to the dispatcher.
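As an illustration only (an assumed reading of the policy name, not the chapter's exact formulation), the sketch below combines Equation (1) with heterogeneous worker speed and bandwidth. It estimates each worker's task duration, computes the conditional probability that the worker fails before finishing given that it has already been online for its current age, and picks the worker with the smallest value. The `Worker` fields, the duration model (transfer time plus compute time), and the conditional-probability rule are all assumptions:

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical attributes of a heterogeneous worker (assumed, not from the chapter).
    name: str
    age_min: float    # minutes since the worker went online (x in Equation (1))
    speed: float      # work units computed per minute
    bandwidth: float  # data units transferred per minute to/from the dispatcher


def empirical_F(sorted_ttfs: list[float], x: float) -> float:
    # Equation (1): fraction of gathered TTFs that are <= x.
    return bisect_right(sorted_ttfs, x) / len(sorted_ttfs)


def failure_during_task(w: Worker, work: float, data: float,
                        sorted_ttfs: list[float]) -> float:
    # Assumed duration model: transfer time plus compute time.
    duration = data / w.bandwidth + work / w.speed
    f_now = empirical_F(sorted_ttfs, w.age_min)
    if f_now >= 1.0:
        return 1.0  # no gathered session lasted this long; treat failure as certain
    f_end = empirical_F(sorted_ttfs, w.age_min + duration)
    # P(fail before age + duration | still online at age)
    return (f_end - f_now) / (1.0 - f_now)


def least_failure_worker(workers: list[Worker], work: float, data: float,
                         ttfs: list[float]) -> Worker:
    if not ttfs:
        raise ValueError("at least one TTF sample is needed")
    sorted_ttfs = sorted(ttfs)
    return min(workers, key=lambda w: failure_during_task(w, work, data, sorted_ttfs))


ttfs = [60, 120, 300, 680, 900]  # made-up TTF samples in minutes
workers = [Worker("A", age_min=0, speed=1, bandwidth=10),
           Worker("B", age_min=100, speed=4, bandwidth=10)]
print(least_failure_worker(workers, work=200, data=100, ttfs=ttfs).name)  # B
```

With these made-up numbers, worker B is chosen even though it has been online longer: its higher speed shortens the task to 60 minutes, so its conditional failure probability (about 0.25) is lower than worker A's (0.4 over a 210-minute task).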

Find Offline Worker

To gather the runtime TTF data, the dispatcher also checks the availability status of workers periodically. As shown in Figure 3(a), the dispatcher sends status checking messages to the workers that are marked online in the worker availability status list. Once the message is received by an alive worker, the worker sends a reply message back to the dispatcher, as shown in Figure 3(b). An offline worker cannot reply to the checking message, so it is marked as offline in the list.

As an example, before the periodic status check, worker 4 in Figure 3 had been marked as online with a start time in the list, and then went offline. Thus, it does not reply to the checking message. The dispatcher then updates the worker availability status list and marks worker 4 as offline. It also calculates the TTF of worker 4's last online session as the difference between the current time and the stored start time.
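One periodic check can be sketched as follows, assuming the same worker-ID-to-start-time mapping as above (`None` meaning offline). The `replies` callback is a hypothetical stand-in for sending a status checking message and waiting for the reply; timeouts and networking are not modeled. The example reproduces the worker 4 scenario:

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def status_check_round(start_times: dict[str, Optional[datetime]],
                       replies: Callable[[str], bool],
                       now: datetime,
                       ttf_samples: list[float]) -> None:
    """Run one periodic availability check (hypothetical helper).

    start_times: worker ID -> start time of the current online session,
                 or None if the worker is marked offline.
    replies:     returns False when a worker does not answer the check.
    ttf_samples: gathered TTFs in minutes; one is appended per detected failure.
    """
    for worker_id, start in list(start_times.items()):
        if start is None:
            continue  # only workers marked online are checked
        if not replies(worker_id):
            # No reply: mark the worker offline and record its session's TTF.
            start_times[worker_id] = None
            ttf_samples.append((now - start) / timedelta(minutes=1))


start_times = {
    "worker1": datetime(2010, 11, 23, 9, 0),
    "worker4": datetime(2010, 11, 23, 3, 10),
}
ttfs: list[float] = []
alive = {"worker1"}
status_check_round(start_times, lambda w: w in alive,
                   datetime(2010, 11, 23, 14, 30), ttfs)
print(start_times["worker4"], ttfs)  # None [680.0]
```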

An Enhanced Workflow Management Mechanism

A workflow management mechanism was proposed in our previous work (Wang, 2007). It is responsible for directing workflow control and updating task information. It cannot fully satisfy the requirements of LFPD, because it
