Adaptive Control of Redundant Task Execution for Dependable Volunteer Computing - Cloud, Grid and High Performance Computing: Emerging Applications


greedy dispatch policy is not affected. Therefore, the normalized process time with the greedy dispatch policy decreases as the mean task process time increases.

These two figures also show that the LFPD policy is more efficient than the greedy dispatch policy when the mean task process time is small. This is because of the "sleep" message used in the greedy dispatch policy. The greedy dispatch policy lets a worker sleep if it finds that dispatching a task to that worker would waste computing power, which happens when the existing copy of a task has a smaller EFT than the new copy. This mechanism boosts performance in most cases. However, it also introduces a possible overhead, because workers may be left sleeping even when the workflow is no longer "blocked" and "undispatched" tasks are available. When the mean task process time is small, failures occur less frequently, so the greedy dispatch policy's advantage in eliminating task failures becomes less significant. In such cases, this particular overhead becomes more pronounced and leads to worse efficiency.
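
The sleep rule described above can be sketched in a few lines. This is only an illustration of the comparison stated in the text (sleep when the existing copy has a smaller EFT than the new copy); the names `TaskCopy` and `greedy_dispatch_decision`, the handling of a task with no existing copy, and the handling of ties are assumptions, not details taken from the chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskCopy:
    """One copy of a task placed on a worker (hypothetical structure)."""
    task_id: str
    eft: float  # EFT value of this copy, in any consistent time unit


def greedy_dispatch_decision(existing: Optional[TaskCopy],
                             candidate: TaskCopy) -> str:
    """Return "dispatch" or "sleep" for a worker that requests work.

    existing:  the copy of the task that is already running, if any
    candidate: the new copy that would be created on the requesting worker
    """
    if existing is None:
        # Assumption: with no running copy there is nothing to compare against.
        return "dispatch"
    if existing.eft < candidate.eft:
        # The running copy has the smaller EFT, so the new copy is treated
        # as wasted computing power and the worker is told to sleep.
        return "sleep"
    # Assumption: a tie or a smaller candidate EFT leads to a dispatch.
    return "dispatch"


if __name__ == "__main__":
    running = TaskCopy("t1", eft=40.0)
    print(greedy_dispatch_decision(running, TaskCopy("t1", eft=55.0)))
    print(greedy_dispatch_decision(running, TaskCopy("t1", eft=30.0)))
```

The overhead discussed in the text corresponds to the "sleep" branch being taken while other undispatched tasks could have used the worker; the sketch deliberately looks at a single task and does not model that situation.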

Effects of Window Size on the Process Time

As shown in both Figures 4 and 5, a larger window size results in a shorter process time for the LFPD policy in most cases. This is because the LFPD policy is more likely to find a better task-to-worker assignment with a larger window size, especially when the mean task process time is small. As discussed earlier, a smaller mean task process time results in less frequent failures. Thus, the LFPD policy has a higher probability of finding an assignment with fewer failures.

The overhead introduced by the "blocked" status is not serious when the number of task groups is small. Thus, the improvement achieved with the LFPD policy is small. Therefore, as the window size increases, the overhead of the dispatch window becomes noticeable. The overhead is more serious when the number of online workers is small. This explains why the process time with the LFPD policy increases when the window size exceeds a certain value in Figures 4(b) and 4(c). With a much larger number of online workers, the overhead of the dispatch window is not significant. Thus, the results with the Microsoft PCs trace are not affected by a small number of task groups and a large window size.

Improvement of the Performance by Identifying Worker Types

In the real world, multiple types of workers exist, and different types of workers have different availability characteristics. Nadeem et al. (2008) introduced the class-level modeling method by pre-identifying three types of resources in the Austrian Grid (http://www.austriangrid.at). The TTF distribution differs greatly across the three types. The heuristics-based failure estimation relies on the empirical distribution and assumes that all workers have similar availability behavior. Gathering the TTFs of multiple types of workers into a single TTF distribution leads to low estimation accuracy. Low estimation accuracy degrades performance, because the LFPD policy cannot find the optimal task-to-worker assignments with inaccurate failure estimations.

To improve the failure estimation accuracy, the worker type is considered. Two types of workers are selected from the two real-world trace data sets. First, the two trace data sets are clustered into several types using the K-Means clustering algorithm in the Weka toolkit (Witten, 2005). Two-dimensional data are generated by extracting pairs of TTF and down time from the original trace data sets. The number of clusters is set to four, based on the assumption that four kinds of workers (diurnal, weekly, long TTF, and long downtime) exist. The clustering results are shown in Table 3. Each cluster shows different characteristics. Cluster 3 of the Microsoft PCs trace shows a diurnal pattern, while Cluster 3 of the Skype trace is highly volatile.
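
The clustering of (TTF, down time) pairs into four worker types can be illustrated with a small sketch. The chapter uses the K-Means implementation in the Weka toolkit; the Python below is a stand-in that runs plain Lloyd iterations and is not the authors' code. The function name `kmeans`, the optional `init` parameter, and the sample data are hypothetical; only the two-dimensional feature pair and the choice of four clusters come from the text.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (TTF, down time) in the same time unit


def kmeans(points: Sequence[Point], k: int,
           init: Optional[Sequence[Point]] = None,
           iters: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Cluster 2-D points with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    If init is given it supplies the k starting centroids; otherwise k
    distinct input points are drawn with a seeded random generator.
    """
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


if __name__ == "__main__":
    # Made-up (TTF, down time) pairs in hours, for illustration only.
    sample = [(8.0, 16.0), (9.0, 15.0), (120.0, 48.0), (110.0, 50.0),
              (700.0, 2.0), (650.0, 3.0), (5.0, 300.0), (6.0, 280.0)]
    centers, assignment = kmeans(sample, k=4)
    for center in centers:
        print(center)
    print(assignment)
```

Because TTF and down time can differ by orders of magnitude, a real analysis would probably rescale the two features before clustering; that step is omitted here to keep the sketch short.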
