Adaptive Control of Redundant Task Execution for Dependable Volunteer Computing - Cloud, Grid and High Performance Computing: Emerging Applications


Figure 1. Life cycle of a volunteer peer

A HEURISTICS-BASED FAILURE PROBABILITY ESTIMATION

each class's availability (TTF) and unavailability (Mean Time to Reboot, MTR) data. While other works found one or two best-fitting distributions, this work found a different best-fitting distribution for each class.

The methods for predicting resource availability reviewed in Section 2 achieve different levels of accuracy in their respective environments. Since this paper aims to find an optimized task assignment from estimated task failure probabilities, the distribution of empirical availability data provides enough information. A simple, straightforward heuristics-based failure probability estimation method is therefore employed here.

Availability Prediction

Brevik et al. (2004) assumed a homogeneous environment and proposed an availability prediction method built on the Weibull distribution they had fitted. The method answers the question: what is the largest availability duration for a given confidence value and a desired percentile? Iosup et al. (2007) proposed a resource availability model that considers the distribution of failures among clusters, the TTF distribution, the failure duration distribution, and the distribution of the failure size, which is the number of failed processors. This model is used to predict failures in a multi-cluster grid system.
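
The kind of question addressed by a Weibull-based method can be illustrated with a minimal sketch. It assumes that availability durations follow a Weibull distribution with known shape and scale, and it computes the longest duration a machine survives with at least a given probability. It does not reproduce the confidence-bound part of the method of Brevik et al. (2004); the function names and parameter values are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that a machine is still available after uptime t."""
    return math.exp(-((t / scale) ** shape))


def longest_duration(p, shape, scale):
    """Largest t for which weibull_survival(t, shape, scale) >= p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Invert S(t) = exp(-(t / scale) ** shape) for t.
    return scale * (-math.log(p)) ** (1.0 / shape)


# Hypothetical parameters, chosen only for illustration (scale in seconds).
shape, scale = 0.5, 3600.0
t = longest_duration(0.9, shape, scale)
print(f"available for at least {t:.1f} s with probability 0.9")
```

A lower required probability yields a longer duration, since the survival function decreases with uptime.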

Other works (Ren, 2006; Rood, 2007) used the availability patterns of weekdays and weekends to predict availability. Nadeem et al. (2008) used Bayes' rule and the nearest-neighbor rule to predict resource availability. Mickens et al. (2006) proposed saturating counter predictors, state-based history predictors, a linear predictor, and a hybrid predictor that dynamically selects the best predictor. These predictors were evaluated with trace data sets from distributed servers, peer-to-peer networks, and corporate PCs.
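
As an illustration of the first predictor type, the following is a minimal sketch of a generic saturating counter, assuming one counter per peer and one observation per time slot. It is not the implementation of Mickens et al. (2006); the class name, the default counter width, and the initial state are assumptions.

```python
class SaturatingCounterPredictor:
    """Generic n-bit saturating counter predicting peer availability."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1            # 3 for a 2-bit counter
        self.threshold = (self.max_value + 1) // 2  # 2 for a 2-bit counter
        self.counter = self.threshold               # start at weakly "available"

    def predict(self):
        """Predict that the peer is available in the next time slot."""
        return self.counter >= self.threshold

    def update(self, was_available):
        """Move the counter toward the observed state, saturating at the ends."""
        if was_available:
            self.counter = min(self.counter + 1, self.max_value)
        else:
            self.counter = max(self.counter - 1, 0)


predictor = SaturatingCounterPredictor()
for observed in [True, True, False, True]:
    predictor.update(observed)
print(predictor.predict())
```

Because the counter saturates, a single unavailable slot after a long run of available slots does not flip the prediction.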

Life Cycle of a Volunteer Peer

The life cycle of a volunteer peer can be modeled as shown in Figure 1. TTF is the time between a peer's start or restart and its next failure or shutdown. DT is the time between a failure and the peer's next restart. Given a statistical distribution of TTF, the value of its cumulative distribution function (CDF) at uptime x is the probability that a peer's TTF is smaller than or equal to x, which equals the failure probability at uptime x. The failure probability increases monotonically with uptime. Since no single distribution can accurately characterize resource availability for all systems in large-scale computing environments (Nurmi, 2005; Nadeem, 2008), a heuristics-based mechanism is proposed to estimate the failure probability at runtime from gathered TTF data.
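
The relation between gathered TTF data and the failure probability at a given uptime can be sketched with a plain empirical CDF. This is only an illustration of the CDF definition above, not the heuristics-based mechanism itself, which is not described in this excerpt; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def failure_probability(ttf_samples, uptime):
    """Empirical CDF: fraction of observed TTF values <= uptime."""
    if not ttf_samples:
        raise ValueError("at least one TTF sample is required")
    ordered = sorted(ttf_samples)
    # bisect_right counts the samples that are smaller than or equal to uptime.
    return bisect_right(ordered, uptime) / len(ordered)


# Hypothetical TTF observations for one peer, in minutes.
samples = [40, 10, 30, 20]
for x in (5, 25, 40):
    print(x, failure_probability(samples, x))
```

The estimate never decreases as the uptime grows, which matches the monotonic behavior of the failure probability described above.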
