Temporal Verification for Scientific Cloud Workflows: State-of-the-Art and Research Challenges - Process-Aware Systems


are intermediate temporal violations rather than final violations which are beyond recovery. The work in [53] proposes three alternative recovery actions: no action (NIL), rollback (RBK) and compensation (COM). NIL, which relies entirely on the automatic recovery of the system itself, is normally not considered 'risk-free'. As for RBK, unlike in the handling of conventional system function failures, it normally causes extra delays and makes the current temporal violations even worse. In contrast, COM, namely time deficit compensation, is a suitable approach for handling temporal violations. The work in [5] proposes a time deficit allocation (TDA) strategy which compensates for current time deficits by utilizing the expected time redundancy of subsequent activities. However, since the time deficit is not truly reduced by TDA, this strategy can only postpone the violations of local constraints on some local workflow segments and has no effect on the overall deadlines. Therefore, we need to investigate strategies which can actually reduce the time deficits. Among many others, one compensation process which is often employed and can actually make up the time deficit is workflow rescheduling.
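The idea behind TDA can be illustrated with a minimal sketch. The proportional allocation rule, the function name allocate_time_deficit and the sample numbers below are assumptions made for illustration only and are not taken from [5]. The sketch shows why such a strategy only postpones violations: the shares always sum to the original deficit, so the deficit is redistributed rather than reduced.

```python
from typing import List


def allocate_time_deficit(deficit: float, redundancies: List[float]) -> List[float]:
    """Split a time deficit across subsequent activities.

    Assumption (illustrative, not the rule from [5]): each activity absorbs
    a share proportional to its expected time redundancy.
    """
    total = sum(redundancies)
    if total <= 0:
        raise ValueError("no time redundancy available to absorb the deficit")
    return [deficit * r / total for r in redundancies]


# Hypothetical example: a deficit of 6 time units and three subsequent
# activities with expected time redundancies of 2, 4 and 6 time units.
shares = allocate_time_deficit(6.0, [2.0, 4.0, 6.0])
print(shares)       # [1.0, 2.0, 3.0]
print(sum(shares))  # 6.0, the total deficit is unchanged
```

Because the sum of the shares equals the input deficit, the overall deadline is no closer to being met after the allocation than before it.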

Workflow rescheduling, such as local rescheduling (which deals with the mapping of underlying resources to workflow activities within specific local workflow segments), is normally triggered by the violation of QoS constraints [13, 61]. Workflow scheduling and workflow rescheduling are classical NP-complete problems [11, 16]. Therefore, many heuristic and metaheuristic algorithms have been proposed. The work in [60] presents a systematic overview of workflow scheduling algorithms for scientific grid computing. The work in [12] proposes an ACO (Ant Colony Optimization) approach to address scientific workflow scheduling problems with various QoS requirements such as reliability constraints, makespan constraints and cost constraints. For handling temporal violations in scientific cloud workflow systems, both time and cost need to be considered, but time has priority over cost since we focus more on reducing the time deficits during the compensation process. We proposed an ACO-based local workflow rescheduling strategy in [30] for handling temporal violations in scientific workflows.
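One simple way to read "time has priority over cost" is as a lexicographic comparison of candidate rescheduling plans. The sketch below assumes that reading; the Candidate type, the function pick_reschedule and the sample values are hypothetical, and the strategy in [30] may combine the two criteria differently.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A candidate rescheduling plan (hypothetical structure)."""
    name: str
    makespan: float  # completion time of the rescheduled local segment
    cost: float      # execution cost of the plan


def pick_reschedule(candidates: List[Candidate]) -> Candidate:
    # Compare by makespan first; cost only breaks ties between plans
    # with equal makespan (assumed lexicographic priority).
    return min(candidates, key=lambda c: (c.makespan, c.cost))


plans = [
    Candidate("plan-a", makespan=40.0, cost=12.0),
    Candidate("plan-b", makespan=35.0, cost=20.0),
    Candidate("plan-c", makespan=35.0, cost=15.0),
]
print(pick_reschedule(plans).name)  # plan-c
```

Here plan-a is the cheapest but is rejected because it is slower; plan-c wins over plan-b only because the two have the same makespan.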

Another issue in violation handling is that some violations may be beyond the handling capability of certain violation handling strategies. Therefore, based on handling capability, statistically recoverable and non-recoverable temporal violations are defined in [23] so that they can be handled properly with different violation handling strategies. While statistically recoverable temporal violations can usually be recovered by workflow rescheduling strategies, statistically non-recoverable temporal violations can only be recovered by heavy-weight (i.e. more expensive) solutions such as resource recruitment, stop and restart, processor swapping and workflow restructure [26]. The state-of-the-art general temporal violation handling framework is proposed in [31], where K levels of temporal violations can be defined according to the handling capability of K available violation handling strategies. In such a case, a temporal violation can be handled by the violation handling strategy which has sufficient capability at the least cost. Therefore, the total violation handling cost can be minimized.
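The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the framework of [31]: the Strategy type, the function select_strategy, and all capability and cost figures are hypothetical, and capability is modelled simply as the largest time deficit a strategy is assumed to recover.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    capability: float  # largest time deficit the strategy can recover (assumed)
    cost: float        # handling cost of applying the strategy (assumed)


def select_strategy(deficit: float, strategies: List[Strategy]) -> Optional[Strategy]:
    """Return the cheapest strategy whose capability covers the deficit,
    or None if the violation is beyond every available strategy."""
    capable = [s for s in strategies if s.capability >= deficit]
    return min(capable, key=lambda s: s.cost) if capable else None


# Hypothetical K = 3 strategies, from light-weight to heavy-weight.
strategies = [
    Strategy("local rescheduling", capability=10.0, cost=1.0),
    Strategy("resource recruitment", capability=30.0, cost=5.0),
    Strategy("workflow restructure", capability=60.0, cost=20.0),
]

print(select_strategy(8.0, strategies).name)   # local rescheduling
print(select_strategy(25.0, strategies).name)  # resource recruitment
print(select_strategy(100.0, strategies))      # None
```

Choosing the cheapest capable strategy for each violation, rather than always applying the most powerful one, is what keeps the total handling cost down in this sketch.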

In addition, to reduce the increasing violation handling cost, the work in [28] proposed a novel concept of “violation handling point selection” which is to further
