Temporal Verification for Scientific Cloud Workflows: State-of-the-Art and Research Challenges - Process-Aware Systems


However, as analyzed in the literature, there are many other time overheads in
scientific workflows [42], and some of them, such as data transfer, may become
significant in the cloud computing environment. In fact, as the datasets in scientific
workflows are usually very large, the time needed to transfer them across different
data centers through the Internet can be enormous. Therefore, we need to investigate
these time attributes and include them in the forecasting model.
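To illustrate why data transfer cannot be ignored, the following is a minimal sketch of an activity duration estimate that adds a transfer term to the computation time. The size/bandwidth transfer model, the class `ActivityEstimate`, and all field names are assumptions made for illustration only; they are not taken from the cited work, and a real forecasting model would need to account for bandwidth fluctuation and contention.

```python
from dataclasses import dataclass


@dataclass
class ActivityEstimate:
    """Hypothetical container for the time attributes of one workflow activity."""
    compute_seconds: float          # forecast execution time on the cloud service
    input_bytes: float              # size of the datasets that must be moved in
    bandwidth_bytes_per_s: float    # assumed sustained bandwidth between data centers
    latency_seconds: float = 0.0    # assumed fixed setup cost per transfer


def transfer_seconds(est: ActivityEstimate) -> float:
    """Data transfer time under a simple size / bandwidth model (an assumption)."""
    if est.bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return est.latency_seconds + est.input_bytes / est.bandwidth_bytes_per_s


def total_seconds(est: ActivityEstimate) -> float:
    """Activity duration = computation time + data transfer overhead."""
    return est.compute_seconds + transfer_seconds(est)


# Example: 500 GB moved at 100 MB/s before a 20-minute computation.
example = ActivityEstimate(
    compute_seconds=1200.0,
    input_bytes=500e9,
    bandwidth_bytes_per_s=100e6,
)
```

In this example the transfer term (5000 s) is several times larger than the computation term (1200 s), which is the situation the paragraph above warns about.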

There are a few papers investigating the modeling and estimation of cloud service
response time. Specifically, queueing models are employed as the basic forecasting
tool, as they are powerful in modeling a multitenant cloud environment [19]. In
addition, there are some studies on data transfer in a cloud environment, as well as
some data placement strategies to reduce the data transfer time for scientific cloud
workflows [62]. Nevertheless, the forecasting strategy for scientific cloud workflow
activity durations is still an open challenge, and it is one of the fundamental issues
for scientific cloud workflow temporal verification.
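As a concrete illustration of how a queueing model yields a response time estimate, the sketch below computes the mean response time of an M/M/c queue using the standard Erlang C formula. The choice of M/M/c (Poisson arrivals, exponential service times, c identical servers) is our assumption; the text above does not say which queueing model the cited work uses, and the function name is hypothetical.

```python
from math import factorial


def mean_response_time(arrival_rate: float, service_rate: float, servers: int = 1) -> float:
    """Mean response time (waiting + service) of an M/M/c queue.

    arrival_rate: requests per second arriving at the service (lambda)
    service_rate: requests per second one server can complete (mu)
    servers:      number of identical servers (c); c = 1 gives M/M/1
    """
    if arrival_rate <= 0 or service_rate <= 0 or servers < 1:
        raise ValueError("rates must be positive and servers >= 1")

    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / c
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")

    # Erlang C: probability that an arriving request has to wait.
    tail = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)

    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate
```

For example, `mean_response_time(8.0, 10.0)` gives 0.5 s for a single server, which matches the M/M/1 closed form 1 / (mu - lambda). Adding a second server with `servers=2` lowers the estimate considerably, which shows how sensitive such forecasts are to the assumed resource pool.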

3.2 Open Challenge #2: The Monitoring of Many Parallel Computing Tasks

MapReduce-style parallel data processing is pervasive in both scientific and business
workflow applications [14, 36]. In business workflow applications, the parallelism is
mainly at the workflow level, namely a large number of business workflow instances,
each representing a unique customer request. However, in scientific workflow
applications, the parallelism is mainly at the activity level or even inside each
activity at the algorithm level [3]. As current scientific workflow temporal
verification focuses on the critical path, such low-level parallelism can usually be
simplified as a compound workflow activity for ease of monitoring, and the loss of
accuracy is negligible [33]. However, as cloud computing is becoming the major
platform for scientific computing, MapReduce-style programming models will help
to exploit the benefits of massive parallelism to improve the efficiency of scientific
cloud workflows. Therefore, we can envisage the increasing popularity of parallel
data processing in scientific computing and hence an increase of parallel structures in
scientific cloud workflow applications. In such a case, we can no longer treat them as
sequential structures but must provide new monitoring strategies for many parallel
computing tasks.
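The simplification described above can be sketched as follows: a fork-join block of parallel branches is collapsed into one compound activity whose duration is that of its slowest branch. This is only an illustrative reading of the compound-activity idea, assuming known per-task durations and no dependencies between branches; the function names are hypothetical.

```python
from typing import List, Sequence

Branch = Sequence[float]  # durations (in seconds) of the tasks run in sequence on one branch


def compound_duration(branches: Sequence[Branch]) -> float:
    """Duration of the compound activity: the slowest branch determines it."""
    if not branches:
        raise ValueError("a parallel structure needs at least one branch")
    return max(sum(branch) for branch in branches)


def branch_slack(branches: Sequence[Branch]) -> List[float]:
    """Slack of each branch relative to the compound duration.

    This per-branch information is what is hidden once the block is
    collapsed into a single compound activity on the critical path.
    """
    total = compound_duration(branches)
    return [total - sum(branch) for branch in branches]


# Three parallel branches with different numbers of tasks.
parallel_block = [[30.0, 45.0], [50.0], [20.0, 20.0, 20.0]]
```

Here `compound_duration(parallel_block)` is 75.0 and `branch_slack(parallel_block)` is `[0.0, 25.0, 15.0]`. With a handful of branches the collapsed view loses little, but with many parallel tasks a delay on any branch can change which one is critical, which is why monitoring only the compound activity becomes insufficient.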

Recently, there have been a few studies on the monitoring of large numbers of parallel
business workflows, which may provide a reference for the monitoring of many
parallel computing tasks in scientific workflows. The work in [29] proposes a novel
idea where system time points, instead of traditional activity points, are selected as
temporal checkpoints. Accordingly, the system throughput, instead of the traditional
response time, is adopted as the measurement for system performance. However, as
mentioned above, business workflows and scientific workflows have different levels
of parallelism, and hence the strategy for monitoring parallel business workflow
instances at the workflow level may not be suitable for monitoring parallel computing
tasks at the activity level. One of the major differences is that there is significant data
and time dependency in scientific workflows at the activity level but much less in
business
