However, as analyzed in the literature, there are many other time overheads in
scientific workflows [42], and some of them, such as data transfer, may become
significant in a cloud computing environment. In fact, because the datasets in
scientific workflows are usually very large, the time needed to transfer them across
different data centers through the Internet can be enormous. Therefore, we need to
investigate these time attributes and include them in the forecasting model.
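To give a feel for the scale involved, the following is a minimal sketch that assumes
transfer time can be approximated as a fixed latency plus dataset size divided by
effective bandwidth. The function name, the model, and the figures (a 1 TB dataset over
a 1 Gbit/s link) are hypothetical and chosen only for illustration; real transfers are
also affected by protocol overhead and contention.

```python
def transfer_time_seconds(size_bytes: float,
                          bandwidth_bits_per_s: float,
                          latency_s: float = 0.0) -> float:
    """Approximate transfer time (assumed model: latency + size / bandwidth)."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    # Convert bytes to bits before dividing by a bits-per-second rate.
    return latency_s + (size_bytes * 8) / bandwidth_bits_per_s


# Hypothetical figures: 1 TB (1e12 bytes) over a 1 Gbit/s (1e9 bits/s) link.
seconds = transfer_time_seconds(1e12, 1e9)
print(f"{seconds / 3600:.2f} hours")  # prints "2.22 hours"
```

Under these assumed figures a single transfer already takes more than two hours, which
may be comparable to or longer than the execution time of the activity that consumes the
data.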
A few papers have investigated the modeling and estimation of cloud service
response time. Specifically, a queueing model is employed as the basic forecasting tool
because it is powerful in modeling a multitenant cloud environment [19]. In addition,
there are some studies on data transfer in a cloud environment, as well as some data
placement strategies that reduce the data transfer time for scientific cloud workflows
[62]. Nevertheless, a forecasting strategy for scientific cloud workflow activity
durations is still an open challenge, and it is one of the fundamental issues for
scientific cloud workflow temporal verification.
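As an illustration of how a queueing model yields a response time estimate, the sketch
below uses the simplest textbook case, an M/M/1 queue, whose mean response time is
1 / (mu - lambda) for arrival rate lambda and service rate mu. This choice is an
assumption made for illustration; the text does not say which queueing model [19] uses,
and the function name and rates are hypothetical.

```python
def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue.

    arrival_rate: average requests per second (lambda)
    service_rate: average requests served per second (mu)
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (arrival) and positive (service)")
    if arrival_rate >= service_rate:
        # The queue grows without bound, so no steady-state mean exists.
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical service handling 10 requests/s while receiving 8 requests/s.
print(mm1_mean_response_time(8.0, 10.0))  # prints 0.5
```

The estimate rises sharply as the arrival rate approaches the service rate, which is one
reason response time in a shared, multitenant environment is hard to forecast from the
service time alone.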
Open Challenge #2: The Monitoring of Many Parallel Computing Tasks
MapReduce-style parallel data processing is pervasive in both scientific and business
workflow applications [14, 36]. In business workflow applications, the parallelism is
mainly at the workflow level, namely a large number of business workflow instances,
each representing a unique customer request. In scientific workflow applications,
however, the parallelism is mainly at the activity level or even inside each activity
at the algorithm level [3]. As current scientific workflow temporal verification
focuses on the critical path, such low-level parallelism can usually be simplified as a
compound workflow activity for ease of monitoring, and the loss of accuracy is
negligible [33]. However, as cloud computing is becoming the major platform for
scientific computing, the MapReduce style of programming model will help to exploit the
benefits of massive parallelism to improve the efficiency of scientific cloud
workflows. Therefore, we can envisage the increasing popularity of parallel data
processing in scientific computing and hence an increase of parallel structures in
scientific cloud workflow applications. In such a case, we can no longer treat them as
sequential structures but must instead provide new monitoring strategies for many
parallel computing tasks.
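The simplification of a parallel structure into a compound activity can be sketched as
follows. The sketch assumes a fork-join block whose branches run fully in parallel and
synchronize at the end, so the compound duration is that of the longest branch. This is
an illustrative assumption, not the method of [33]; the function names and durations
are hypothetical.

```python
from typing import Sequence


def compound_duration(branches: Sequence[Sequence[float]]) -> float:
    """Duration of a fork-join block treated as one compound activity.

    Each branch is a sequence of activity durations executed one after another;
    the block finishes when its slowest branch finishes.
    """
    if not branches:
        return 0.0
    return max(sum(branch) for branch in branches)


def path_duration(before: Sequence[float],
                  parallel_block: Sequence[Sequence[float]],
                  after: Sequence[float]) -> float:
    """Duration of a path: sequential parts plus the collapsed parallel block."""
    return sum(before) + compound_duration(parallel_block) + sum(after)


# Hypothetical durations (in minutes) for three parallel branches.
block = [[3.0, 2.0], [4.0], [1.0, 1.0, 1.0]]
print(compound_duration(block))            # prints 5.0
print(path_duration([2.0], block, [1.5]))  # prints 8.5
```

Collapsing the block hides the progress of the individual branches, which is acceptable
when there are few of them but gives a monitor little to observe when a workflow
consists mostly of many parallel tasks.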
Recently, a few studies have addressed the monitoring of large numbers of parallel
business workflows, which may provide a reference for the monitoring of many parallel
computing tasks in scientific workflows. The work in [29] proposes a novel idea where
system time points instead of traditional activity points are selected as temporal
checkpoints. Accordingly, the system throughput instead of the traditional response
time is adopted as the measurement of system performance. However, as mentioned above,
business workflows and scientific workflows have different levels of parallelism, and
hence the strategy for monitoring parallel business workflow instances at the workflow
level may not be suitable for monitoring parallel computing tasks at the activity
level. One of the major differences is that there is significant data and time
dependency in scientific workflows at the activity level but much less in business