achieves an average performance among the learning methods. However, a combined
model of M5P trees achieves much more accurate predictions. The ensemble method
led to error reductions ranging between 3% and 13% across the six scenarios.
The robustness of the ensemble method makes it well suited to real-world
settings where the effort and intervention of experts in performance modeling
must be minimized.
Fig. 5. Relative Absolute Error for each model
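The Relative Absolute Error is commonly defined as the total absolute error of
a model divided by the total absolute error of a baseline that always predicts
the mean of the observed values. A minimal Python sketch under that standard
definition follows; the function name and the sample run times are illustrative
assumptions, not taken from the paper.

```python
from statistics import fmean


def relative_absolute_error(actual, predicted):
    """Return sum|p_i - a_i| / sum|a_i - mean(a)|.

    Values below 1.0 mean the model beats a predictor that always
    outputs the mean of the observed values.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = fmean(actual)
    # Error of the trivial mean predictor, used as the normalizer.
    baseline = sum(abs(a - mean_actual) for a in actual)
    if baseline == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    return model_error / baseline


# Hypothetical task run times in seconds (assumed values for illustration).
observed = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
rae = relative_absolute_error(observed, estimated)
```

Because the measure is normalized by the mean predictor, it can be compared
across scenarios whose run times differ in scale.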
6 Concluding Remarks
In this paper we proposed an adaptive scheme for constructing performance
models that predict the run time of workflow tasks on Grids and Clouds. This
scheme uses Machine Learning methods to construct ensemble models from
data-provenance information and other sources of data available from the
workflow systems.
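The idea of combining several models into one predictor can be sketched as
follows. This is a minimal illustration that assumes bagging-style averaging
over bootstrap samples; this section does not state how the M5P trees are
combined. A one-variable least-squares line stands in for an M5P tree, and the
feature (input size) and all function names are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for an M5P tree)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate sample with a single distinct x: predict the mean.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_bagged(xs, ys, n_models=10, seed=0):
    """Train n_models base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Ensemble prediction: the average of the base-model predictions."""
    return fmean(a + b * x for a, b in models)


# Hypothetical provenance records: input size (MB) and run time (s).
sizes = list(range(20))
run_times = [2 * s + 1 for s in sizes]
ensemble = fit_bagged(sizes, run_times, n_models=5, seed=1)
estimate = predict_bagged(ensemble, 10.0)
```

Averaging over resampled models reduces the variance of any single learner,
which is one plausible reason an ensemble would be more robust across scenarios.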
The experiments were designed to evaluate the performance of an ensemble model
in comparison with single-model machine learning techniques. They focused on
predicting the running time of the tasks that make up real-world
bioinformatics workflows. The performance of the studied strategies was
measured in both computing environments.
Results showed that the ensemble method outperformed its competitors in four
of the six analyzed scenarios. In the other two, the ensemble method achieved
prediction errors only 1%-2% higher than the best strategy. The best results
show wide margins of improvement over the competitors: in the best case, the
ensemble method reduced errors by 10.5% to 21.2% in the homogeneous
environment and by 8.0% to 24.9% in the heterogeneous one.
Undoubtedly, there is much more to investigate regarding complex models for
predicting the performance of data-intensive scientific workflows. This paper
is an initial step toward that objective. As future work, we plan to evaluate
other ensemble learning strategies to gain more insight into the importance of
combined models for predicting application performance. Studying new
techniques for improving the quality of features may also help to increase the
accuracy of the models [2]. Another direction is to study the applicability of
these ensemble models to applications that process massive amounts of data,
i.e., Big Data applications.
 