To manage applications efficiently, WMSs rely on run-time estimates
of tasks. This information is the basis for several processes, for example
task scheduling, fulfillment of Quality of Service (QoS) requirements, and
autoscaling of cloud infrastructures, among others [3,5,9].
Most of the prediction methods used by WMSs were crafted for characterizing
parallel applications. Although such techniques provide accurate predictions,
they require the supervision of an expert for constructing and tuning the
prediction models. Such requirements undermine one of the main advantages of
workflow technology: simplicity for the user.
To cope with this limitation, many authors have applied Machine Learning
strategies to generate the prediction models (semi-)automatically. Following
this line of thought, we propose a novel method for the autonomous generation
of multiple combined run-time prediction models derived using Ensemble
Learning methods. The final objective of our approach is to minimize the human
effort required to generate the models without compromising the accuracy of
predictions. To accomplish this objective, this work uses the performance
information available in WMSs and workflow provenance information to learn
robust combined models.
The rest of this paper is organized as follows. In Section 2 we provide a review
of performance prediction strategies based on Machine Learning methods.
Section 3 presents the proposed approach for learning run-time prediction models.
Section 4 describes a set of bioinformatics workflows and the methodology used
for validating our proposal. Section 5 presents and discusses the results obtained.
Finally, conclusions and future work are given in Section 6.
2 Related Works
The prediction of application performance has been studied since the genesis
of parallel and distributed computing [1]. Many prediction strategies use
historical data to carry out the predictions instead of constructing the models
by hand. Statistical and Machine Learning techniques permit the derivation of
models based on the available historical data (examples). This approach offers
an important advantage for workflow applications executing on Grid or Cloud
environments: models can be refined over time, and the user does not need to
supervise the construction of the models or perform tedious tasks such
as benchmarking resources, profiling applications, etc.
Some of these strategies address the prediction problem using the k-Nearest
Neighbors method [8,11]. Predictions are performed by first looking up execution
examples with settings similar to the prediction query (e.g., examples with
similar task parameters, processor speed, etc.). Then, the execution times
corresponding to the selected examples are averaged and returned as the
prediction. Other authors use methods such as regression trees for predicting
the performance of applications [12]. More recently, Artificial Neural Networks
have been applied to estimate the price of market-based computing resources [15].
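The k-Nearest Neighbors procedure described above can be illustrated with a
minimal sketch. It is not the implementation used in [8,11]; the function name
knn_predict_runtime, the two features (input size and processor speed), the
Euclidean distance, and all the example values are assumptions chosen only for
illustration.

```python
import math


def knn_predict_runtime(history, query, k=3):
    """Predict a run time by averaging the k most similar past executions.

    history: list of (features, runtime) pairs, where features is a tuple of numbers.
    query:   tuple of numbers describing the task whose run time is wanted.
    """
    if not history:
        raise ValueError("history must contain at least one execution example")
    # Step 1: rank past executions by Euclidean distance to the query
    # (the distance measure is an assumption of this sketch).
    ranked = sorted(history, key=lambda example: math.dist(example[0], query))
    nearest = ranked[:k]
    # Step 2: average the run times of the selected examples.
    return sum(runtime for _, runtime in nearest) / len(nearest)


# Hypothetical execution history:
# (input size in MB, processor speed in GHz) -> run time in seconds
history = [
    ((100.0, 2.0), 50.0),
    ((110.0, 2.0), 56.0),
    ((105.0, 2.2), 51.0),
    ((500.0, 2.0), 260.0),
    ((520.0, 3.0), 180.0),
]

prediction = knn_predict_runtime(history, (102.0, 2.0), k=3)
print(f"Predicted run time: {prediction:.2f} s")
```

In this sketch the features are used unscaled, so the input size dominates the
distance. A practical predictor would normally normalize the features first so
that each one contributes comparably to the similarity measure.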
The strategies mentioned above apply statistical or machine-learning methods to
predict several aspects of the execution of applications in the context of distributed
 