Stage 4: Task Run-time Prediction. This stage generates running-time estimates
for tasks using the models built in the previous stage. The estimates take into
account the inputs of the workflow tasks (i.e. parameters and data) and the
characteristics of the resources that will eventually execute those tasks.
This sequence of stages is repeated continuously throughout the execution
of several applications. Each cycle improves the predictive accuracy of the
models and allows them to adapt to new (unseen) execution examples. The
important point is that this adaptive learning process improves the accuracy
of the prediction models without requiring human intervention beyond the
initial setup of the performance data to collect. Ensemble learning plays a
central role in this objective because it provides the strategy with very
robust models autonomously.
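The cycle described above can be sketched as a record/refit/predict loop. The text does not say which ensemble method is used, so the sketch below assumes a simple bagging scheme over one-feature least-squares regressors; the class name `BaggedRuntimeModel`, its methods, and the choice of input size as the only feature are all hypothetical.

```python
import random
from statistics import mean


def fit_line(points):
    """Least-squares fit y = a*x + b; falls back to a constant if x never varies."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0:
        return 0.0, y_bar
    a = sum((x - x_bar) * (y - y_bar) for x, y in points) / var
    return a, y_bar - a * x_bar


class BaggedRuntimeModel:
    """Hypothetical ensemble: each member is fitted on a bootstrap resample."""

    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.examples = []   # recorded (input_size, runtime) pairs
        self.members = []    # fitted (slope, intercept) pairs

    def record(self, input_size, runtime):
        """Store one new execution example."""
        self.examples.append((input_size, runtime))

    def refit(self):
        """Rebuild every member from the examples collected so far."""
        m = len(self.examples)
        self.members = [
            fit_line(self.rng.choices(self.examples, k=m))
            for _ in range(self.n_members)
        ]

    def predict(self, input_size):
        """Average the members' predictions."""
        return mean(a * input_size + b for a, b in self.members)
```

In use, each finished task execution would be passed to `record`, `refit` would be called at the end of the cycle, and `predict` would serve the next scheduling decision, with no manual tuning in between.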
3.2 Performance-Data Representation
Performance data is stored separately for each type of task. The performance
dataset for a task can be formally defined as a set
$D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, where $x^{(i)}$ represents a column
vector of features for the $i$-th (out of $m$) recorded execution example of a
task, and $y^{(i)}$ is the measured running time for that execution, also known
as the target.
Each feature vector $x = [x_1, x_2, \cdots, x_n]$ comprises three types of
elements: (i) task features, which represent the inputs of the task, e.g.
parameter values, data size, etc.; (ii) provenance features, which describe
previous processes that generated or modified the input data; and
(iii) resource features, which model characteristics of the resource used in
the execution of the task.
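One possible in-memory layout for such per-task datasets is sketched below. The names `Example` and `PerformanceStore` are hypothetical and not taken from the source; the only properties carried over from the text are that data is kept separately per task type and that each example pairs a feature vector with a measured running time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    features: List[float]  # feature vector x^(i)
    runtime: float         # measured running time y^(i), the target


class PerformanceStore:
    """Hypothetical store holding one dataset D per type of task."""

    def __init__(self):
        self._by_task = defaultdict(list)

    def add(self, task_type, features, runtime):
        """Append one recorded execution example to the task's dataset."""
        self._by_task[task_type].append(Example(list(features), float(runtime)))

    def dataset(self, task_type):
        """Return (X, y): the m feature vectors and their m targets."""
        examples = self._by_task[task_type]
        X = [e.features for e in examples]
        y = [e.runtime for e in examples]
        return X, y
```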
Task features. These features describe the task's inputs. They include the
values taken by input parameters and characteristics of the data, such as
size and number of lines, registers or columns.
Provenance features. These features capture information about the origin of
the data and the transformations applied to it by other tasks during the
execution of the workflow. Such information can be easily extracted from the
description of the workflow, and incorporating it yields more accurate
performance models. As noted earlier, to the best of our knowledge, no other
strategy in the state of the art uses such information for producing run-time
predictions.
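As an illustration of how provenance information might be derived from a workflow description, the sketch below assumes the workflow is available as a mapping from each task to the tasks that produce its inputs. That representation, the function names, and the particular encoding (a count of upstream transformations plus one indicator per known task type) are assumptions made for this example, not the encoding used by the authors.

```python
def upstream_tasks(workflow, task):
    """Collect every task that directly or indirectly produced this task's inputs."""
    seen = []
    stack = list(workflow.get(task, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(workflow.get(current, []))
    return seen


def provenance_features(workflow, task, known_types):
    """Encode provenance as [number of upstream tasks, one indicator per known type]."""
    ancestors = set(upstream_tasks(workflow, task))
    indicators = [1.0 if t in ancestors else 0.0 for t in known_types]
    return [float(len(ancestors))] + indicators
```

For a linear workflow in which a hypothetical `fetch` task feeds `filter`, which feeds `align`, the provenance features of `align` would record two upstream transformations and mark both `fetch` and `filter` as present.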
Resource features. These features describe the computing resources used to
execute the tasks, and they can be obtained from the WMS. The features used
for modeling the performance of an application are those that measure the
performance of the resources (i.e. those that directly affect the performance
of tasks). Such information is mainly provided by resource benchmarks. In
general, most WMSs provide such metrics and update them regularly.
Note that in the case of web services, this type of feature will be inaccessible.
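The three groups can then be concatenated into a single feature vector. The text does not say how the missing resource features of web services are handled; the sketch below assumes they are filled with NaN placeholders so that every vector for a task keeps the same length. The function name and the `n_resource` parameter are hypothetical.

```python
import math


def build_feature_vector(task, provenance, resource=None, n_resource=2):
    """Concatenate task, provenance and resource features into one vector x.

    When resource features are unavailable (e.g. the task is a web service),
    NaN placeholders are used; this is an assumption of this sketch.
    """
    if resource is None:
        resource = [math.nan] * n_resource
    if len(resource) != n_resource:
        raise ValueError("expected %d resource features" % n_resource)
    return list(task) + list(provenance) + list(resource)
```

A downstream model would need either to tolerate the NaN entries or to be trained without the resource group for such tasks.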