According to this configuration, six different datasets were generated, one
for each combination of task type (RandomR, SVM-RFE or GELF) and type of
infrastructure (homogeneous or heterogeneous).
5 Results and Analysis
In this section we present the results obtained during the experimental process.
Six different scenarios were analyzed: the three types of GEAE tasks on
homogeneous and heterogeneous infrastructures.
For measuring the performance of each model we use the Relative Absolute
Error (RAE), which is computed as

$$\text{error} = \frac{|p_1 - a_1| + \dots + |p_m - a_m|}{|a_1 - \bar{a}| + \dots + |a_m - \bar{a}|} \cdot 100\%,$$

where $p_i$ and $a_i$ represent the predicted and actual values, respectively,
for the $i$-th example, $\bar{a}$ represents the mean of the actual values, and
$m$ is the number of testing examples. This metric measures the deviation of the
predictions with respect to the actual values. The following sections present
the results obtained for the homogeneous and heterogeneous environments,
respectively, followed by an overall analysis of the results.
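As an illustration of the RAE definition, the following is a minimal Python sketch, not the implementation used in the experiments. The function name `relative_absolute_error` and the sample durations are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def relative_absolute_error(predicted: Sequence[float],
                            actual: Sequence[float]) -> float:
    """Return the RAE, in percent, of predictions against actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Numerator: total absolute error of the model's predictions.
    numerator = sum(abs(p - a) for p, a in zip(predicted, actual))
    # Denominator: total absolute error of always predicting the mean.
    denominator = sum(abs(a - mean_actual) for a in actual)
    if denominator == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    return 100.0 * numerator / denominator


# Hypothetical task durations in seconds (illustrative, not experimental data).
actual = [6.0, 8.0, 10.0]
predicted = [7.0, 8.0, 9.0]
print(relative_absolute_error(predicted, actual))  # 50.0
```

A value of 100% corresponds to a model that does no better than always predicting the mean of the actual values, so lower values indicate better predictions.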
Homogeneous Environment. Table 3 presents the errors for the homogeneous
environments. Highlighted values represent the minimum errors for each
type of task. For the RandomR task, ANN achieves the minimum error (34.1%),
but all the methods perform very similarly (except for SVR, whose
error rises to 43.1%). It is worth pointing out that, although these errors
are high, in practice they do not have very negative effects because
the mean duration of the tasks is very small (7.7 s).
SVM-RFE is a much simpler task to model, as shown by the lower
errors in the table. Once again, ANN achieves the best results. The impact of
these errors is negligible because SVM-RFE tasks have an average duration of
16.3 s.
For GELF tasks, the Bagging strategy presents the minimum
error. This error is about 20.7%, which represents a reduction of the error
ranging from 10.5% to 21.2% in comparison with the rest of the competitors. In
contrast to the ranker tasks, large errors in the prediction of GELF task
durations have much more undesirable consequences because the duration of these
tasks is much larger (2183.6 s).
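To give a feeling for the difference in scale, the sketch below multiplies the reported error percentages by the reported task durations. This is a deliberate simplification: RAE is normalized by the deviation of the actual values from their mean, not by the duration itself, so the results are only rough orders of magnitude. The helper name `rough_error_seconds` is hypothetical.

```python
def rough_error_seconds(error_percent: float, duration_s: float) -> float:
    """Back-of-the-envelope scale: treat the error as a fraction of the duration.

    Assumption: this is NOT how RAE is defined; it only illustrates why the
    same percentage matters far more for long-running tasks.
    """
    return error_percent / 100.0 * duration_s


# (error in %, duration in s) as reported in the text for the best method.
cases = {
    "RandomR (ANN)": (34.1, 7.7),
    "GELF (Bagging)": (20.7, 2183.6),
}
for name, (error_percent, duration_s) in cases.items():
    print(f"{name}: ~{rough_error_seconds(error_percent, duration_s):.1f} s")
# RandomR (ANN): ~2.6 s
# GELF (Bagging): ~452.0 s
```

Under this simplification, the lower GELF error still corresponds to minutes of misestimation, whereas the higher RandomR error corresponds to only a few seconds.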
As a general note, the highest errors are obtained for
RandomR, for two reasons. First, its performance is not determined by
any parameter or characteristic of the data (the task randomly sorts the genes
without any input other than the data). Second, its short duration is very
likely to be disturbed by other factors (e.g. background load or workflow system
overhead).
Another notable result is that the ensemble method shows errors in the same
range as the best of the methods (ANN), with only a 1%-2% increase in the
error. In addition, in the case of GELF, the performance of ANN drops
dramatically, making it the worst-performing method. In contrast, the ensemble method