algorithm (GELF) and the state-of-the-art approach (baseline) [7] on their respective abilities to classify unseen data.
The experiment comprises the execution of several workflows, each of which processes one of the 20 microarray datasets using a 10-fold cross-validation scheme. Figure 4a shows an example GEAE workflow.
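The text does not say how the folds are built or whether they are stratified by class. The sketch below assumes plain shuffled (non-stratified) k-fold splitting. `k_fold_splits` is a hypothetical helper and is not part of GEAE. It shows the property that 10-fold cross-validation relies on: every sample is held out for testing exactly once, and the remaining samples form the training set for that fold.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once, then dealt round-robin into k disjoint
    folds, so fold sizes differ by at most one (non-stratified)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        # every sample outside the held-out fold is used for training
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test
```

For small microarray datasets, stratified splitting (preserving class proportions in each fold) is a common alternative. This sketch does not implement it.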
The sub-experiments cover several parameter settings in order to consider various aspects of the methods being compared. As can be seen in Figure 4b, each sub-experiment involves three tasks.
Two ranker tasks select genes in order to reduce the number of features used to learn the classifier. The first uses recursive feature elimination with support vector machines (SVM-RFE), and the second returns the features in a random order (RandomR). The third task consists of learning the GELF classifier and evaluating its performance. GELF is a feature construction algorithm based on iterative improvement of the best solution obtained by the state-of-the-art approach [7].
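The two ranker tasks can be sketched as follows. This is a minimal illustration, not the rankers GEAE actually runs. The paper does not name the SVM solver behind SVM-RFE, so this sketch swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method (no bias term). It also assumes labels in {-1, +1} and removes one feature per iteration. All function names (`random_ranker`, `train_linear_svm`, `svm_rfe`, `top_k`) are hypothetical.

```python
import random

def random_ranker(n_features, seed=None):
    """RandomR-style ranker: return all feature indices in a random order."""
    order = list(range(n_features))
    random.Random(seed).shuffle(order)
    return order

def train_linear_svm(X, y, features, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal linear SVM
    objective (lam/2)*||w||^2 + mean hinge loss, without a bias term.
    Only the columns listed in `features` are used; returns {feature: weight}."""
    rng = random.Random(seed)
    w = {f: 0.0 for f in features}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(w[f] * X[i][f] for f in features)
            for f in features:  # shrink step from the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: move towards y_i * x_i
                for f in features:
                    w[f] += eta * y[i] * X[i][f]
    return w

def svm_rfe(X, y, **svm_kwargs):
    """SVM-RFE: retrain on the surviving features and drop the one with the
    smallest squared weight until none remain. Returns indices best-first."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        w = train_linear_svm(X, y, remaining, **svm_kwargs)
        worst = min(remaining, key=lambda f: w[f] ** 2)
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]  # the last feature eliminated is ranked first

def top_k(ranking, k):
    """Keep the k best-ranked features, i.e. the genes passed on to the classifier."""
    return ranking[:k]
```

The usage below builds a toy dataset whose feature 0 equals `2 * y` (perfectly informative) and whose other three features are uniform noise. `svm_rfe(X, y)` then ranks feature 0 first, while `random_ranker(4)` gives no such guarantee. Removing one feature per round keeps the sketch simple. With thousands of genes, implementations usually drop a fraction of the remaining features per round to save training time.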
Fig. 4. GEAE workflows. Overview of one of the GEAE workflows (a) and decomposition of one sub-experiment (b). Panels: (a) scheme of one of the GEAE workflows; (b) abstract and concrete views of an ML sub-experiment.
Each workflow comprises 20 sub-experiments: the two combinations of the GELF task with the rankers (Random and SVM-RFE), applied to each of the 10 dataset folds. Each workflow run therefore consists of 40 tasks (i.e., 10 Random ranker executions, 10 SVM-RFE ranker executions and 20 GELF executions).
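One way to model this bookkeeping is sketched below. It assumes each sub-experiment is a (ranker, fold) pair holding one ranker task and one GELF task trained on that ranker's output. The `Task` record and the `workflow_tasks` helper are hypothetical names, not part of GEAE.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    dataset: str
    kind: str    # "ranker" or "GELF"
    ranker: str  # the ranker run by this task, or feeding it (for GELF)
    fold: int

def workflow_tasks(dataset, rankers=("Random", "SVM-RFE"), n_folds=10):
    """List the tasks of one workflow, i.e. one dataset."""
    tasks = []
    # 2 rankers x 10 folds = 20 sub-experiments
    for ranker, fold in product(rankers, range(n_folds)):
        tasks.append(Task(dataset, "ranker", ranker, fold))  # gene ranking on this fold
        tasks.append(Task(dataset, "GELF", ranker, fold))    # learn and evaluate GELF
    return tasks
```

With the defaults, `workflow_tasks("some_dataset")` returns 40 tasks: 10 Random ranker tasks, 10 SVM-RFE ranker tasks and 20 GELF tasks, which matches the counts above.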
Since the workflows were executed over 20 different datasets, this gives 800 task executions. To generate a wide spectrum of performance-data examples, each workflow was executed 10 times on resources of different types.
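The task counts compose multiplicatively. Assuming each of the 10 executions of a workflow re-runs all 40 of its tasks, the totals work out as follows:

```latex
\begin{aligned}
\text{sub-experiments per workflow} &= 2 \text{ rankers} \times 10 \text{ folds} = 20,\\
\text{tasks per workflow} &= \underbrace{10}_{\text{Random}} + \underbrace{10}_{\text{SVM-RFE}} + \underbrace{20}_{\text{GELF}} = 40,\\
\text{tasks over all datasets} &= 40 \times 20 = 800,\\
\text{task executions over 10 runs} &= 800 \times 10 = 8000.
\end{aligned}
```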
Table 1 describes the characteristics of the resources used for executing the GEAE workflows. JavaMFlops, KFlops and MIPS are performance values provided by the SciMark2¹, Linpack² and Dhrystone [16] benchmarks, respectively.