k-Nearest Neighbors (k-NN). In this strategy, training examples are stored verbatim. A distance function is used to determine the $k$ examples of the training set that are closest to an unknown test example [17]. The output of the method is the average of the target values (i.e., running time) corresponding to those $k$ nearest examples. In this study we use the Euclidean distance. Given two examples $x^{(1)}$ and $x^{(2)}$ with components $x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_n$ and $x^{(2)}_1, x^{(2)}_2, \ldots, x^{(2)}_n$ respectively, the Euclidean distance between them is
\[
d(x^{(1)}, x^{(2)}) = \sqrt{\sum_{i=1}^{n} \left(x^{(1)}_i - x^{(2)}_i\right)^2}.
\]
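The strategy above can be sketched in a few lines of NumPy; the function name and the toy data are ours, not from the study:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict a numeric target (e.g. running time) as the average of the
    targets of the k training examples closest to x_test, using the
    Euclidean distance defined above."""
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]          # indices of the k closest examples
    return y_train[nearest].mean()           # average of their target values

# Toy example: the two points near the origin are the 2 nearest neighbors.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
y = np.array([1.0, 3.0, 100.0])
pred = knn_predict(X, y, np.array([0.5, 0.0]), k=2)   # average of 1.0 and 3.0
```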
Support Vector Regression (SVR). SVR [17] is an adaptation of Support Vector Machines (SVM) to the prediction of numeric classes. The produced models can be expressed in terms of a few support vectors that best describe the prediction surface. The SVR model has the form
\[
f(x) = \sum_{i \in SV} \alpha_i K(x^{(i)}, x) + b,
\]
where $SV$ is the set of support vectors and $K(x^{(i)}, x)$ is a kernel function that maps an example into a feature space of higher dimension. $\alpha_i$ and $b$ are model parameters determined by solving the following optimization problem:
\[
\min_{\alpha, b, \xi_i, \xi_i^*} \; \frac{1}{2}\alpha^T \alpha + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)
\quad \text{subject to:} \quad
\begin{cases}
y_i - f(x_i) \le \epsilon + \xi_i \\
f(x_i) - y_i \le \epsilon + \xi_i^* \\
\xi_i, \xi_i^* \ge 0
\end{cases}
\]
where $C$ is the model complexity parameter that penalizes training errors, and $\xi_i$ and $\xi_i^*$ are slack variables that specify upper- and lower-bound training errors beyond an error tolerance $\epsilon$. To model non-linear functions of the running time we use a radial basis function (RBF) kernel
\[
k(x^{(i)}, x^{(j)}) = \exp\left(-\gamma \left\|x^{(i)} - x^{(j)}\right\|^2\right)
\]
with $\gamma = 0.01$. The remaining SVR parameters were set to $C = 1$ and $\epsilon = 0.001$, which are the default values in Weka.
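The prediction form $f(x) = \sum_{i \in SV} \alpha_i K(x^{(i)}, x) + b$ can be sketched directly in NumPy. The support vectors and coefficients below are hypothetical placeholders; in practice they result from solving the optimization problem above (e.g. with Weka, as in the study):

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=0.01):
    """RBF kernel k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    diff = xi - xj
    return np.exp(-gamma * (diff @ diff))

def svr_predict(x, support_vectors, alphas, b, gamma=0.01):
    """SVR prediction f(x) = sum over support vectors of alpha_i * K(x_i, x) + b."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for a, sv in zip(alphas, support_vectors)) + b

# Hypothetical fitted parameters, for illustration only.
svs = np.array([[0.0, 0.0], [1.0, 1.0]])
alphas = np.array([0.5, -0.2])
b = 1.0
value = svr_predict(np.array([0.0, 0.0]), svs, alphas, b)
```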
M5P regression trees. The M5 Prime (M5P) algorithm [17] induces decision trees whose leaves are associated with regression models; M5P trees thus combine decision trees with linear models at the leaves. The model is constructed in three phases. First, a decision tree is built that minimizes the variability of the output variable (i.e., examples with very similar running times are grouped together). Then, the tree is pruned starting from the leaf nodes; if an inner node is pruned, it is converted into a leaf node and a regression plane is associated with it. Finally, a smoothing function is applied to avoid sharp discontinuities between the hyperplanes. An example of an M5P regression tree is given in Figure 3b.
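The first phase, splitting so as to minimize the variability of the output variable, can be sketched for a single feature as follows (a simplified illustration, not the full M5P procedure):

```python
import numpy as np

def best_split(x, y):
    """Choose the threshold on feature x that minimizes the summed
    within-group variance of the target y in the two resulting groups,
    so that examples with similar running times are grouped together."""
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:              # candidate split thresholds
        left, right = y[x <= t], y[x > t]
        score = left.size * left.var() + right.size * right.var()
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Running times cluster below/above x = 2, so the split lands there.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 11.0, 50.0, 52.0])
threshold = best_split(x, y)
```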
Preprocessing. Before constructing the models, the available performance data is normalized to avoid the dominance of features with higher orders of magnitude in the construction of the models. To this end, each feature $x_i$ is transformed into a new feature $x'_i$ according to the equation
\[
x'_i = \frac{x_i - \mu}{\sigma},
\]
where $\mu$ and $\sigma$ are the mean and standard deviation of feature $x_i$ over the training data.
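This z-score normalization is a one-liner per feature in NumPy; the function name is ours:

```python
import numpy as np

def zscore(X):
    """Normalize each feature (column) of X to zero mean and unit standard
    deviation, so that features with larger magnitudes do not dominate
    model construction."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Second feature is three orders of magnitude larger; after normalization
# both features are on the same scale.
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
Xn = zscore(X)
```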