Other Supervised Methods

In parallel to the previously mentioned standard approaches, other supervised algorithms originating mainly from the machine learning framework were implemented in metabolomics, such as artificial neural networks (ANNs) and support vector machines (SVMs).

ANNs are function approximation tools that cover a broad variety of modeling strategies, from the single-layer linear perceptron to complex multilayer networks allowing nonlinear data modeling. They rely on levels of neurons, that is, layers of interconnected units, to form a network with a hierarchical structure.97 The learning principle is the decomposition of a complex problem into subproblems that can be solved at the unit level. One of their key advantages is their capacity to easily model complex nonlinear systems, but despite high prediction ability, the interpretability of ANNs decreases as their complexity grows, as in the case of a multilayer nonlinear neural network.98

SVMs originate from the statistical learning theory framework and were originally developed by Vapnik.99 They build linear models in a feature space by making use of a kernel function to implement nonlinear class boundaries with a great ability for generalization.100,101 SVMs are generally robust to outliers and can reliably handle noisy data. A small number of critical observations, called support vectors, is selected to compute a hyperplane as a linear combination of these objects, fitting the data and maximizing a geometrical margin value that separates classes of observations. When no linear solution can be found, a soft margin can be used. SVMs were applied with success in metabolomics.102,103

The data are projected from the input space into a higher dimensional feature space by a mapping function. This transformation must be carefully achieved to offer a reliable linear model in the feature space that corresponds to a nonlinear solution in the original data space. Kernel functions are applied to map the data in the feature space, and the kernel matrix summarizes similarity measurements between pairs of observations. Because the model construction is performed on the kernel matrix instead of the initial data table, it takes advantage of the kernel trick to reduce the computational complexity. The most typical kernels include polynomial and radial basis functions.92 Kernel extensions of well-established bilinear factor models were implemented, including kernel PCA,93 kernel PLS,94,95 and kernel O-PLS.96

Other supervised methods such as instance-based learning algorithms remain poorly implemented in metabolomics, probably due to the difficulty of interpreting classification models in terms of variable contributions. Instead of inferring general interpretable prediction rules, the classifier is derived from the data set without explicitly highlighting the discriminant patterns. The k-Nearest Neighbor algorithm is probably the most popular form of instance-based methods.104

Probabilistic algorithms rely on an explicit probability model, and Bayesian networks constitute well-known methods of statistical learning. They remain scarcely applied to exploratory analysis, classification, and biomarker discovery,105 and are used more often for the inference of metabolic pathways.106 Finally, only a few attempts were made to compare in parallel the prediction ability of classifiers relying on distinct learning principles on a given data set.102,107
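The kernel trick described above can be sketched in a few lines of plain Python: for a degree-2 polynomial kernel, evaluating k(x, z) = (x·z + 1)² directly in the input space gives exactly the inner product of an explicit 6-dimensional feature map, so a kernel matrix can summarize pairwise similarities without ever constructing the feature space. The data values and the feature map below are invented for illustration and are not taken from the cited studies.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x.z + 1)^2,
    evaluated entirely in the original input space."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def phi(x):
    """Explicit degree-2 feature map for a 2-D input:
    phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two toy observations (e.g., metabolite intensities); values are invented.
x = (0.5, -1.0)
z = (2.0, 0.25)

# Kernel trick: the kernel computed in the 2-D input space equals the
# inner product in the 6-D feature space, without ever building phi(x).
assert abs(poly_kernel(x, z) - dot(phi(x), phi(z))) < 1e-12

# A kernel matrix of pairwise similarities, the object on which SVMs,
# kernel PCA, kernel PLS, and kernel O-PLS operate.
X = [x, z, (0.0, 1.5)]
K = [[poly_kernel(a, b) for b in X] for a in X]
print(K[0][1])  # prints 3.0625
```

The same scheme applies to the radial basis function kernel, whose implicit feature space is infinite-dimensional, which is precisely why working with the kernel matrix instead of explicit features is computationally attractive.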
Conventional Statistical Analysis and ROC Curves

Multivariate data analysis constitutes a crucial step to provide a global picture of complex networks of interrelated metabolites. However, even if biological phenomena are intrinsically multivariate, the evaluation of the individual merit of a single biomarker may be desirable. Classical univariate approaches include the Student's t-test and the one-way analysis of variance (ANOVA) or their nonparametric
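As a minimal sketch of the univariate comparison just mentioned, the two-sample Student's t statistic for one candidate biomarker can be computed directly; the pooled, equal-variance form is shown, the intensity values are invented, and a complete analysis would also convert t into a p-value.

```python
import math

def students_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance
    (equal-variance assumption), scoring a single candidate
    biomarker between two groups of observations."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    # Unbiased sample variances of each group.
    va = sum((v - ma) ** 2 for v in group_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in group_b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))

# Invented intensities of one metabolite in controls vs. cases.
controls = [10.1, 9.8, 10.4, 10.0, 9.7]
cases = [11.2, 11.0, 10.8, 11.5, 11.1]
t = students_t(controls, cases)
print(round(t, 2))  # prints -6.65
```

The large magnitude of t reflects the clear group separation in this toy example; with noisier data, the statistic shrinks toward zero, which is what the significance test quantifies.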