Other Supervised Methods

In parallel to the previously mentioned standard approaches, other supervised algorithms originating mainly from the machine learning framework were implemented in metabolomics, such as artificial neural networks (ANNs) and support vector machines (SVMs).

ANNs are function approximation tools that cover a broad variety of modeling strategies, from the single-layer linear perceptron to complex multilayer networks allowing nonlinear data modeling. They rely on levels of neurons, that is, layers of interconnected units, to form a network with a hierarchical structure.97 The learning principle is the decomposition of a complex problem into subproblems that can be solved at the unit level. One of their key advantages is their capacity to easily model complex nonlinear systems, but despite high prediction ability, the interpretability of ANNs decreases as their complexity grows, as in the case of a multilayer nonlinear neural network.98

SVMs originate from the statistical learning theory framework and were originally developed by Vapnik.99 They build linear models in a feature space by making use of a kernel function to implement nonlinear class boundaries with a great ability for generalization.100,101 SVMs are generally robust to outliers and can reliably handle noisy data. A small number of critical observations, called support vectors, is selected to compute a hyperplane as a linear combination of these objects, fitting the data and maximizing a geometrical margin value that separates classes of observations. When no linear solution can be found, a soft margin can be used. SVMs were applied with success in metabolomics.102,103

The data are projected from the input space into a higher dimensional feature space by a mapping function. This transformation must be carefully achieved to offer a reliable linear model in the feature space that corresponds to a nonlinear solution in the original data space. Kernel functions are applied to map the data in the feature space, and the kernel matrix summarizes similarity measurements between pairs of observations. Because the model construction is performed on the kernel matrix instead of the initial data table, it takes advantage of the kernel trick to reduce the computational complexity. The most typical kernels include polynomial and radial basis functions.92 Kernel extensions of well-established bilinear factor models were implemented, including kernel PCA,93 kernel PLS,94,95 and kernel O-PLS.96

Other supervised methods such as instance-based learning algorithms remain poorly implemented in metabolomics, probably due to the difficulty of interpreting classification models in terms of variable contributions. Instead of inferring general interpretable prediction rules, the classifier is derived from the data set without explicitly highlighting the discriminant patterns. The k-Nearest Neighbor algorithm is probably the most popular form of instance-based methods.104

Probabilistic algorithms rely on an explicit probability model, and Bayesian networks constitute well-known methods of statistical learning. They remain scarcely applied to exploratory analysis, classification, and biomarker discovery,105 and are used more often for the inference of metabolic pathways.106 Finally, only a few attempts were made to compare in parallel the prediction ability of classifiers relying on distinct learning principles on a given data set.102,107
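The kernel trick described above can be sketched in a few lines of plain Python: for a degree-2 polynomial kernel, evaluating k(x, z) = (x·z + 1)² directly in the input space gives exactly the inner product of an explicit 6-dimensional feature map, so a kernel matrix can summarize pairwise similarities without ever constructing the feature space. The data values and the feature map below are invented for illustration and are not taken from the cited studies.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x.z + 1)^2,
    evaluated entirely in the original input space."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def phi(x):
    """Explicit degree-2 feature map for a 2-D input:
    phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two toy observations (e.g., metabolite intensities); values are invented.
x = (0.5, -1.0)
z = (2.0, 0.25)

# Kernel trick: the kernel computed in the 2-D input space equals the
# inner product in the 6-D feature space, without ever building phi(x).
assert abs(poly_kernel(x, z) - dot(phi(x), phi(z))) < 1e-12

# A kernel matrix of pairwise similarities, the object on which SVMs,
# kernel PCA, kernel PLS, and kernel O-PLS operate.
X = [x, z, (0.0, 1.5)]
K = [[poly_kernel(a, b) for b in X] for a in X]
print(K[0][1])  # prints 3.0625
```

The same scheme applies to the radial basis function kernel, whose implicit feature space is infinite-dimensional, which is precisely why working with the kernel matrix instead of explicit features is computationally attractive.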
Conventional Statistical Analysis and ROC Curves

Multivariate data analysis constitutes a crucial step to provide a global picture of complex networks of interrelated metabolites. However, even if biological phenomena are intrinsically multivariate, the evaluation of the individual merit of a single biomarker may be desirable. Classical univariate approaches include the Student's t-test and the one-way analysis of variance (ANOVA) or their nonparametric
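As a minimal sketch of the univariate comparison just mentioned, the two-sample Student's t statistic for one candidate biomarker can be computed directly; the pooled, equal-variance form is shown, the intensity values are invented, and a complete analysis would also convert t into a p-value.

```python
import math

def students_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance
    (equal-variance assumption), scoring a single candidate
    biomarker between two groups of observations."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    # Unbiased sample variances of each group.
    va = sum((v - ma) ** 2 for v in group_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in group_b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))

# Invented intensities of one metabolite in controls vs. cases.
controls = [10.1, 9.8, 10.4, 10.0, 9.7]
cases = [11.2, 11.0, 10.8, 11.5, 11.1]
t = students_t(controls, cases)
print(round(t, 2))  # prints -6.65
```

The large magnitude of t reflects the clear group separation in this toy example; with noisier data, the statistic shrinks toward zero, which is what the significance test quantifies.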