handled cautiously in the context of metabolo-
mics. As conservation relationships occur in
networks of interrelated compounds, 66 redun-
dant signals may be informative to evaluate
metabolic pathways comprehensively.
a large series of columns, which correspond to
the variables. Due to the high dimensionality of
metabolomic data, standard statistical analysis
tools are not appropriate or reliable, and robust
approaches are mandatory. 71,72 The choice of
the data analysis strategy has a strong influence
on the outputs obtained. Although a holistic
picture of all biochemical protagonists is suited
to study a biochemical pathway, a restricted
number of highly reliable biomarkers may be
desired for clinical diagnosis. In numerous
metabolomic studies, the worth of a multivariate
model resides more in its absolute biological
relevance than in its relative predictive perfor-
mance. The examination of the model is expected
to provide explicit knowledge about the phenomenon
under study. Highlighting common trends or specific
patterns, as well as relevant biomarkers, constitutes
common objectives in many applications. A workflow
incorporating variable selection, data modeling, and
model validation is proposed in Figure 4.
Integration of Biological Knowledge
An increasing trend in metabolomic studies
involves the integration of biological knowledge
to obtain more reliable results. Metabolic pathways
and class of compounds extracted from databases
constitute information that can be used
a priori to assess altered patterns with respect to
groups of related metabolites. A simple manner
to perform such a metabolite set enrichment
analysis is to search for metabolites belonging to
a given pathway among variables that are
differentially expressed between case and
control conditions. 68 However, such an approach
requires the proper identification of all or at least
a sufficient number of associated metabolites.
This constitutes a major bottleneck for the study
of metabolomic data compared to genomic enrich-
ment analyses. Alternatively, an untargeted selec-
tion of molecules or a class of compounds from the
raw data can be applied based on a priori informa-
tion with the comparison of a list of reference m/z
related to molecules of interest with accurate MS
data. Models can then be built based on a reduced
set of variables in relation with particular proper-
ties, such as m/z or structural characteristics,
instead of the whole data set. 69 Recent advances
in data mining strategies include the incorporation
of background knowledge, such as dependencies
between variables, to provide global tests able to
assess subsets of metabolites. 70
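The two a priori strategies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the cited studies: the function names, the 5 ppm mass tolerance, and the m/z values are hypothetical, and the hypergeometric over-representation test is one common choice for scoring a metabolite set rather than a method named in the text.

```python
from math import comb


def match_mz(measured_mz, reference_mz, tol_ppm=5.0):
    """Return indices of measured m/z values lying within tol_ppm
    (parts per million, an assumed tolerance) of any reference m/z."""
    hits = []
    for i, mz in enumerate(measured_mz):
        if any(abs(mz - ref) / ref * 1e6 <= tol_ppm for ref in reference_mz):
            hits.append(i)
    return hits


def enrichment_p_value(n_total, n_pathway, n_selected, n_overlap):
    """Hypergeometric upper tail: probability of observing at least
    n_overlap pathway metabolites among n_selected differential
    variables drawn from n_total identified metabolites."""
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_selected)


# Illustrative values only: three measured features, two reference masses.
measured = [180.0634, 181.0700, 132.0768]
reference = [180.06339, 132.07675]
selected = match_mz(measured, reference)  # features kept for modeling

# Hypothetical counts: 100 identified metabolites, 10 in the pathway,
# 20 differential between case and control, 6 of them in the pathway.
p = enrichment_p_value(100, 10, 20, 6)
```

The enrichment test only makes sense when enough pathway members have been identified, which is the bottleneck noted above. The m/z filter avoids that requirement but depends on the mass accuracy of the instrument.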
Exploratory Analysis with Unsupervised
Methods
A typical data mining workflow begins with
unsupervised analysis to provide a first overview
of the data set. The data set is considered as
a collection of similar objects, without prior infor-
mation about sample groupings or measured
outcome. It is useful to assess the samples'
distribution and detect potential outliers.
Unsupervised statistical tools aim at building
models summarizing the data table in an intelligible
way and hopefully finding natural
partitions of the data set to facilitate the under-
standing of the relationship between the
samples. The models can also provide informa-
tion about the variables that are responsible for
these relationships. Visualization tools are
mandatory to assess the interpretability and the
usefulness of the model with respect to the data
at hand. Some of the most common methods
are described in the following subsections.
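As a minimal sketch of such a first overview, the snippet below screens a small data table (rows are samples, columns are variables) for potential outliers without using any group labels. The approach is an assumption chosen for brevity and is not one of the methods named in the text: variables are autoscaled, each sample's Euclidean distance to the centroid is computed, and samples are flagged with a robust median/MAD rule. The function names, the threshold of 3, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def autoscale(table):
    """Center each column to zero mean and scale it to unit standard deviation."""
    columns = list(zip(*table))
    means = [mean(col) for col in columns]
    sds = [stdev(col) or 1.0 for col in columns]  # constant columns map to zero
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in table]


def flag_outliers(table, threshold=3.0):
    """Return indices of samples whose distance to the centroid is unusually
    large, judged by a robust score (distance - median) / (1.4826 * MAD)."""
    scaled = autoscale(table)
    # After autoscaling, the centroid of the samples is the origin.
    dists = [sqrt(sum(x * x for x in row)) for row in scaled]
    med = median(dists)
    mad = median([abs(d - med) for d in dists])
    if mad == 0:
        return []  # no spread in distances, nothing to flag
    return [i for i, d in enumerate(dists)
            if (d - med) / (1.4826 * mad) > threshold]


# Illustrative table: five similar samples and one deviating sample.
table = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [0.9, 1.9, 3.1],
    [1.0, 2.2, 3.0],
    [1.2, 2.0, 2.8],
    [5.0, 9.0, 0.5],
]
outliers = flag_outliers(table)
```

A flagged sample is only a candidate for inspection. Whether it reflects an analytical problem or genuine biological variation has to be judged from the data and the experimental context.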
DATA MODELING
The standard output of processed raw data
from untargeted metabolomic experiments is
an aligned data table in which each row corre-
sponds to an observation that is described by