the process. Several warping methods have been developed,30-32 but two algorithms were initially proposed: correlation optimized warping (COW)33 and dynamic time warping (DTW).34 The former maximizes the correlation between segments of the samples to be aligned and a reference chromatogram; the latter relies on dynamic programming to match peaks among multiple samples. In both cases, several parameters need to be carefully adjusted, such as the segment size in COW or the gap penalty in DTW. Such iterative alignment procedures may be prohibitively time consuming when dealing with large data sets. Other alternatives include kernel density estimation to approximate the distribution of retention times,27 component-resolving algorithms,35 progressive clustering,36
and retention time shift model fitting between each peak list and a master peak list.37 After peak alignment, gap filling is usually applied to fill in missing values when peaks could not be detected in all samples. This procedure avoids the inclusion of many zero values that would have detrimental effects on further data modeling.

Data Pretreatment: Normalization and Scaling

Normalization strategies aim to reduce the effects of undesirable sources of variability or systematic bias and to ensure the reliability of measurements and of comparisons between samples over the whole dynamic range, from highly concentrated to low-abundance metabolites. Two main categories can be defined: methods dedicated to correcting variations (1) between samples and (2) between individual metabolites.

The former intend to reduce between-sample variations that can be due to analytical noise, experimental bias, biological variability, or confounding factors (e.g., nutrition or medication). These procedures are expected to emphasize differences between experimental groups (e.g., case vs. control), as biological signals become more easily discernible. Such methods include simple strategies, such as the unit norm (scaling to the sum of the total spectrum) and median intensity normalization,38 and more sophisticated approaches, such as cubic spline normalization39 and quantile normalization.40,41

The latter account for variations related to individual metabolites, such as heteroscedasticity.42 As the biological effects related to concentration changes can greatly differ from one metabolite to another, the concentration variability of individual metabolites can fluctuate. Although fine concentration tuning may be required for a given metabolite, drastic modifications can have very little phenotypic impact for others. Moreover, for analytical reasons, low concentrations of minor metabolites are more subject to measurement errors than high-abundance molecules. Because highly abundant metabolites are not necessarily the most biologically relevant, scaling procedures are usually applied to normalize the variances of the different metabolites and make them comparable: individual metabolite concentrations are divided by a scaling factor. Unit variance (UV) and Pareto scaling are the most widely applied strategies: the standard deviation is used as the scaling factor in UV, and the square root of the standard deviation in the Pareto procedure. However, some evidence suggests that such scaling approaches may deteriorate the signal-to-noise ratio, leading to impaired data,43 and other strategies, such as the variance stabilization normalization proposed in the context of microarray data analysis, constitute valuable alternatives.44 Additionally, a mathematical transformation can be helpful to correct skewed data prior to modeling. The log function constitutes a well-known transformation applied to correct heteroscedasticity.41,45
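To make the simpler between-sample strategies concrete: both the unit norm (total-sum) and median-intensity normalization amount to dividing each sample's feature vector by a sample-specific factor. A minimal NumPy sketch, using a purely hypothetical intensity matrix:

```python
import numpy as np

# Hypothetical intensity matrix: rows = samples, columns = metabolite features.
X = np.array([[100., 50., 25.],
              [200., 80., 40.],
              [ 50., 30., 10.]])

# Unit-norm (total-sum) normalization: scale each sample to its summed intensity.
X_sum = X / X.sum(axis=1, keepdims=True)

# Median-intensity normalization: scale each sample by its median feature intensity.
X_med = X / np.median(X, axis=1, keepdims=True)
```

More elaborate methods such as quantile normalization adjust each sample's whole intensity distribution rather than applying a single divisor, so they do not fit this one-factor pattern.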
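The scaling and transformation operations just described are simple column-wise computations. A minimal sketch, assuming a hypothetical samples-by-metabolites matrix (mean-centering before scaling follows common practice, although the text does not state it explicitly):

```python
import numpy as np

# Hypothetical matrix: rows = samples, columns = metabolites.
X = np.array([[10., 200., 3.],
              [12., 260., 5.],
              [ 8., 140., 4.]])

sd = X.std(axis=0, ddof=1)         # per-metabolite standard deviation
centered = X - X.mean(axis=0)      # mean-center each metabolite

X_uv = centered / sd               # unit variance (UV) scaling: divide by SD
X_pareto = centered / np.sqrt(sd)  # Pareto scaling: divide by sqrt(SD)

X_log = np.log(X)                  # log transform to reduce heteroscedasticity
```

After UV scaling every metabolite has a standard deviation of one, whereas Pareto scaling leaves large-variance metabolites with somewhat larger residual variance, a compromise between no scaling and full autoscaling.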
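For concreteness, the dynamic-programming recursion at the heart of DTW can be sketched in a few lines. This is a generic textbook version with an absolute-difference cost, not the specific implementation of the cited work:

```python
import numpy as np

def dtw_distance(x, y):
    """Cumulative dynamic-time-warping cost between two 1-D signals
    (a textbook sketch, not a chromatography-specific implementation)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Each cell extends the cheapest of the three allowed moves:
            # insertion, deletion, or diagonal match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A retention-time-shifted peak warps onto the reference at zero cost:
# dtw_distance([0, 0, 1, 0], [0, 1, 0, 0]) == 0.0
```

The quadratic cumulative-cost matrix illustrates why such iterative alignment can become expensive for long chromatograms and large sample sets.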
Software Packages

Several commercial or free software applications implementing specific parts or the whole procedure of metabolomic data processing