Biology Reference
In-Depth Information
the large number of endogenous metabolites,
their unknown identities, and the cost and avail-
ability of isotopically labeled standards. 48,49 A
small set of internal standards may be chosen,
and these should preferably differ in
chemical reactivity to better correct for
variations in derivatization reactions. 47
In summary, quality control is instrumental in
metabonomics to ensure quality data and cred-
ible results.
and can be caused by (1) a peak missed
during peak identification (peak present in
sample), (2) peak intensity below the threshold
(peak present in sample), and (3) genuine absence
of the peak in the sample. 55 Because many multivariate
analyses require a complete data set for effective
analysis, the problem of missing values must be
suitably managed. Several strategies to estimate
missing values have been proposed. The simplest
method involves replacing the missing value with
the mean or median of the metabolite level across
the remaining samples. 47,56 Another approach is
to replace the missing value by the mean or
median of the k nearest neighbors (other samples
in the same group for grouped data). 56 If
missing values are extensive (e.g., in
more than 80% of samples), a potential approach
is to
Retention Index Markers
In GC/MS analysis, retention time fluctuates
due to trimming of capillary columns during
routine maintenance, installation of a new column,
or analytical drift, especially during large-scale
analysis. Retention indices (RI) are the retention
times of analytes relative to the adjacently eluting
n-alkanes analyzed under the same chromatographic
conditions. 53 Compared to retention time, which is
easily affected by factors such as column length,
acquisition delay, and temperature gradient, RI are
relatively stable and widely used in GC/MS analysis.
To determine RI, n-alkanes can be spiked into
each sample 19 to compensate for retention time
drift over time or, alternatively, the alkane mixture
can be analyzed separately. 20 Each alkane peak
is given an RI value that is 100 times its carbon
number, and the RI for each metabolic feature is
interpolated from the RI values of its bracketing
alkanes. Apart from alkane standards, fatty
acid methyl esters (FAME) have also been used
as RI markers. FiehnLib GC/MS libraries are
developed based on the FAME retention index
system. 54 When using databases or libraries
for RI matching, scientists are reminded to check
the column chemistry, as RI depends on the
type of stationary phase.
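The interpolation between bracketing alkanes described above can be sketched in a few lines of Python. This is a minimal sketch, assuming linear interpolation between the two alkanes that bracket the analyte; the function name `retention_index` and the example retention times are hypothetical and are not taken from the cited work.

```python
from bisect import bisect_right


def retention_index(rt, alkane_rts):
    """Estimate the RI of a feature eluting at retention time `rt`.

    `alkane_rts` maps n-alkane carbon number -> retention time, measured
    under the same chromatographic conditions as the feature.
    Assumption: linear interpolation between the bracketing alkanes.
    """
    if len(alkane_rts) < 2:
        raise ValueError("need at least two alkanes to bracket a feature")
    # Each alkane is assigned RI = 100 * carbon number.
    ladder = sorted((t, 100 * carbons) for carbons, t in alkane_rts.items())
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time is not bracketed by the alkanes")
    # Index of the first alkane eluting after `rt`.
    i = bisect_right(times, rt)
    if i == len(times):  # `rt` coincides with the last alkane
        i -= 1
    (t0, ri0), (t1, ri1) = ladder[i - 1], ladder[i]
    return ri0 + (ri1 - ri0) * (rt - t0) / (t1 - t0)


# Hypothetical ladder: C10 at 5.0 min, C11 at 6.0 min, C12 at 7.0 min.
alkanes = {10: 5.0, 11: 6.0, 12: 7.0}
print(retention_index(5.5, alkanes))
```

A feature eluting halfway between C10 and C11 falls halfway between their RI values of 1000 and 1100. The same function could be used with a FAME ladder if the marker RI values were supplied accordingly, which this sketch does not attempt.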
fill the missing values with half of the
detection limit or a value lower than the lowest
value of that peak across all the groups. 57 If the
values are missing in the entire group, replacing
those values with a value lower than the lowest
value in the data matrix is a viable option. In
some studies, 47,55 peaks with more than 20%
missing values were removed from further analysis,
but at the cost of losing valuable information
from the remaining entries related to that
metabolic feature. 56 In our group, we adopted another
approach using the calibration feature in Chro-
maTOF software in which each peak in the data
set is manually checked for proper peak alignment
and integration. 20 In this process, missing
values caused by peak-picking errors and
retention time drift can be promptly corrected.
Although tedious and time-consuming,
this method provides excellent assurance
of the quality of the data for subsequent analysis.
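The simpler replacement strategies mentioned above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: missing entries are represented as `None`, the function names are hypothetical, the "value lower than the lowest value" is taken as half the observed minimum (the factor 0.5 is an assumption), and the nearest-neighbor distance is a root-mean-square difference over the metabolites both samples share. It does not reproduce the manual ChromaTOF checking procedure.

```python
from statistics import mean, median


def missing_fraction(values):
    """Fraction of samples in which a peak is missing (None)."""
    return sum(v is None for v in values) / len(values)


def impute_center(values, how="median"):
    """Replace missing entries with the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to estimate from")
    fill = median(observed) if how == "median" else mean(observed)
    return [fill if v is None else v for v in values]


def impute_below_minimum(values, factor=0.5):
    """Replace missing entries with a value below the lowest observed one.

    Assumption: `factor` times the observed minimum stands in for
    "half of the detection limit".
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to estimate from")
    fill = factor * min(observed)
    return [fill if v is None else v for v in values]


def impute_knn(matrix, k=2):
    """Fill each gap with the mean of that metabolite in the k nearest samples.

    `matrix` rows are samples (ideally from one group), columns are metabolites.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return (sum((x - y) ** 2 for x, y in shared) / len(shared)) ** 0.5

    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples where this metabolite was measured.
            donors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = mean(v for _, v in donors)
    return filled


peak = [4.0, None, 2.0, 6.0]
print(missing_fraction(peak))
print(impute_center(peak, how="mean"))
print(impute_below_minimum(peak))
```

A filter such as `missing_fraction(peak) > 0.2` could be used to flag peaks for removal, with the caveat noted above that removal discards the remaining entries for that metabolic feature.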
Normalization is a crucial step in the prepa-
ration of data for analysis to remove systematic
variation in the data due to analytical variation,
source contamination, column degradation, or
metabolite dilution in urine due to varying
water intake by subjects. Hence, normaliza-
tion enables true biological variation to be
observed. 52 Normalization can be performed
Managing Missing Values and
Normalization
Missing values in the data matrix are common
phenomena
in metabonomic measurements