counterparts, the Mann-Whitney and the Kruskal-Wallis tests, respectively.108 The predictive ability of a biomarker can be assessed by comparing the corresponding p-value of the test to a given threshold. Such procedures are easily implemented but remain of limited use with the highly multivariate, correlated metabolic profiles at hand.
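To make the procedure concrete, the following is a minimal sketch of such univariate screening in Python, using the Mann-Whitney U test for two groups and the Kruskal-Wallis H test for three; the synthetic intensity values and the 0.05 threshold are illustrative assumptions, not data from the chapter.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    cases = rng.normal(loc=1.2, scale=0.4, size=30)     # hypothetical case group
    controls = rng.normal(loc=1.0, scale=0.4, size=30)  # hypothetical control group

    # Two groups: Mann-Whitney U test
    u_stat, p_two = stats.mannwhitneyu(cases, controls, alternative="two-sided")

    # Three or more groups: Kruskal-Wallis H test
    third = rng.normal(loc=1.1, scale=0.4, size=30)
    h_stat, p_multi = stats.kruskal(cases, controls, third)

    alpha = 0.05  # significance threshold
    print(f"Mann-Whitney p = {p_two:.4f}, significant: {p_two < alpha}")
    print(f"Kruskal-Wallis p = {p_multi:.4f}, significant: {p_multi < alpha}")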
Multiple hypothesis testing suffers from false positives (type I error), and adjustments such as the Bonferroni correction have been introduced to address this issue.109 However, such an approach inflates the type II error rate (the probability of accepting the null hypothesis when the alternative is true), that is, missing the detection of relevant biomarkers. Evaluating the false discovery rate constitutes another alternative to limit the number of false positives.110 This method originates from the analysis of microarray data and associates with the significance level a threshold reflecting the proportion of false positives among all significant hypotheses (the q-value).
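Both adjustments can be sketched in a few lines of Python. Below is a minimal, illustrative implementation of the Benjamini-Hochberg procedure alongside the Bonferroni correction; the raw p-values and the function name benjamini_hochberg are hypothetical, and this is one common FDR procedure rather than the specific method of the cited references.

    import numpy as np

    def benjamini_hochberg(pvals):
        """Return BH-adjusted p-values (q-values) for an array of raw p-values."""
        p = np.asarray(pvals, dtype=float)
        m = p.size
        order = np.argsort(p)
        ranked = p[order] * m / np.arange(1, m + 1)
        # Enforce monotonicity from the largest p-value downward
        q = np.minimum.accumulate(ranked[::-1])[::-1]
        out = np.empty(m)
        out[order] = np.clip(q, 0.0, 1.0)
        return out

    pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])
    alpha = 0.05
    bonferroni = np.clip(pvals * pvals.size, 0.0, 1.0)
    qvals = benjamini_hochberg(pvals)
    print("Bonferroni significant:", bonferroni < alpha)
    print("BH (FDR) significant:  ", qvals < alpha)

Note how the Bonferroni correction, which controls the family-wise error rate, rejects fewer hypotheses than the FDR-controlling procedure on the same p-values; this is the type II inflation described above.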
In the case of discriminant analysis, the receiver operating characteristic (ROC) curve constitutes another useful graphical tool to assess the predictive ability of a biomarker. A ROC curve is a meaningful indicator of the ability of a biomarker to discriminate between two populations (e.g., case and control) when their value distributions overlap.111 Performance indices are computed for every possible threshold value and reported graphically. Each point of the ROC curve corresponds to a sensitivity/specificity couple associated with a particular decision boundary. The true positive fraction (sensitivity) is plotted on the y-axis and the false positive rate (1 - specificity) on the x-axis. The upper-left corner of the ROC graph therefore corresponds to perfect prediction ability. The shape of the curve helps assess the predictive value of a biomarker, and the area under the curve (AUC) is computed to provide a global index summarizing its discriminant ability. An AUC of 1 (100%) reflects a perfect separation of the two populations without prediction errors; a random prediction corresponds to an AUC of 0.5. Specific interest may reside in a given region of the curve, such as high specificity, rather than in the entire range. ROC curves do not account for variation amplitude, and combining them with complementary evaluation methods such as the FDR may be useful.69,112
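As a minimal sketch, the ROC curve and AUC for a single biomarker can be computed as follows with scikit-learn's roc_curve and roc_auc_score; the synthetic intensities, labels (1 = case, 0 = control), and the 0.9 specificity cutoff are illustrative assumptions.

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    rng = np.random.default_rng(1)
    labels = np.array([1] * 40 + [0] * 40)
    scores = np.concatenate([rng.normal(1.3, 0.5, 40),   # cases
                             rng.normal(1.0, 0.5, 40)])  # controls

    # fpr = 1 - specificity (x-axis), tpr = sensitivity (y-axis)
    fpr, tpr, thresholds = roc_curve(labels, scores)
    auc = roc_auc_score(labels, scores)
    print(f"AUC = {auc:.3f}  (1.0 = perfect separation, 0.5 = random)")

    # Restrict attention to a high-specificity region, e.g., specificity >= 0.9
    mask = fpr <= 0.1
    print("Best sensitivity at specificity >= 0.9:", tpr[mask].max())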
MODEL VALIDATION

Statistical Validation
The predictive or clustering ability of a model is not the only indicator of validity, and its generalization ability is a crucial aspect to consider. Model validation aims to confirm the validity of the model and its capacity to handle new observations reliably. Common procedures include cross-validation, permutation tests, and the bootstrap, but proper model validation is still a subject of research and debate.113 For a reliable estimation of the generalization ability, validation should involve the prediction of completely new, independently generated observations addressing the same biological question. However, such a secondary independent data set is rarely available in practice. Therefore, cross-validation, a method of evaluating models by dividing the data into a training set and a test set, is often applied. The former is used to build the model during a training phase, and the latter is predicted as new data to assess model performance. Iterative procedures such as leave-one-out or k-fold cross-validation are usually implemented to assess model validity with respect to perturbations of the training set. The former leaves the data from one observation out, builds a model on the remaining data set, and predicts the outcome of this observation. The latter divides the data set into k parts and leaves each part out iteratively.114 Cross-validation depends heavily on the assignment of observations to either the training or the test set, and the evaluation of the model can therefore be greatly different from one partition to another.
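Both schemes are sketched below with scikit-learn; the logistic-regression classifier, the synthetic feature matrix, and the choice of k = 5 are illustrative assumptions rather than the chapter's specific models.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0.0, 1.0, (30, 5)),   # hypothetical control profiles
                   rng.normal(0.8, 1.0, (30, 5))])  # hypothetical case profiles
    y = np.array([0] * 30 + [1] * 30)

    model = LogisticRegression(max_iter=1000)

    # Leave-one-out: each observation is predicted once by a model
    # trained on all remaining observations.
    loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

    # k-fold: the data are split into k parts, each held out once as test set.
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    kfold_acc = cross_val_score(model, X, y, cv=kf).mean()

    print(f"LOO accuracy: {loo_acc:.3f}, 5-fold accuracy: {kfold_acc:.3f}")

Rerunning the k-fold estimate with a different random_state changes which observations share a fold and can shift the accuracy estimate, which illustrates the dependence on the partitioning noted above.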