exists and 0 when they are absent. For example, 180 columns account for the
amino acid contributions (20 aa × 9 positions) while 3200 columns account for the
adjacent side chains, or 1-2 interactions (20 × 20 × 8). As these two models were
roughly equivalent in terms of statistical quality, we applied the principle of
Occam's razor and selected the simpler of the two, the amino-acids-only model, for
discussion in this study.
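The indicator encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the column ordering (position-major, residues in alphabetical one-letter order), the function names, and the example peptide are assumptions made for the sketch.

```python
# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LENGTH = 9


def encode_peptide(peptide):
    """Return 180 binary indicators, one per (position, amino acid) pair."""
    if len(peptide) != PEPTIDE_LENGTH:
        raise ValueError("expected a 9-residue peptide")
    n = len(AMINO_ACIDS)
    row = [0] * (PEPTIDE_LENGTH * n)
    for position, residue in enumerate(peptide):
        # 1 when this residue is present at this position, 0 otherwise.
        row[position * n + AMINO_ACIDS.index(residue)] = 1
    return row


def encode_adjacent_pairs(peptide):
    """Return 3200 binary indicators for the 8 adjacent (1-2) residue pairs."""
    if len(peptide) != PEPTIDE_LENGTH:
        raise ValueError("expected a 9-residue peptide")
    n = len(AMINO_ACIDS)
    row = [0] * (n * n * (PEPTIDE_LENGTH - 1))
    for position in range(PEPTIDE_LENGTH - 1):
        first = AMINO_ACIDS.index(peptide[position])
        second = AMINO_ACIDS.index(peptide[position + 1])
        row[position * n * n + first * n + second] = 1
    return row


# Illustrative 9-mer only; it is not taken from the study's dataset.
example = "GILGFVFTL"
single_terms = encode_peptide(example)
pair_terms = encode_adjacent_pairs(example)
```

Each peptide sets exactly 9 of the 180 single-residue columns and 8 of the 3200 adjacent-pair columns, which is why the matrix is very sparse.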
The matrix was assessed using PLS (Sette et al. 1994a), an extension of Multiple
Linear Regression (MLR). The method works by producing an equation or QSAR,
which relates one or more dependent variables to the values of descriptors and uses
them as predictors of the dependent variables (or biological activity) (Wold 1995).
The IC50 values (the dependent variable y) were represented as negative logarithms
(pIC50). The predictive ability of the model was validated using the “Leave-One-Out”
Cross-Validation (LOO-CV) method.
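As a small worked example of the transformation of the dependent variable, the sketch below converts an IC50 value to pIC50. The text does not state the concentration units here, so the sketch assumes molar units; the function name is hypothetical.

```python
import math


def to_pic50(ic50_molar):
    """Negative base-10 logarithm of an IC50 value.

    Assumption: the IC50 is expressed in mol/L; other units shift
    the result by a constant.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


# Under the molar-units assumption, a stronger binder (lower IC50)
# receives a higher pIC50.
strong = to_pic50(1e-9)
weak = to_pic50(1e-6)
```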
4.2.3 Cross-Validation Using the “Leave-One-Out” (LOO-CV) Method
Cross-Validation (CV) is a reliable technique for testing the predictivity of models.
With QSAR analysis in general and PLS methods in particular, CV is a standard
approach to validation. CV works by dividing the dataset into a set of groups,
developing several parallel models from the reduced data with one or more of the groups
excluded, and then predicting the activities of the excluded peptides. When the number
of groups equals the number of peptides in the set, so that each reduced model omits
exactly one peptide, the technique is called
Leave-One-Out Cross-Validation (LOO-CV). The predictive power of the model is
assessed using the following parameters: the cross-validated coefficient (q²) and the
Standard Error of Prediction (SEP), which are defined in Eqs. (1) and (2).
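$$q^2 = 1 - \frac{\sum_{i=1}^{n} \left(\mathrm{pIC}_{50(\mathrm{exp})} - \mathrm{pIC}_{50(\mathrm{pred})}\right)^2}{\sum_{i=1}^{n} \left(\mathrm{pIC}_{50(\mathrm{exp})} - \mathrm{pIC}_{50(\mathrm{mean})}\right)^2} \quad \text{or simply} \quad q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SSQ}} \qquad (1)$$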
where pIC50(pred) is a predicted value, pIC50(exp) is an actual or experimental value,
and the summations are over the same set of pIC50 values. PRESS is the PRedictive
Error Sum of Squares and SSQ is the Sum of Squares of pIC50(exp) corrected for the
mean.
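The leave-one-out procedure and the q² statistic can be sketched as below. To keep the example self-contained, a one-descriptor least-squares line stands in for the PLS model; it is not the method used in the study, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Stand-in model (NOT PLS): least-squares line on a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def loo_predictions(xs, ys, fit=fit_line):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(model(xs[i]))
    return predictions


def q_squared(experimental, predicted):
    """q2 = 1 - PRESS / SSQ, with both sums taken over the same samples."""
    mean_exp = sum(experimental) / len(experimental)
    press = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ssq = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - press / ssq
```

A call such as `q_squared(ys, loo_predictions(xs, ys))` returns a value close to 1 when the held-out predictions track the experimental values, and 0 when they are no better than predicting the mean.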
$$\mathrm{SEP} = \sqrt{\frac{\mathrm{PRESS}}{p - 1}} \qquad (2)$$
where p is the number of peptides omitted from the dataset. The optimal number
of components (NC) resulting from the LOO-CV is then used in the
non-cross-validated model, which was assessed using standard MLR validation terms:
the explained variance r² and the Standard Error of Estimate (SEE), defined in Eqs. (3) and (4).
 