that predicts Y values (laboratory data) based on X
(spectral) data. The PLS equation, or calibration, is
based on decomposing both the X and Y data into
a set of scores and loadings, similar to PCA.
However, the scores for both the X and Y data
are not selected based on the direction of
maximum variation but are selected in order to
maximize the correlation between the scores for
both the X and Y variables. As with PCA, in the
PLS regression development the number of
components or factors is an important practical
consideration. A short description of the PLS
algorithm follows; a more detailed discussion
can be found elsewhere. 8,9
Commercial software can be used to construct and
optimize both PCA and PLS calibration models.
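
To make the idea of a PLS calibration concrete, the following is a minimal NumPy sketch of a single-response PLS model (often called PLS1) fitted with a NIPALS-style loop. It is an illustration under stated assumptions, not the algorithm of any particular commercial package or of the references cited above: the function names pls1_fit and pls1_predict are hypothetical, y is assumed to be a single laboratory value per sample, and each weight vector is taken proportional to X'y, the direction whose scores have the largest covariance with y.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (hypothetical helper, NIPALS-style).

    Returns the regression vector B and the means used for centering.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # mean-center X and y

    W, P, q = [], [], []                       # weights, X loadings, y loadings
    for _ in range(n_components):
        w = Xk.T @ yk                          # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # nothing left in y to explain
            break
        w = w / norm
        t = Xk @ w                             # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt                      # X loading for this component
        qa = yk @ t / tt                       # y loading for this component
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - qa * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)

    if not W:                                  # y was constant after centering
        return np.zeros(X.shape[1]), x_mean, y_mean

    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)        # regression vector in X space
    return B, x_mean, y_mean

def pls1_predict(X, B, x_mean, y_mean):
    """Predict y for new spectra using the centered model."""
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
```

A typical call would be B, x_mean, y_mean = pls1_fit(X_cal, y_cal, n_components=3), followed by pls1_predict(X_new, B, x_mean, y_mean). The number of components is left as an explicit argument because, as noted above, it is an important practical choice; it is commonly selected by cross-validation.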
PLS decomposition of both X and Y data into
scores and loadings is given in the following
equation:

$$
\begin{aligned}
X &= T P^{T} + E \\
Y &= U Q^{T} + f
\end{aligned}
\tag{11}
$$

The score matrices for X and Y, that is, T and
U, are calculated together. This self-consistent
approach allows for a set of scores and loadings
that represent the variation in the Y data set.
Therefore, the scores and loadings are better
suited than PCA scores and loadings for
quantitative prediction. The algorithm proceeds by
mean centering the data and then finding the
first loading spectrum and first component
scores. The prediction of a PLS method is
summarized in the regression vector or
coefficient, B. The predictions y are related to
the sample data x by

$$
y = B \cdot x \tag{12}
$$

The classification of sample groups is an
important issue in MVA. PLS can also be used
to separate and classify two very similar groups;
this variation of PLS is called PLS discriminant
analysis (PLS-DA). 4 Commonly used classification
methods based on correlation coefficients
are suitable for relatively dissimilar groups.
PCA-based methods are more sensitive and are
better for similar groups. PLS-DA is more
sensitive than PCA-based methods because it will
naturally select scores and loadings that are
important for the classification of interest.
PLS-DA is done by using a dummy y variable:
0 for the first class and 1 for the other. PLS-DA
is a natural method for distinguishing between
healthy and diseased samples. The second case
study in this chapter uses PLS-DA to examine
the NMR spectrum of blood from coronary
patients.

STUDY 1: CANCER DETECTION BY
PROTEOMICS

The detection of cancer early in the disease
course is considered one of the primary factors
determining outcome and is of great clinical
importance. This argument is borne out in
particularly striking terms in the statistics associated
with malignant melanoma prognosis 10 :

• 90% 10-year survival when diagnosed at stage
  I/II
• 20% to 40% 5-year survival when diagnosed
  at stage III
• <10% 5-year survival when diagnosed at
  stage IV

These data should make clear the extreme
necessity of identifying the disease as early as
possible. Although the malignant form of the
disease is fortunately quite rare, nonmelanoma
skin cancers are the most common form of
neoplasm known in man. This considerably
confounds the diagnosis, as distinguishing the
two early in the disease progression is very
difficult from a visual examination. Thus, from the
standpoint of biomarkers, the question is as
follows: how can one identify biomarkers of
disease presence in patients with malignant
melanoma that are both sensitive (can detect the disease
early in disease progression when the tumor