Environmental Engineering Reference
variables are computed in such a way that the covariance between the two blocks X
and Y is modeled as well. Mathematically, this is achieved by selecting a set of linear
combinations of X variables, $w_i$, $i = 1, \ldots, A$, that maximizes the covariance between
the matrix of descriptors X and the matrix of responses Y, under the following
constraints:

\[
\begin{aligned}
\max_{w_i}\; & w_i^{T} X^{T} Y Y^{T} X w_i \\
\text{subject to}\; & w_i^{T} w_i = 1 \\
\text{subject to}\; & w_i^{T} w_j = 0 \quad \text{for } i \neq j.
\end{aligned}
\tag{3.2}
\]
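As a rough numerical illustration of the first component of this optimization problem (i = 1 only, before any orthogonality constraint comes into play), the sketch below assumes randomly generated, mean-centered data and uses NumPy. It relies on the fact that the unit-norm vector maximizing the objective is the dominant left singular vector of the cross-product matrix of X and Y. The data, the helper name `objective`, and the comparison against random unit vectors are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Hypothetical data: 50 observations, 6 descriptors, 2 responses.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
Y = rng.standard_normal((50, 2))
X -= X.mean(axis=0)   # mean-center both blocks
Y -= Y.mean(axis=0)

def objective(w):
    """Evaluate w^T X^T Y Y^T X w, i.e. the squared norm of Y^T X w."""
    v = Y.T @ (X @ w)
    return float(v @ v)

# The unit-norm maximizer is the dominant left singular vector of X^T Y,
# which is also the leading eigenvector of X^T Y Y^T X.
U_svd, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
w1 = U_svd[:, 0]
best = objective(w1)

# Sanity check against many random unit-norm candidates.
trials = rng.standard_normal((1000, X.shape[1]))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
random_best = max(objective(w) for w in trials)

print(best >= random_best)
```

The maximum attained equals the square of the largest singular value of the cross-product matrix, so `best` and `s[0] ** 2` should agree to rounding error.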
The structure of the PLS model is shown below:

\[
\begin{aligned}
X &= T P^{T} + E \\
Y &= T Q^{T} + F \\
T &= X W \left(P^{T} W\right)^{-1},
\end{aligned}
\tag{3.3}
\]
where the A columns of P and Q define the linear combinations of the X and Y
variables modeling their covariance structure. E and F are the model residuals. It
has been shown [19] that the vectors w, q, t and u are eigenvectors of $X^{T} Y Y^{T} X$,
$Y^{T} X X^{T} Y$, $X X^{T} Y Y^{T}$ and $Y Y^{T} X X^{T}$, respectively.
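The eigenvector property can be checked numerically for the first component. The following is a minimal sketch assuming random mean-centered data: w is taken as the leading eigenvector of the first matrix, and t, q and u are then formed as Xw, the product of the transposed Y with t, and Yq. These vectors are left unscaled here, which is harmless because eigenvectors are defined only up to a scale factor. The helper `is_eigvec` and the data are hypothetical.

```python
import numpy as np

# Hypothetical data: 40 observations, 5 descriptors, 3 responses.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
Y = rng.standard_normal((40, 3))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# w: leading eigenvector of the symmetric matrix X^T Y Y^T X.
# eigh returns eigenvalues in ascending order, so the last one is the largest.
evals, evecs = np.linalg.eigh(X.T @ Y @ Y.T @ X)
lam, w = evals[-1], evecs[:, -1]

t = X @ w      # X scores
q = Y.T @ t    # Y loadings (unscaled)
u = Y @ q      # Y scores (unscaled)

def is_eigvec(M, v, lam, rtol=1e-8):
    """True if M v = lam v holds up to a relative tolerance."""
    return np.linalg.norm(M @ v - lam * v) <= rtol * abs(lam) * np.linalg.norm(v)

checks = {
    "w": is_eigvec(X.T @ Y @ Y.T @ X, w, lam),
    "q": is_eigvec(Y.T @ X @ X.T @ Y, q, lam),
    "t": is_eigvec(X @ X.T @ Y @ Y.T, t, lam),
    "u": is_eigvec(Y @ Y.T @ X @ X.T, u, lam),
}
print(checks)
```

All four vectors share the same eigenvalue `lam`, which follows from multiplying the eigen-equation for w on the left by X or by the transposed cross-product of X and Y.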
Instead of computing all the latent variables at once, a version of the NIPALS
algorithm was adapted for sequentially computing the PLS latent variables, one at a
time. This algorithm is outlined below, with the starting point being mean-centering
and scaling both X and Y matrices:
1. set $u$ to be one column of Y;
2. $w = X^{T} u / (u^{T} u)$;
3. $w = w / (w^{T} w)^{1/2}$, which normalizes w to unit length;
4. $t = X w / (w^{T} w)$;
5. $q = Y^{T} t / (t^{T} t)$;
6. $u = Y q / (q^{T} q)$;
7. continue iterating between 2. and 6. until convergence on t or u;
8. residual matrices: $E = X - t p^{T}$, $F = Y - t q^{T}$ (with the X loading vector $p = X^{T} t / (t^{T} t)$);
9. store w, p, t and u in W, P, T and U, respectively;
10. calculate next dimensions by returning to 1, using E and F as the new X and Y.
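The steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function name `nipals_pls`, the convergence tolerance, the iteration cap, the choice of the Y column with the largest sum of squares as the starting u, the explicit loading formula for p, and the storage of q in a matrix Q are all assumptions made for the example. The synthetic data and the `autoscale` helper are likewise hypothetical.

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """Sequential NIPALS PLS on X and Y that are already centered and scaled."""
    E, F = X.astype(float).copy(), Y.astype(float).copy()
    W, P, Q, T, U = [], [], [], [], []
    for _ in range(n_components):
        # Step 1: start from one column of Y (assumed: the largest one).
        u = F[:, [np.argmax((F ** 2).sum(axis=0))]]
        for _ in range(max_iter):
            w = E.T @ u / (u.T @ u)        # step 2: X weights
            w /= np.linalg.norm(w)         # step 3: normalize to unit length
            t = E @ w / (w.T @ w)          # step 4: X scores
            q = F.T @ t / (t.T @ t)        # step 5: Y loadings
            u_new = F @ q / (q.T @ q)      # step 6: Y scores
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:                       # step 7: convergence on u
                break
        p = E.T @ t / (t.T @ t)            # X loadings (assumed formula)
        E = E - t @ p.T                    # step 8: deflate X
        F = F - t @ q.T                    #         deflate Y
        for store, v in zip((W, P, Q, T, U), (w, p, q, t, u)):
            store.append(v)                # step 9: store the vectors
        # Step 10: the next pass uses E and F as the new X and Y.
    W, P, Q, T, U = (np.hstack(m) for m in (W, P, Q, T, U))
    return W, P, Q, T, U, E, F

def autoscale(M):
    """Mean-center each column and scale it to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

# Hypothetical data: Y depends linearly on X plus a little noise.
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((60, 5))
Y_raw = X_raw @ rng.standard_normal((5, 2)) + 0.1 * rng.standard_normal((60, 2))

X, Y = autoscale(X_raw), autoscale(Y_raw)
W, P, Q, T, U, E, F = nipals_pls(X, Y, n_components=3)

# The fitted quantities satisfy the model structure of Eq. (3.3).
print(np.allclose(X, T @ P.T + E), np.allclose(Y, T @ Q.T + F))
print(np.allclose(T, X @ W @ np.linalg.inv(P.T @ W)))
```

Deflating with the score vector t makes the columns of T mutually orthogonal and the columns of W orthonormal, which is why the scores can be recovered directly from the original X through the last relation in Eq. (3.3).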
3.2.3 Statistics and Diagnostic Tools Used With Latent Variable Models
This section presents some statistics and diagnostic tools used in this chapter to
assess the performance of the latent variable models and for interpreting them. In
particular, the scaling of the data matrices, the selection of the number of
components by cross-validation, and the distance to the model diagnostic tool are discussed