Environmental Engineering Reference
$SS_{tot}$ is the total sum of squares of the original data (before the model is built). As
$R^2$ approaches one, the model fits the data very well. The fit is poor
when $R^2$ approaches zero.
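The two limiting cases can be illustrated with a minimal sketch. It assumes the common definition $R^2 = 1 - SS_{res}/SS_{tot}$, with $SS_{tot}$ taken about the mean of the data. The function name `r_squared` and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of the total sum of squares explained by the predictions."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares (assumed definition)
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions: no residual, so R^2 equals one.
r2_good = r_squared(y, y)

# Predicting the mean everywhere: residual equals total, so R^2 equals zero.
r2_poor = r_squared(y, np.full_like(y, y.mean()))
```

A model that does no better than the mean of the data explains none of the total sum of squares, which is why values near zero indicate a poor fit.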
3.2.3.3 Distance to the Model Statistic
The distance to the model is an essential tool in the interpretation of latent variable
models since it allows one to verify how well each observation in the dataset projects
onto the reduced space of the model (plane or hyperplane). This is useful for identifying
outliers and observations that have a different correlation structure.
The distance of an observation (a measurement on each variable) in the X space,
$x_i$, from the PCA or PLS model is given by the squared prediction error of this
observation, defined as $SPE_i = (x_i - \hat{x}_i)(x_i - \hat{x}_i)^T$, where $\hat{x}_i$
is the projection of $x_i$ onto the reduced space of the model (plane):
$\hat{x}_i = t_i P^T$. $SPE_i$ is therefore a measure of
the perpendicular distance of the observation $x_i$ from the plane defined by the PCA
model with $A$ latent variables. Note that we have chosen the distance in X space,
but the equations are the same for the distance in Y space, just by replacing $x_i$ by $y_i$
and $\hat{x}_i = t_i P^T$ by $\hat{y}_i = t_i Q^T$. When the distance is large, this
indicates that observation $x_i$ has a different correlation structure from that normally
seen in the historical database used to build the PCA model (i.e., not well captured by
the model) and should be investigated closely.
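The squared prediction error described above can be sketched with NumPy. This is a minimal illustration, not a reference implementation: the synthetic data, the use of an SVD to obtain the loadings, and the names `spe`, `P`, and `x_out` are all assumptions made for the example. Observations are treated as row vectors, with scores computed as $t_i = x_i P$ and projections as $\hat{x}_i = t_i P^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference data: 200 observations of J = 3 variables
# lying close to a 2-D plane, plus a small amount of noise.
T_true = rng.normal(size=(200, 2))
W = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 1.0]])
X = T_true @ W + 0.05 * rng.normal(size=(200, 3))

mean = X.mean(axis=0)
A = 2  # number of latent variables retained

# PCA via SVD of the mean-centered data. Rows of Vt are the principal
# directions; P (J x A) stores the first A of them as columns.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
P = Vt[:A].T

def spe(x_rows):
    """Squared prediction error of each row of x_rows in X space."""
    xc = np.atleast_2d(x_rows) - mean
    t = xc @ P                 # scores t_i
    x_hat = t @ P.T            # projection onto the model plane
    resid = xc - x_hat         # part of x_i not captured by the model
    return np.sum(resid ** 2, axis=1)

spe_ref = spe(X)

# A new observation moved one unit off the plane, along the direction
# that the two-component model does not capture.
x_out = mean + 1.0 * Vt[A]
spe_out = spe(x_out)[0]
```

Because `x_out` sits exactly one unit away from the plane, its SPE is the squared perpendicular distance, which is far larger than any value obtained for the reference observations.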
Another very similar measure of perpendicular distance to the model is the
DMOD statistic [21]. This is just a normalized SPE. For the $i$th observation in the
X space, the distance is computed as $DMODX_i = \sqrt{SPE_i / (J - A)}$, where $J$ is the
number of variables in X and $A$ is the number of latent variables. Note that $DMODY_i$
can be computed similarly. This measure of perpendicular distance to the model is
shown here since it is used in some commercial software packages (e.g., SIMCA-P,
Umetrics Inc.).
Statistical upper limits for the SPE and DMOD statistics can also be used to
establish a threshold for discriminating normal/abnormal data. These involve
reference or theoretical (i.e., F) distributions [21, 24].
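As a rough illustration of the normalized distance and of a threshold for flagging abnormal observations, the sketch below assumes the square-root normalization $DMODX_i = \sqrt{SPE_i / (J - A)}$. The upper limit is taken as an empirical percentile of the reference data. This is a simplification chosen for the example and not the F-distribution limit of [21, 24]. The synthetic data, the 99th percentile, and the names `dmodx` and `limit` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

J, A = 4, 2  # number of variables and number of latent variables

# Hypothetical reference data lying near a 2-D plane in 4-D space.
scores = rng.normal(size=(500, A))
W = rng.normal(size=(A, J))
X_ref = scores @ W + 0.1 * rng.normal(size=(500, J))

mean = X_ref.mean(axis=0)
_, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
P = Vt[:A].T  # loadings, J x A

def dmodx(x_rows):
    """Normalized distance to the model for each row of x_rows."""
    xc = np.atleast_2d(x_rows) - mean
    resid = xc - (xc @ P) @ P.T
    spe = np.sum(resid ** 2, axis=1)
    return np.sqrt(spe / (J - A))   # assumed square-root normalization

d_ref = dmodx(X_ref)

# Empirical upper limit from the reference data (assumed 99th percentile).
limit = np.percentile(d_ref, 99)

# A new observation two units off the model plane.
x_new = mean + 2.0 * Vt[A]
d_new = dmodx(x_new)[0]
is_abnormal = d_new > limit
```

By construction, about 1% of the reference observations fall above an empirical limit of this kind, so the percentile directly sets the expected false-alarm rate on normal data.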
3.3 Nature of Multivariate Digital Images
A digital multivariate image is a stack of congruent univariate images taken at
various wavelengths, as shown in Figure 3.3. It consists of a three-way array of data X
($x \times y \times \lambda$) having two spatial directions (i.e., x and y), together defining the spatial
resolution of the image (i.e., number of pixels), and a third direction λ, referred to
as the spectral direction, corresponding to light intensities captured by the camera
CCD (charge-coupled device) at different wavelengths (or spectral channels). For
example, the RGB color image of froth shown in Figure 3.3 has a spatial resolution
of 516 × 346 pixels, which yields an array of data of dimensions 516 × 346 × 3.
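The three-way structure can be made concrete with a short NumPy sketch. The pixel values here are random stand-ins rather than the froth image of Figure 3.3, and the variable names are hypothetical. The axis order follows the text (x, y, λ). Image libraries commonly store arrays as rows × columns × channels instead, so the first two axes may be swapped in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions quoted in the text: 516 x 346 pixels, 3 spectral channels (RGB).
n_x, n_y, n_channels = 516, 346, 3

# Hypothetical stand-in for an RGB image: random 8-bit intensities.
image = rng.integers(0, 256, size=(n_x, n_y, n_channels), dtype=np.uint8)

# One congruent univariate image per spectral channel.
red, green, blue = (image[:, :, k] for k in range(n_channels))

# Stacking the univariate images along the spectral direction
# recovers the original three-way array.
restacked = np.stack([red, green, blue], axis=-1)

# The intensities of a single pixel across all spectral channels.
pixel_spectrum = image[10, 20, :]

n_pixels = n_x * n_y  # spatial resolution in pixels
```

Each pixel therefore carries a short vector of intensities, one per spectral channel, and the image as a whole holds `n_pixels` such vectors.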