thogonal coordinate system, where the first axis passes through the long axis
of the data scatter and the new origin is the bivariate mean. This new
reference frame has the advantage that the first axis can be used to describe
most of the variance, while the second axis contributes only a little. Prior
to the transformation, two axes were needed to describe the data set. It is
therefore possible to reduce the data dimension by dropping the second axis
without losing much information, as shown in Figure 9.1.
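The two-variable case can be sketched in a few lines of Python. The block below is only an illustration under stated assumptions: the six data points are hypothetical, the variable names are chosen for this example, and the rotation angle is taken from the closed-form expression for the major axis of a 2-by-2 covariance matrix rather than from any toolbox function.

```python
import math

# Hypothetical bivariate sample with an elongated scatter (assumed data).
data = [(2.0, 1.9), (3.1, 3.0), (4.2, 3.8), (5.0, 5.3), (6.1, 5.9), (7.0, 7.2)]
n = len(data)

# Move the origin to the bivariate mean.
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
centered = [(x - mean_x, y - mean_y) for x, y in data]

# Sample variances and covariance (divisor n - 1).
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Angle between the old first axis and the long axis of the scatter.
theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
c, s = math.cos(theta), math.sin(theta)

# Rotate the centered data into the new reference frame.
rotated = [(c * x + s * y, -s * x + c * y) for x, y in centered]

# Variance along each new axis and the share carried by the first one.
var1 = sum(u * u for u, _ in rotated) / (n - 1)
var2 = sum(v * v for _, v in rotated) / (n - 1)
cov12 = sum(u * v for u, v in rotated) / (n - 1)
share_first = var1 / (var1 + var2)

print("variance along first axis: ", var1)
print("variance along second axis:", var2)
print("share of first axis:       ", share_first)
```

The rotation leaves the total variance unchanged and removes the covariance between the two new axes, so dropping the second coordinate of `rotated` discards only the fraction `1 - share_first` of the variance.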
This is now expanded to an arbitrary number of variables and samples.
Suppose a data set of measurements of p parameters on n samples is stored in
an n-by-p array.
The columns of the array represent the p variables, the rows represent the n
samples. After rotating the axes and moving the origin, the new coordinates
can be computed by
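Written out, the new coordinates are linear combinations of the original variables. The form below is a reconstruction from the surrounding description, not a quotation: it assumes that X_1, ..., X_p denote the original variables, Y_1, ..., Y_p the new ones, and that the first index of a_ij refers to the new variable and the second to the original one.

```latex
\begin{aligned}
Y_1 &= a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p \\
Y_2 &= a_{21} X_1 + a_{22} X_2 + \dots + a_{2p} X_p \\
    &\;\;\vdots \\
Y_p &= a_{p1} X_1 + a_{p2} X_2 + \dots + a_{pp} X_p
\end{aligned}
```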
PC 1, denoted by Y_1, contains the greatest variance, PC 2 the second highest
variance, and so forth. All PCs together contain the full variance of the
data set. The variance is concentrated in the first few PCs, which explain
most of the information content of the data set. The last PCs are generally
ignored to reduce the data dimension. The factors a_ij in the above equations
are the principal component loads. The values of these factors represent the
relative contribution of the original variables to the new PCs. If the load
a_ij of a variable X_j in PC i is close to zero, the influence of this
variable on that PC is low. A high positive or negative a_ij suggests a
strong contribution of the variable X_j. The new values of the variables
computed from the linear combinations of the original variables weighted by
the loads are called the principal component scores.
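The relationship between loads, scores and explained variance can be illustrated for more than two variables. The following Python sketch is not the toolbox routine mentioned in the text. It assumes a small hypothetical data set of eight samples and three variables, takes the loads as the eigenvectors of the sample covariance matrix, and obtains them with a plain cyclic Jacobi iteration. The helper names `covariance_matrix` and `jacobi_eigen` are invented for this example.

```python
import math

def covariance_matrix(rows):
    """Return the mean-centered data and the sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centered, cov

def jacobi_eigen(sym, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-300:
                    continue
                # Rotation angle that zeroes the off-diagonal element a[i][j].
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j of a
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j of a
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
                for k in range(p):  # accumulate the rotations in v
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki + s * vkj, -s * vki + c * vkj
    return [a[i][i] for i in range(p)], v

# Hypothetical data: n = 8 samples (rows), p = 3 variables (columns).
samples = [
    [2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8], [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.3], [2.3, 2.7, 1.0], [2.0, 1.6, 1.4], [1.0, 1.1, 1.7],
]
n, p = len(samples), len(samples[0])

centered, cov = covariance_matrix(samples)
eigvals, eigvecs = jacobi_eigen(cov)

# Sort the PCs by decreasing variance; loads[i][j] plays the role of a_ij.
order = sorted(range(p), key=lambda k: eigvals[k], reverse=True)
variances = [eigvals[k] for k in order]
loads = [[eigvecs[j][k] for j in range(p)] for k in order]

# Scores: linear combinations of the centered variables weighted by the loads.
scores = [[sum(loads[i][j] * row[j] for j in range(p)) for i in range(p)]
          for row in centered]

explained = [val / sum(variances) for val in variances]
print("loads:", loads)
print("explained variance fractions:", explained)
```

The sign of each row of `loads` is arbitrary, since an eigenvector and its negative describe the same axis. Keeping only the first columns of `scores` corresponds to ignoring the last PCs.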
In the following, a synthetic data set is used to illustrate the use of the func-
tion princomp contained in the Statistics Toolbox. Our data set contains the