The basic idea is to find a set of linear transformations of the original variables that describes most of the variance using a relatively small number of variables. Hence, PCA searches for k n-dimensional orthogonal vectors that can best represent the data, where k ≤ n. The new set of attributes is derived in decreasing order of contribution, so that the first variable obtained, called the first principal component, contains the largest proportion of the variance of the original data set. Unlike feature selection (FS), PCA combines the essence of the original attributes to form a new, smaller subset of attributes.
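Stated more formally (the notation here is ours, not the original text's), the first principal component is the unit vector w_1 that maximizes the variance of the projection of the centered data matrix X onto it:

    w_1 = argmax_{||w|| = 1} Var(X w) = argmax_{||w|| = 1} w^T Σ w,

where Σ is the covariance matrix of X; each subsequent component w_i solves the same problem subject to being orthogonal to w_1, ..., w_{i-1}.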
The usual procedure is to keep only the first few principal components, which may contain 95% or more of the variance of the original data set. PCA is particularly useful when there are many independent variables and they show high correlation.
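As a concrete illustration of this rule of thumb, scikit-learn's PCA accepts a fractional n_components and keeps the smallest number of components whose cumulative explained variance reaches that fraction. The synthetic data set below is ours and purely illustrative:

    import numpy as np
    from sklearn.decomposition import PCA

    # Synthetic data: 10 highly correlated attributes driven by 3 latent factors.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 3))
    X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

    # A fractional n_components keeps the smallest number of components
    # whose cumulative explained variance reaches that fraction.
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                      # (200, k) with k << 10 here
    print(pca.explained_variance_ratio_.sum())  # at least 0.95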
The basic procedure is as follows:
1. Normalize the input data, equalizing the ranges among attributes.
2. Compute k orthonormal vectors that provide a basis for the normalized input data. Each of these vectors points in a direction perpendicular to the others; they are called principal components. The original data can be expressed as a linear combination of the principal components. To compute them, the eigenvalues and eigenvectors of the covariance matrix of the sample data are needed.
3. Sort the principal components according to their strength, given by their associated eigenvalues. The principal components serve as a new set of axes for the data, oriented according to the variance of the original data. In Fig. 6.1, we show an illustrative example of the first two principal components for a given data set; a from-scratch sketch of the three steps above follows the figure.
Fig. 6.1 PCA. X and Y are the first two principal components obtained
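As a sketch of the three-step procedure above, assuming a numeric data set (the synthetic data and variable names below are ours), the eigen-decomposition route can be implemented with numpy alone:

    import numpy as np

    # Synthetic, purely illustrative data with one strongly correlated attribute.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 3))
    X[:, 2] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

    # Step 1: normalize (zero mean, unit variance per attribute).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: eigenvalues/eigenvectors of the sample covariance matrix;
    # eigh is appropriate because the covariance matrix is symmetric.
    cov = np.cov(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 3: sort components by decreasing eigenvalue (strength).
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Project the data onto the new axes (the principal components).
    scores = Z @ eigenvectors

    # Proportion of variance carried by each component, in decreasing order.
    print(eigenvalues / eigenvalues.sum())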