We can view PCA as a data-mining technique. The high-dimensional data can be replaced by its projection onto the most important axes, which are the axes corresponding to the largest eigenvalues. Thus, the original data is approximated by data that has many fewer dimensions and that summarizes the original data well.
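The idea above can be sketched in a few lines of numpy: keep only the eigenvectors of M^T M with the largest eigenvalues, and represent each point by its coordinates along those axes. The data matrix here is made up for illustration; it is not the example used below.

```python
import numpy as np

# Made-up data: four points in three dimensions, one point per row.
M = np.array([[2.0, 1.0, 0.1],
              [1.0, 2.0, -0.1],
              [3.0, 3.5, 0.0],
              [4.0, 3.0, 0.2]])

# Eigendecomposition of the symmetric matrix M^T M.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(M.T @ M)

k = 2                          # number of axes to keep
top_axes = eigvecs[:, -k:]     # columns for the k largest eigenvalues

reduced = M @ top_axes         # each point summarized by k coordinates
approx = reduced @ top_axes.T  # low-rank approximation of the original M
```

Because the discarded axes carry the smallest eigenvalues, the squared Frobenius-norm error of the approximation equals the sum of the dropped eigenvalues.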
11.2.1 An Illustrative Example
We shall start the exposition with a contrived and simple example. In this example, the data
is two-dimensional, a number of dimensions that is too small to make PCA really useful.
Moreover, the data, shown in Fig. 11.1, has only four points, and they are arranged in a
simple pattern along the 45-degree line to make our calculations easy to follow. That is,
to anticipate the result, the points can best be viewed as lying along the axis that is at a
45-degree angle, with small deviations in the perpendicular direction.
Figure 11.1 Four points in a two-dimensional space
To begin, let us represent the points by a matrix M with four rows, one for each point, and two columns, corresponding to the x-axis and y-axis. This matrix is

        | 1  2 |
    M = | 2  1 |
        | 3  4 |
        | 4  3 |
Compute M^T M, which is

    M^T M = | 30  28 |
            | 28  30 |
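This computation can be checked with numpy, assuming the four points of Fig. 11.1 are (1,2), (2,1), (3,4), and (4,3):

```python
import numpy as np

# The four example points, one per row.
M = np.array([[1, 2],
              [2, 1],
              [3, 4],
              [4, 3]])

# M^T M is a 2x2 symmetric matrix.
MtM = M.T @ M
print(MtM)
# [[30 28]
#  [28 30]]
```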
We may find the eigenvalues of the matrix above by solving the equation

    (30 − λ)(30 − λ) − 28 × 28 = 0

as we did in Example 11.2. This equation says 30 − λ = ±28, so the solutions are λ = 58 and λ = 2.
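As a quick numerical check of these eigenvalues:

```python
import numpy as np

# M^T M from the example above.
MtM = np.array([[30.0, 28.0],
                [28.0, 30.0]])

# eigvalsh returns the eigenvalues of a symmetric matrix,
# smallest first: 2 and 58.
eigvals = np.linalg.eigvalsh(MtM)
```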
Following the same procedure as in Example 11.2, we must solve

    | 30  28 | | x |      | x |
    | 28  30 | | y | = 58 | y |

When we multiply out the matrix and vector, we get the two equations

    30x + 28y = 58x
    28x + 30y = 58y

Both equations tell us the same thing: x = y.
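The eigenvector for λ = 58 can also be computed directly. A sketch with numpy (note that numpy may flip the overall sign of the eigenvector it returns):

```python
import numpy as np

MtM = np.array([[30.0, 28.0],
                [28.0, 30.0]])

# eigh returns eigenvalues in ascending order, with eigenvectors
# as the corresponding columns of the second result.
eigvals, eigvecs = np.linalg.eigh(MtM)

# The column for the largest eigenvalue (58) lies along the
# 45-degree line: both components are equal, each 1/sqrt(2).
v = eigvecs[:, -1]
```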