Modeling Methodology: Dimension Reduction and Resampling Methods - Neural Networks: Methodology and Applications

Therefore, PCA is a linear projection method that maximizes the inertia of the scatter diagram.

Before describing the theoretical developments, let us review, as a simple illustration, the distribution of a scatter diagram in $\mathbb{R}^2$ shown in Fig. 3.1. The first main axis found by PCA is the axis along which the inertia of the projected scatter diagram is maximal. The second axis, orthogonal to the first, is the axis along which the inertia of the projected scatter diagram is maximal within the subspace orthogonal to the first axis. The remaining axes are defined in the same way.
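A minimal NumPy sketch of this construction follows, assuming the data are stored as an $N \times n$ array `X` with one observation per row. It takes the main axes to be the eigenvectors of the centered scatter matrix, sorted by decreasing eigenvalue, which is the standard way to obtain axes of maximal projected inertia. The function name `main_axes` and the synthetic data are illustrative assumptions, not code from the text.

```python
import numpy as np

def main_axes(X):
    """Return (eigenvalues, axes) of the centered scatter matrix, largest first.

    Column k of `axes` is the k-th main axis; eigenvalue k is the inertia
    of the scatter diagram projected onto that axis.
    """
    Xc = X - X.mean(axis=0)          # center on the centre of gravity
    S = Xc.T @ Xc                    # n x n scatter matrix
    vals, vecs = np.linalg.eigh(S)   # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(vals)[::-1]   # reorder so the largest inertia comes first
    return vals[order], vecs[:, order]

# Hypothetical elongated cloud in R^2
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
vals, axes = main_axes(X)

# The inertia of the projections onto each main axis equals its eigenvalue.
proj = (X - X.mean(axis=0)) @ axes
print(vals, (proj ** 2).sum(axis=0))
# The main axes are mutually orthogonal unit vectors.
print(np.allclose(axes.T @ axes, np.eye(2)))   # True
```

Using `eigh` rather than `eig` is a deliberate choice here: the scatter matrix is symmetric, so `eigh` guarantees real eigenvalues and orthonormal eigenvectors.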

PCA and Gram-Schmidt Orthogonalization

This procedure may be reminiscent of the Gram-Schmidt orthogonalization described in the previous chapter for the selection of inputs. That analogy, however, is misleading. PCA is carried out in representation space, in which each observation is represented by a point whose coordinates are the values of the factors for that observation. By contrast, Gram-Schmidt orthogonalization for the selection of inputs is carried out in observation space, where each factor is represented by a vector whose components are the observed values of that factor in the database. The dimension of representation space is the number of factors of the model, whilst the dimension of observation space is the number of observations in the database.
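To make the difference in dimensions concrete, here is a small NumPy sketch; the random data and the helper name `gram_schmidt` are illustrative assumptions. It computes PCA axes, which are vectors of the $n$-dimensional representation space, and then applies plain classical Gram-Schmidt to the factor columns, which are vectors of the $N$-dimensional observation space. Only the orthogonalization step is shown, not the full input-ranking procedure of the previous chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 100, 3                        # N observations, n factors
X = rng.normal(size=(N, n))

# Representation space R^n: each row of X is a point; PCA axes are n-vectors.
Xc = X - X.mean(axis=0)
_, axes = np.linalg.eigh(Xc.T @ Xc)
print(axes.shape)                    # (3, 3): n axes, each in R^n

# Observation space R^N: each column of X is a factor vector.
def gram_schmidt(columns):
    """Classical Gram-Schmidt on the columns of a matrix (hypothetical helper)."""
    basis = []
    for v in columns.T:
        # Remove the components along the vectors already orthonormalized.
        w = v - sum((v @ b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

Q = gram_schmidt(X)
print(Q.shape)                       # (100, 3): n vectors, each in R^N
```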

Figure 3.2 shows the two main axes, defined by the first and second bisectors respectively (the orthogonality of the axes is distorted by the scale of the graph). The main components are the projections of the points onto the main axes. The linear transformation performed by PCA therefore amounts to a change of variables, defined by the main axes, applied to the centered data.
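The situation of Fig. 3.2 can be reproduced with a synthetic cloud; the spreads, the offset and the sample size below are assumptions chosen for illustration. The points are stretched along the first bisector and shifted away from the origin; after centering, the main axes found by PCA align with the two bisectors, and the main components are the coordinates of the centered points in that rotated basis.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
u1 = np.array([1.0, 1.0]) / np.sqrt(2)    # first bisector direction
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)   # second bisector direction
t = rng.normal(scale=3.0, size=N)         # large spread along the first bisector
s = rng.normal(scale=0.3, size=N)         # small spread along the second bisector
X = np.outer(t, u1) + np.outer(s, u2) + np.array([5.0, 2.0])  # shifted cloud

Xc = X - X.mean(axis=0)                    # centered data
vals, vecs = np.linalg.eigh(Xc.T @ Xc)
axes = vecs[:, ::-1]                       # largest eigenvalue first

# Main components: change of variables from (x1, x2) to main-axis coordinates.
components = Xc @ axes
print(abs(axes[:, 0] @ u1), abs(axes[:, 1] @ u2))   # both close to 1
```

Because the axes are orthonormal, the change of variables is inverted by `components @ axes.T`, which returns the centered data.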

We will show that the “mechanical” concept of total inertia of the scatter diagram is equivalent to the “statistical” concept of variance. The inertia of the points is computed with respect to the centre of gravity of the scatter diagram. We denote by $g$ the centre of gravity and by $I_n$ the inertia of the scatter diagram defined in $\mathbb{R}^n$, where $x_{ij}$ is the value of factor $j$ for observation $i$ and $N$ is the number of observations; we have

$$g_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij} \quad\Rightarrow\quad I_n = \sum_{i=1}^{N}\sum_{j=1}^{n}\left(x_{ij} - g_j\right)^2 .$$
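Both quantities translate directly into NumPy. This is a minimal sketch; the function names `centre_of_gravity` and `inertia` and the four-point example are hypothetical. As in the definition of $I_n$, no $1/N$ normalization is applied to the inertia.

```python
import numpy as np

def centre_of_gravity(X):
    # g_j = (1/N) * sum_i x_ij, one coordinate per factor j
    return X.mean(axis=0)

def inertia(X):
    # I_n = sum_i sum_j (x_ij - g_j)^2, with no 1/N factor
    g = centre_of_gravity(X)
    return float(((X - g) ** 2).sum())

# Four corners of a square of side 2
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
print(centre_of_gravity(X))   # [1. 1.]
print(inertia(X))             # 8.0
```

Each corner lies at squared distance 2 from the centre $(1, 1)$, so the four points contribute a total inertia of 8.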

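Since the $j$-th diagonal entry of the matrix below is $\sum_{i=1}^{N}(x_{ij}-g_j)^2$, inertia $I_n$ is equal to the trace of the variance-covariance matrix of the data $X$ defined by

$$V = \left(X - \mathbf{1}g^T\right)^T\left(X - \mathbf{1}g^T\right),$$

where $X$ is the $N \times n$ data matrix, $g$ is the column vector of the $g_j$, and $\mathbf{1}$ denotes the column vector of $N$ ones, so that every row of $\mathbf{1}g^T$ equals $g^T$.

Since inertia is invariant under translation, the data may be centered by replacing $X$ with $X - \mathbf{1}g^T$, so that one has the following simple relation between the inertia and the variance-covariance matrix of the new centered data $X$:

$$I_n = \operatorname{Trace}\left(X^T X\right).$$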
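The trace identity and the effect of centering can be checked numerically. In this sketch the data are random and the shift vector is arbitrary (both assumptions); the centering term is built as the outer product of a column of $N$ ones with $g$, which is what subtracting $g$ from every row requires.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))              # N = 50 observations, n = 4 factors
g = X.mean(axis=0)                        # centre of gravity
ones = np.ones((X.shape[0], 1))           # column of N ones

# Double-sum definition of the inertia.
I_n = ((X - g) ** 2).sum()

# V = (X - 1 g^T)^T (X - 1 g^T); its trace equals the inertia.
D = X - ones @ g[None, :]
V = D.T @ D
print(np.isclose(I_n, np.trace(V)))       # True

# Translating every point leaves the inertia unchanged ...
shifted = X + np.array([10.0, -3.0, 0.5, 7.0])
print(np.isclose(((shifted - shifted.mean(axis=0)) ** 2).sum(), I_n))  # True
# ... so on the centered data, I_n = Trace(X^T X).
Xc = X - g
print(np.isclose(np.trace(Xc.T @ Xc), I_n))                            # True
```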
