output space, distances greater than ρ are no longer taken into account. Decreasing ρ during training allows certain nonlinear manifolds to be unfolded, and possibly torn. The projection of a sphere of R3 into R2 (Fig. 3.4) shows an example of a manifold whose projection requires such a tearing. The function F is therefore used to unfold certain manifolds while preserving the local topology as far as possible.
Therefore, the objective function minimized by CCA takes the following form:

$$E = \sum_{i=1}^{p} \sum_{j=i+1}^{p} \left( X_{ij} - Y_{ij} \right)^2 F(Y_{ij}).$$
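As an illustration, this cost can be evaluated numerically. The sketch below assumes a step neighbourhood function, F(d) = 1 if d ≤ ρ and 0 otherwise; this is one common choice, and any positive decreasing function of Y_ij would do. The function names are ours, introduced for illustration only.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def cca_cost(x, y, rho):
    """CCA cost E = sum over pairs i<j of (X_ij - Y_ij)^2 F(Y_ij).

    x : (p, n) data in the original space
    y : (p, m) coordinates in the reduced space, m < n
    F is taken here as a step function of width rho (an assumption)."""
    X = pairwise_distances(x)            # X_ij: distances in the original space
    Y = pairwise_distances(y)            # Y_ij: distances in the reduced space
    F = (Y <= rho).astype(float)         # neighbourhood weighting F(Y_ij)
    i, j = np.triu_indices(len(x), k=1)  # each pair counted once (j > i)
    return float(np.sum((X[i, j] - Y[i, j]) ** 2 * F[i, j]))
```

With two points at original distance 5 projected to a distance of 1, the cost is (5 − 1)² = 16 when ρ covers the pair, and 0 when ρ excludes it.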
3.5.2 Curvilinear Component Analysis Algorithm
The algorithm consists in minimizing the above cost function with respect to the coordinates of each point of the database in the reduced space. Training can be performed by any of the minimization algorithms described in Chap. 2. For illustration, we describe the minimization of the cost function by stochastic gradient.
Thus, we compute the partial derivatives of the cost function with respect to each parameter; denoting by y_ik the k-th coordinate of point i,

$$\frac{\partial E}{\partial y_{ik}} = \sum_{j \neq i} \frac{\partial E}{\partial Y_{ij}} \frac{\partial Y_{ij}}{\partial y_{ik}} = -\sum_{j \neq i} \left[ 2 F(Y_{ij}) - (X_{ij} - Y_{ij}) F'(Y_{ij}) \right] \frac{X_{ij} - Y_{ij}}{Y_{ij}} \left( y_{ik} - y_{jk} \right).$$
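This derivative can be checked against a finite difference of the cost. The self-contained sketch below uses a smooth F(Y) = exp(−Y/ρ) so that F' exists everywhere; the choice of ρ and the function names are ours, for illustration only.

```python
import numpy as np

rho = 2.0
F  = lambda Y: np.exp(-Y / rho)          # smooth neighbourhood function
dF = lambda Y: -np.exp(-Y / rho) / rho   # its derivative F'

def cost(x, y):
    """E = sum over pairs i<j of (X_ij - Y_ij)^2 F(Y_ij)."""
    E = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            X = np.linalg.norm(x[i] - x[j])
            Y = np.linalg.norm(y[i] - y[j])
            E += (X - Y) ** 2 * F(Y)
    return E

def grad_i(x, y, i):
    """Analytic gradient dE/dy_i from the formula above."""
    g = np.zeros_like(y[i])
    for j in range(len(x)):
        if j == i:
            continue
        X = np.linalg.norm(x[i] - x[j])
        Y = np.linalg.norm(y[i] - y[j])
        g += -(2 * F(Y) - (X - Y) * dF(Y)) * (X - Y) / Y * (y[i] - y[j])
    return g
```

Perturbing one coordinate of y_i by ±ε and differencing the cost reproduces the corresponding component of grad_i to within the finite-difference error.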
Parameters are updated as follows, where µ is the gradient step:

$$\Delta y_i = \mu \sum_{j \neq i} \left[ 2 F(Y_{ij}) - (X_{ij} - Y_{ij}) F'(Y_{ij}) \right] \frac{X_{ij} - Y_{ij}}{Y_{ij}} \left( y_i - y_j \right).$$
A condition must be satisfied to guarantee the convergence of the minimization: the term β_ij = 2F(Y_ij) − (X_ij − Y_ij)F'(Y_ij) must be positive. If Y_ij is too large with respect to X_ij, point j should be brought closer to point i. The function F(Y_ij) should therefore be selected so as to guarantee β_ij > 0. That condition is difficult to satisfy: for instance, for F(Y_ij) = exp(−Y_ij/ρ), stability requires ρ > (Y_ij − X_ij)/2, a condition that cannot always be fulfilled because ρ decreases during training. The following simplification of the training rule, which amounts to taking for F the step function that equals 1 for Y_ij ≤ ρ and 0 beyond (so that F' vanishes almost everywhere), guarantees almost everywhere that β_ij = 2 > 0:

$$\Delta y_i = \mu \sum_{j \neq i} \begin{cases} \dfrac{X_{ij} - Y_{ij}}{Y_{ij}} \left( y_i - y_j \right) & \text{if } Y_{ij} \le \rho, \\ 0 & \text{otherwise}. \end{cases}$$
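One stochastic step of this simplified rule can be sketched as follows; a minimal illustration assuming the step neighbourhood function, with point i drawn at random by the caller (the function name is ours, not from the text).

```python
import numpy as np

def cca_update(x, y, i, mu, rho):
    """One stochastic step of the simplified CCA rule: move point i in the
    reduced space under the influence of its output-space neighbours.

    x : (p, n) data in the original space
    y : (p, m) current coordinates in the reduced space
    mu : gradient step; rho : neighbourhood radius."""
    X = np.linalg.norm(x - x[i], axis=1)    # X_ij, original-space distances
    Y = np.linalg.norm(y - y[i], axis=1)    # Y_ij, reduced-space distances
    neighbours = (Y <= rho) & (Y > 0.0)     # j != i with Y_ij <= rho
    delta = np.zeros_like(y[i])
    for j in np.flatnonzero(neighbours):
        # (X_ij - Y_ij)/Y_ij (y_i - y_j): repel if too close, attract if too far
        delta += (X[j] - Y[j]) / Y[j] * (y[i] - y[j])
    y[i] = y[i] + mu * delta
    return y
```

In a full run one would sweep i over the whole database repeatedly, slowly decreasing both µ and ρ during training, as described above.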