Given the metrics used to compute distances, and for the same reasons
as for PCA, the appropriate preprocessing consists in a reduction (scaling to
unit variance) of each component, so that all components carry the same weight
in the computation of the distances. Although this is not mandatory, the data
may also be centered in order to obtain graphical representations around the origin.
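The preprocessing described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name `standardize` is hypothetical, the data are assumed to be a list of samples with p components each, and the population standard deviation (division by n) is assumed. Centering is optional, as in the text.

```python
import math


def standardize(data, center=True):
    """Scale each component (column) of `data` to unit standard deviation.

    `data` is a list of samples, each a list of p components.
    If `center` is True, the column mean is also subtracted so that
    the representation is centered on the origin.
    Assumption: population standard deviation (divide by n).
    """
    n = len(data)
    p = len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(p)]
    stds = []
    for k in range(p):
        var = sum((row[k] - means[k]) ** 2 for row in data) / n
        # A constant component cannot be rescaled; leave its scale unchanged.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [
        [((row[k] - means[k]) if center else row[k]) / stds[k] for k in range(p)]
        for row in data
    ]


# Two components with very different ranges end up on the same scale.
z = standardize([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
```

After scaling, both columns above take the same values, so neither dominates the Euclidean distances.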
As for Kohonen maps, the components y_ij of the units in the output space are
initialized to random values. To standardize their distribution, each component
may be drawn uniformly in [-1, 1]. Because the Euclidean distances X_ij and Y_ij
are evaluated in spaces of different dimensions, p and q respectively, comparing
them directly is biased. To overcome that problem, especially for high
dimension-reduction rates, the recommended rule is to use average distances
that take the dimensions of the spaces into account:
$$X_{ij} = \sqrt{\frac{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}{p}}, \qquad Y_{ij} = \sqrt{\frac{\sum_{k=1}^{q} (y_{ik} - y_{jk})^2}{q}}.$$
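The initialization and the dimension-normalized distances can be sketched as below. This is a minimal sketch under stated assumptions: the helper names `init_output`, `normalized_distance` and `distance_matrix` are hypothetical, and the normalized distance is taken to be the square root of the mean squared component difference, i.e. the Euclidean distance divided by the square root of the dimension. With that normalization, a unit difference on every component gives a distance of 1 whatever the dimension, which is what makes X_ij (dimension p) and Y_ij (dimension q) comparable.

```python
import math
import random


def init_output(n, q, seed=None):
    """Random initial coordinates y_i in the q-dimensional output space,
    each component drawn uniformly in [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(q)] for _ in range(n)]


def normalized_distance(a, b):
    """sqrt(sum_k (a_k - b_k)^2 / dim), with dim = len(a).

    Used with dim = p in the input space (X_ij) and dim = q in the
    output space (Y_ij)."""
    dim = len(a)
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)) / dim)


def distance_matrix(points):
    """All pairwise normalized distances (symmetric, zero diagonal)."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = normalized_distance(points[i], points[j])
    return d


# A unit difference on every component gives 1 in any dimension:
print(normalized_distance([0.0] * 4, [1.0] * 4))  # 1.0
print(normalized_distance([0.0], [1.0]))          # 1.0
```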
The selection of parameter ρ has a large impact on the quality of the projection.
During the first iterations, all points y_i in the output space should contribute
to the cost function. The rule therefore consists in initializing ρ to the maximum
of the distances Y_ij:
$$\rho(0) = \max_{i,j} Y_{ij}.$$
The final value of ρ should correspond to the smallest value required on Y_ij,
i.e., to the smallest of the values X_ij over pairs of distinct points:
$$\rho(t_{\max}) = \min_{i \neq j} X_{ij}.$$
Parameter ρ then decreases with the iteration number t, from its initial value
ρ(0) to its final value ρ(t_max), according to the law
$$\rho(t) = \rho(0) \left( \frac{\rho(t_{\max})}{\rho(0)} \right)^{t/t_{\max}}.$$
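The three rules for ρ (initial value, final value, decreasing law) can be combined into one schedule, sketched below. The function name `rho_schedule` is hypothetical, and the distance matrices are assumed to be lists of lists such as those produced by any pairwise-distance routine. The schedule is geometric: at t = t_max/2, ρ equals the geometric mean of ρ(0) and ρ(t_max). It requires ρ(t_max) > 0, i.e. no two input points coincide.

```python
def rho_schedule(X, Y0, t_max):
    """Return a function t -> rho(t) for 0 <= t <= t_max.

    X  : pairwise distances X_ij in the input space (list of lists)
    Y0 : pairwise distances Y_ij in the output space at initialization
    rho(0)     = max over pairs of Y_ij
    rho(t_max) = min over distinct pairs of X_ij (must be > 0)
    rho(t)     = rho(0) * (rho(t_max) / rho(0)) ** (t / t_max)
    """
    n = len(X)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    rho_0 = max(Y0[i][j] for i, j in pairs)
    rho_end = min(X[i][j] for i, j in pairs)

    def rho(t):
        return rho_0 * (rho_end / rho_0) ** (t / t_max)

    return rho


X = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
Y0 = [[0.0, 4.0, 8.0], [4.0, 0.0, 6.0], [8.0, 6.0, 0.0]]
rho = rho_schedule(X, Y0, t_max=10)
print(rho(0), rho(10))  # 8.0 1.0
```

A geometric law spends as many iterations shrinking ρ from 8 to 4 as from 2 to 1, so the small neighborhoods that refine local structure receive as much attention as the large ones used at the start.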
3.5.4 Quality of the Projection
One of the important aspects of curvilinear analysis is the criterion used to
assess the quality of the result. That criterion is based on the comparison of
the values X_ij and Y_ij, the distances between points computed in the original
space and in the reduced space respectively. Each pair of points is represented
in the (dx, dy) plane by a point of coordinates dx = Y_ij and dy = X_ij.
The points close to the line dx = dy correspond to pairs whose distance is nearly
the same in both spaces. The distortion due to the dimensionality reduction can
therefore be measured by the average distance from the points to the straight
line dx = dy. Figure 3.8 shows the distribution of the distances for the example
of the hemisphere and for that of the sphere.
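The quality criterion can be illustrated with a short sketch. The function names are hypothetical; each unordered pair i < j is assumed to contribute one point, and "distance to the line" is taken as the perpendicular Euclidean distance, which for a point (dx, dy) and the line dx = dy equals |dx - dy| / sqrt(2). At least two points are required.

```python
import math


def dxdy_points(X, Y):
    """One point (dx, dy) = (Y_ij, X_ij) per pair i < j, as in the dx-dy plot."""
    n = len(X)
    return [(Y[i][j], X[i][j]) for i in range(n) for j in range(i + 1, n)]


def mean_distance_to_diagonal(X, Y):
    """Average perpendicular distance from the (dx, dy) points to dx = dy.

    For a point (dx, dy) this distance is |dx - dy| / sqrt(2).
    A value of 0 means every distance is preserved exactly."""
    pts = dxdy_points(X, Y)
    return sum(abs(dx - dy) for dx, dy in pts) / (math.sqrt(2) * len(pts))


X = [[0.0, 1.0], [1.0, 0.0]]
Y = [[0.0, 3.0], [3.0, 0.0]]
print(dxdy_points(X, Y))  # [(3.0, 1.0)]
```

Plotting the points returned by `dxdy_points` gives the kind of scatter diagram described in the text, while `mean_distance_to_diagonal` condenses it into a single number.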