Self-Organizing Maps and Unsupervised Classification - Neural Networks: Methodology and Applications


document analysis. The most important change that was introduced makes it possible to train high-dimensional maps quickly. The topological map used for Websom is composed of 1,002,240 neurons. It is impossible to train this map directly because the number of connections is too large: 1,002,240 × 500. The new idea relies on the simple fact that a good initialization considerably increases the convergence speed. This good initialization is found through a hierarchical procedure that guides the training from one step to the next. In the Websom implementation, the parameters are first tuned using a rectangular map of 435 neurons. This first map is extracted from the training set. Then a second map that uses a finer sampling is initialized using the results of the first one: the initial value of each parameter of the second map is obtained by interpolating the values of the three closest neighbors among the 435 neurons of the first map. In this way, the number of neurons increases from step to step, up to 1,002,240 neurons. Each step involves a new learning phase over the whole corpus. The initial learning phase (for the 435 neurons of the first map) requires 300,000 iterations; every further learning phase requires only five iterations of the “dynamic clouds” version of the algorithm. In this way, it is possible to train very large maps. Moreover, the hierarchical order that was found in the previous steps is reused to find the closest neighbors during the subsequent learning steps.
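The initialization of a finer map from a coarser one can be illustrated with a minimal sketch. It assumes that both maps are rectangular grids scaled to the unit square and that the interpolation is an inverse-distance weighting of the three closest coarse neurons; the text does not specify the interpolation scheme, and the function names `grid_positions` and `init_fine_map` are hypothetical.

```python
import math

def grid_positions(rows, cols):
    """Positions of a rows x cols rectangular map, scaled to the unit square."""
    return [
        (r / max(rows - 1, 1), c / max(cols - 1, 1))
        for r in range(rows)
        for c in range(cols)
    ]

def init_fine_map(coarse_weights, coarse_shape, fine_shape, k=3, eps=1e-12):
    """Initialize each fine-map neuron from its k closest coarse-map neurons.

    Inverse-distance weighting is an assumption: the text only states that
    the values of the three closest neighbors are interpolated.
    """
    coarse_pos = grid_positions(*coarse_shape)
    dim = len(coarse_weights[0])
    fine_weights = []
    for p in grid_positions(*fine_shape):
        # Brute-force search for the k closest coarse neurons.
        nearest = sorted(
            (math.dist(p, q), i) for i, q in enumerate(coarse_pos)
        )[:k]
        coeffs = [1.0 / (dist + eps) for dist, _ in nearest]
        total = sum(coeffs)
        fine_weights.append([
            sum(c * coarse_weights[i][j] for c, (_, i) in zip(coeffs, nearest)) / total
            for j in range(dim)
        ])
    return fine_weights

# Toy example: a 2 x 2 map with 1-dimensional weights refined to 3 x 3.
coarse = [[0.0], [1.0], [2.0], [3.0]]
fine = init_fine_map(coarse, coarse_shape=(2, 2), fine_shape=(3, 3))
print(len(fine), "neurons initialized")
```

Each fine neuron starts close to the region of input space already covered by its coarse neighbors, which is why only a few further learning iterations are needed at each step. The brute-force neighbor search is used here for brevity only.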

7.5.3.3 Discussing Websom Performances

The various improvements implemented in Websom are quite efficient with respect to the time complexity of the computation. The methodology detailed above reduces the number of operations from $O(dN^2)$ for the original Kohonen algorithm to $O(dM^2) + O(dN) + O(M^2)$ for Websom. In that expression, N is the number of neurons in the final map, M is the number of neurons in the initial map, and d is the dimension of the input layer (d = 500 for Websom). Comparisons with the original Kohonen methodology show that the latest version of the implementation performs as well as the original algorithm with respect to the quantization error and the classification error. The final version of the map was obtained through a six-week learning phase performed on a six-processor computer (SGI O2000). On the database of seven million texts, the performance reaches 64% correct classification. As is generally the case for data mining applications, the interface was very carefully designed: the map is presented as a sequence of HTML pages. It is easy to explore with the mouse. A simple click gives access to the documents, which can then be displayed and read.
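To give a feeling for the orders of magnitude involved, the operation counts quoted above can be evaluated with the Websom figures (d = 500, N = 1,002,240, M = 435). This is only a rough sketch: it reads the Websom cost as the sum of the three terms dM², dN and M², and it drops all constant factors hidden by the O notation, so the result is indicative rather than a measured speed-up. The function names are hypothetical.

```python
def kohonen_ops(d, n):
    """Leading term of the original algorithm's cost, d * N^2 (constants dropped)."""
    return d * n * n

def websom_ops(d, n, m):
    """Sum of the three quoted terms, d*M^2 + d*N + M^2 (constants dropped)."""
    return d * m * m + d * n + m * m

d, n, m = 500, 1_002_240, 435

# d * N is also the number of connections of the full-size map.
print(f"connections (d*N): {d * n:,}")
print(f"original: {kohonen_ops(d, n):.3e}")
print(f"Websom:   {websom_ops(d, n, m):.3e}")
print(f"ratio:    {kohonen_ops(d, n) / websom_ops(d, n, m):.1e}")
```

With these values the dN term dominates the Websom count, so the cost of the hierarchical scheme is essentially linear in the number of neurons of the final map.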

