document analysis. The most important change that was introduced makes it
possible to train high-dimensional maps quickly. The topological map used
for Websom is composed of 1,002,240 neurons. It is impossible to train this
map directly because the number of connections is too large:
1,002,240 × 500. The new idea relies on the simple fact that a good
initialization considerably increases the convergence speed. This
initialization is found through a hierarchical procedure that guides the
training from one step to the next. In the Websom implementation, the
parameters are first tuned using a rectangular map of 435 neurons. This first
map is extracted from the training set. Then a second map that uses a finer
sampling is initialized using the results of the first one: the initial value
of each parameter of the second map is obtained by interpolating the values
of the three closest neighbors among the 435 neurons of the first map. In
this way, the number of neurons increases from step to step up to 1,002,240
neurons. Each step involves a new learning phase over the whole corpus. The
initial learning phase (for the 435 neurons of the first map) requires
300,000 iterations; every further learning phase requires only five
iterations of the “dynamical clouds” version of the algorithm. It thus
becomes possible to train very large maps. Moreover, the hierarchical order
found in the previous steps is used to find the closest neighbors during the
successive learning steps.
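The initialization of a finer map from a coarser one can be sketched as follows. This is only an illustration, not the Websom code: the text states that each new neuron is interpolated from its three closest neighbors in the previous map, but it does not give the interpolation formula, so the inverse-distance weighting used here is an assumption, and the names coarse_positions and init_fine_map are hypothetical.

```python
import math

def coarse_positions(rows, cols):
    """Grid coordinates of a rows x cols map, rescaled to the unit square.

    Requires rows >= 2 and cols >= 2.
    """
    return [(r / (rows - 1), c / (cols - 1))
            for r in range(rows) for c in range(cols)]

def init_fine_map(coarse_pos, coarse_weights, fine_pos, k=3, eps=1e-12):
    """Initialise each neuron of the finer map from its k closest coarse neurons.

    Assumption: the interpolation is an inverse-distance weighted average;
    the source only says that the three closest neighbors are interpolated.
    """
    dim = len(coarse_weights[0])
    fine_weights = []
    for p in fine_pos:
        # Indices of the k coarse neurons closest to p in map coordinates.
        nearest = sorted(range(len(coarse_pos)),
                         key=lambda i: math.dist(p, coarse_pos[i]))[:k]
        # Closer coarse neurons contribute more; eps avoids division by zero.
        w = [1.0 / (math.dist(p, coarse_pos[i]) + eps) for i in nearest]
        total = sum(w)
        fine_weights.append([
            sum(wi * coarse_weights[i][j] for wi, i in zip(w, nearest)) / total
            for j in range(dim)
        ])
    return fine_weights

# Toy example: a 2 x 2 map with one-dimensional weights refined to 3 x 3.
coarse_pos = coarse_positions(2, 2)
coarse_w = [[0.0], [1.0], [2.0], [3.0]]
fine_pos = coarse_positions(3, 3)
fine_w = init_fine_map(coarse_pos, coarse_w, fine_pos)
```

Each interpolated vector is a convex combination of existing vectors, so the finer map starts out already ordered like the coarser one, which is why only a few further iterations are needed at each step.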
7.5.3.3 Discussing Websom Performances
The various improvements implemented in Websom are quite efficient with
respect to the time complexity of the computation. The methodology detailed
above reduces the number of operations from $O(dN^2)$ for the original
Kohonen algorithm to $O(dM^2) + O(dN) + O(M^2)$ for Websom. In that
expression, N is the number of neurons in the final map, M is the number of
neurons in the initial map and d is the dimension of the input layer
(d = 500 for Websom). Comparisons with the original Kohonen methodology show
that the latest version of the implementation performs as well as the
original algorithm with respect to the quantization error and the
classification error. The final version of the map was obtained through a
six-week learning phase performed on a six-processor computer (SGI O2000).
On the corpus of seven million texts, the rate of correct classification
reaches 64%. As is generally the case for data mining applications, the
interface was very carefully designed: the map is presented as a sequence of
HTML pages. It is easy to explore with the mouse. A simple click gives access
to the documents, which can then be displayed and read.
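To give a sense of scale, the operation counts quoted in this section can be evaluated for the Websom figures (d = 500, N = 1,002,240, M = 435). The sketch below takes the orders of magnitude at face value and ignores all constant factors, so it indicates a rough ratio only, not a measured speed-up; the function names are hypothetical.

```python
def kohonen_ops(d, n):
    """Order of magnitude for the original algorithm: d * N^2."""
    return d * n * n

def websom_ops(d, n, m):
    """Order of magnitude quoted for Websom: d * M^2 + d * N + M^2."""
    return d * m * m + d * n + m * m

# Figures given in the text for Websom.
D, N, M = 500, 1_002_240, 435

original = kohonen_ops(D, N)
reduced = websom_ops(D, N, M)

print(f"original Kohonen: {original:.2e} operations")
print(f"Websom:           {reduced:.2e} operations")
print(f"ratio:            {original / reduced:.1e}")
```

With these values the dominant term of the reduced count is dN, that is, the number of connections of the final map.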