The results obtained in that application demonstrate that topological maps
operate correctly when they are used to process numerical data.
The following application is due to T. Kohonen. It shows that the algorithm
also performs well when it is used for text processing.
7.5.3 Topological Map and Documentary Research
This last section presents another real-world application in a completely
different field: documentary research. The general objective of the Websom
system, which was created by Kohonen and his collaborators, is to create a
content-based labeling of a set of texts. The current working version can
organize 7,000,000 texts in a single document database, in which documents
with similar semantics are given neighboring labels. A visual inspection of
the representation of the database gives a global hint about the content of
the documents that are stored in a particular zone of the database. Looking up
the keywords that are associated with the zone and considering the topics of
the different documents allows documents to be searched in an original way.
From this short description of the main characteristics of the Websom system,
one can see how self-organizing maps are used: semantically close observations
(texts) are allocated to neighboring neurons on the map. For the application
to be operational, several additional requirements have to be met:
As in the remote-sensing satellite data application, the quality of the
system depends closely on the semantics of the texts of interest.
Documentary research is useful only if the number of stored texts is large
enough and if the visualization is fine enough. Thus the dimension of the
map has to be very high.
The system is meant to be operated online, so it has to work fast.
The basic algorithm has to be modified to allow
1. introducing linguistic knowledge that enables textual manipulation,
2. training high-dimensional maps to be able to process as many documents as
possible,
3. using a friendly interface that really helps the user to perform a document
search,
4. reducing the duration of an average search session.
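The principle described above (semantically close texts are allocated to
neighboring neurons) can be illustrated with a minimal sketch in Python. It is
not the Websom implementation. It assumes that each text has already been
encoded as a small numerical vector (here, invented two-component vectors for
two hypothetical groups of texts), and it trains a tiny one-dimensional map
with a basic form of Kohonen's algorithm. The function names, the map size,
and the learning-rate and neighborhood schedules are illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(k):
        return sum((w - v) ** 2 for w, v in zip(weights[k], x))
    return min(range(len(weights)), key=sq_dist)


def train_som(data, n_neurons=6, n_iter=3000, seed=0):
    """Train a 1-D self-organizing map (a chain of neurons) on `data`.

    Illustrative schedules (assumptions, not Websom's settings): the learning
    rate decays linearly to zero, and the neighborhood width shrinks linearly
    down to a floor of 0.5.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(n_iter):
        remaining = 1.0 - t / n_iter
        lr = 0.5 * remaining
        sigma = max(0.5, (n_neurons / 2) * remaining)
        x = rng.choice(data)                  # pick one observation at random
        c = best_matching_unit(weights, x)    # winning neuron for x
        for k in range(n_neurons):
            # Neurons close to the winner on the chain are pulled more strongly.
            h = math.exp(-((k - c) ** 2) / (2 * sigma ** 2))
            weights[k] = [w + lr * h * (v - w) for w, v in zip(weights[k], x)]
    return weights


# Hypothetical, already-encoded "texts": two groups with similar content.
texts_a = [(0.05, 0.10), (0.10, 0.00), (0.00, 0.05)]
texts_b = [(0.90, 1.00), (1.00, 0.95), (0.95, 0.90)]

som = train_som(texts_a + texts_b)
labels_a = [best_matching_unit(som, x) for x in texts_a]
labels_b = [best_matching_unit(som, x) for x in texts_b]
print("labels of group A:", labels_a)
print("labels of group B:", labels_b)
```

The label of a text is simply the index of its winning neuron. With groups as
well separated as these, one would expect the two groups to receive labels in
different parts of the chain; the exact labels depend on the random seed.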
7.5.3.1 Information Coding
When a text is preprocessed, significant information is extracted; what counts
as significant depends on the specificities of the general field of the search.
Of course, the encoding has to comply with the requirements of topological
maps: Kohonen's algorithm processes multidimensional numerical data. Thus any
text has to be represented by an n-dimensional numerical vector. The current
version of the Websom system processes a corpus that contains 6,840,568 English