Information Technology Reference
network techniques such as self-organized maps to depict patterns and trends
derived from text. See, for example, Lin (1997) and Noyons and van Raan (1998).
The pioneering software for concept mapping is Leximappe, developed in the 1980s.
It organizes a network of concepts based on associations determined by the co-word
method. In the 1980s, Leximappe turned co-word analysis into an
instrumental tool for social scientists, who used it in numerous studies originating from
the famous actor-network theory (ANT).
Key concepts in Leximappe include poles and their position in concept maps. The
position of the poles is determined by centrality and density. The centrality implies
the capacity of structuring; the density reflects the internal coherence of the pole.
Leximappe is used to create structured graphic representations of concept
networks. In such networks, vertices represent concepts; the strength of the connection
between two vertices reflects the strength of their co-occurrence. In the early days,
an important step was to tag all words in the text as a noun, a verb, or an adjective.
Algorithms used in information visualization systems such as ThemeScape (Wise
et al. 1995) have demonstrated some promising capabilities of filtering out nouns
from the source text.
Inclusion Index and Inclusion Maps
Inclusion maps and proximity maps are two types of concept maps resulting from
co-word analysis. Co-word analysis measures the degree of inclusion and proximity
between keywords in scientific documents and automatically draws maps of scientific
areas as inclusion maps and proximity maps, respectively.
Metrics for co-word analysis have been extensively studied. Given a corpus of
$N$ documents, each document is indexed by a set of unique terms that can occur in
multiple documents. If two terms, $t_i$ and $t_j$, appear together in a single document,
it counts as a co-occurrence. Let $c_k$ be the number of occurrences of term $t_k$ in
the corpus and $c_{ij}$ be the number of co-occurrences of terms $t_i$ and $t_j$, which is the
number of documents indexed by both terms. The inclusion index $I_{ij}$ is essentially a
conditional probability. Given the occurrence of one term, it measures the likelihood
of finding another term in documents of the corpus:
$$I_{ij} = \frac{c_{ij}}{\min(c_i, c_j)}$$
For example, Robert Louis Stevenson's Treasure Island has a total of 34 chapters.
Among them, the word map occurs in 5 chapters, $c_{\text{map}} = 5$, and the word treasure
occurs in 20 chapters, $c_{\text{treasure}} = 20$. The two terms co-occur in 4 chapters, thus
$c_{\text{map,treasure}} = 4$ and $I_{\text{map,treasure}} = 4/\min(5, 20) = 4/5 = 0.8$. In this way, we can
construct an inclusion matrix of terms based on their co-occurrence. This matrix defines
a network. An interesting step described in the original version of co-word analysis is
to remove certain types of links from this network.
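The construction of an inclusion matrix can be illustrated with a minimal Python sketch. It assumes that each document is already reduced to a set of index terms; the function name inclusion_matrix and the toy corpus are hypothetical and chosen only for illustration. The toy corpus is synthetic: it reproduces the chapter counts quoted for map and treasure, not the actual text of the novel.

```python
from collections import Counter
from itertools import combinations


def inclusion_matrix(docs):
    """Return {(t_i, t_j): I_ij} for every pair of terms that co-occur.

    docs is an iterable of term collections, one per document.
    I_ij = c_ij / min(c_i, c_j), where counts are numbers of documents.
    """
    occurrences = Counter()      # c_k: documents indexed by term t_k
    cooccurrences = Counter()    # c_ij: documents indexed by both t_i and t_j
    for terms in docs:
        unique = sorted(set(terms))          # count each term once per document
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return {
        (a, b): c_ab / min(occurrences[a], occurrences[b])
        for (a, b), c_ab in cooccurrences.items()
    }


# Synthetic corpus of 34 "chapters" (an assumption for illustration):
# map in 5 chapters, treasure in 20 chapters, both together in 4 chapters.
chapters = (
    [{"map", "treasure"}] * 4
    + [{"map"}] * 1
    + [{"treasure"}] * 16
    + [set()] * 13
)

matrix = inclusion_matrix(chapters)
print(matrix[("map", "treasure")])
```

Pairs are stored once, with the two terms in alphabetical order, because the inclusion index as defined here is symmetric in $i$ and $j$. Pairs that never co-occur are simply absent, which corresponds to a missing link in the network.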