by asking hundreds of human volunteers to provide the first word that comes to
mind when given a cue word. This technique is able to capture many different
types of word associations including word co-ordination (pepper, salt), collocation
(trash, can), super-ordination (insect, butterfly), synonymy (starving, hungry), and
antonymy (good, bad). The association strength between two words is simply the
number of volunteers who produced the second word in response to the first. FANs are
considered one of the best methods for understanding how people, in general,
associate words in their own minds [23].
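As a minimal sketch of this counting scheme (the cue words and responses below are invented toy data, not the actual norms), the FAN strengths can be accumulated directly from a list of volunteer answers:

```python
from collections import Counter, defaultdict

def build_fan_network(responses):
    """Build free-association norm (FAN) strengths from volunteer answers.

    `responses` is a list of (cue, response) pairs, one per volunteer answer.
    The association strength cue -> response is simply the count of
    volunteers who gave that response for that cue.
    """
    network = defaultdict(Counter)
    for cue, response in responses:
        network[cue][response] += 1
    return network

# Toy data standing in for hundreds of volunteers (hypothetical).
answers = [("pepper", "salt"), ("pepper", "salt"), ("pepper", "hot"),
           ("trash", "can"), ("good", "bad"), ("good", "bad")]
fan = build_fan_network(answers)
print(fan["pepper"]["salt"])  # 2: two volunteers answered "salt" to "pepper"
```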
For the corpus-based associations, we build a (term × term) co-occurrence matrix
from a large corpus, in a manner similar to that employed in the Hyperspace Analog
to Language (HAL) model [24]. For our corpus, we use the entire (English) text
of Wikipedia, as it is large, easily accessible, and covers a wide range of human
knowledge [25]. Once the co-occurrence matrix is built, we use the co-occurrence
values themselves as association strengths between words. This approach works
because we care only about the strongest associations between words, and it lets us
reduce the number of irrelevant associations by ignoring any word pair whose
co-occurrence count falls below some threshold.
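A simplified version of this step can be sketched as follows. Note that the true HAL model counts directionally and weights co-occurrences by distance within the window; the symmetric counting, window size, and threshold here are illustrative assumptions, not the authors' exact parameters:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=5):
    """Count term-term co-occurrences within a sliding window (HAL-like).

    Counts are recorded symmetrically; HAL proper is directional and
    distance-weighted. This is a deliberate simplification.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                counts[(word, other)] += 1
                counts[(other, word)] += 1
    return counts

def corpus_associations(counts, threshold=2):
    """Keep only word pairs whose co-occurrence count meets the threshold."""
    return {pair: c for pair, c in counts.items() if c >= threshold}

# Toy "corpus" standing in for the Wikipedia text (hypothetical).
tokens = "salt pepper salt shaker salt pepper".split()
counts = cooccurrence_counts(tokens, window=2)
strong = corpus_associations(counts, threshold=3)
```

Thresholding in this way prunes the long tail of one-off co-occurrences, leaving only the pairs strong enough to serve as associations.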
Our final semantic network is a composition of the human- and corpus-based
associations, which essentially merges the two separate graphs into a single network
before querying it for associations. This method assumes that the human data contains
more valuable word associations than the corpus data because such human data is
typically used as the gold standard in the literature. However, the corpus data does
contain some valuable associations not present in the human data. To combine the
graphs, we add the top n associations for each word from the corpus data to the human
data but weight the corpus-based association strengths lower than the human-based
associations. This is beneficial for two reasons. First, if there are any associations that
overlap, adding them again will strengthen the association in the combined network.
Second, corpus-based associations not present in the human data will be added to
the combined network and provide a greater variety of word associations. We keep
the association strength low because we want the corpus data to reinforce, but not
dominate, the human data.
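The merging scheme described above can be sketched as a short routine. The value of `top_n`, the down-weighting factor, and the additive overlap rule are illustrative assumptions; the source specifies only that the top n corpus associations are added at a lower weight:

```python
def combine_networks(human, corpus, top_n=3, corpus_weight=0.25):
    """Merge corpus associations into the human network at reduced weight.

    `human` and `corpus` map word -> {associate: strength}. For each word,
    the top_n strongest corpus associations are added, scaled down by
    corpus_weight so the corpus data reinforces, but does not dominate,
    the human data. Overlapping associations are strengthened additively.
    """
    combined = {word: dict(assocs) for word, assocs in human.items()}
    for word, assocs in corpus.items():
        top = sorted(assocs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        bucket = combined.setdefault(word, {})
        for assoc, strength in top:
            bucket[assoc] = bucket.get(assoc, 0.0) + corpus_weight * strength
    return combined

# Toy example (hypothetical strengths).
human = {"good": {"bad": 10.0}}
corpus = {"good": {"bad": 4, "nice": 8, "evil": 2}}
combined = combine_networks(human, corpus, top_n=2, corpus_weight=0.25)
```

In the toy example, the overlapping pair (good, bad) is reinforced, while (good, nice) enters the combined network at a much lower strength than any human-based association.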
4.2.1.2 Image Generation
DARCI generates images in two stages: the creation of a source image composed of
a collage of concept icons and the rendering of this source image using various
parameterized image filters. The collage generation is driven by the semantic network,
while the filtered rendering is achieved using an evolutionary mechanism whose
fitness function is defined in terms of the outputs of the visuo-linguistic association
networks.
Image Composition. The semantic memory model can be considered to represent
the meaning of a word as a (weighted) collection of other words. DARCI effectively
makes use of this collection as a decomposition of a (high-level) concept into simpler