Fig. 4.3 Co-occurrence of terms does not always correspond to their semantic relatedness
ratio were defined as $r_s = i / p_s$ for the source-to-target relationship (or
$r_t = i / p_t$ for target-to-source). The distribution of the measured values is
plotted in Fig. 4.4.
We then computed the noise level through the co-occurrence of terms in nonsense
pairs. We generated the nonsense pairs three times, creating three reference sets by
randomly selecting terms from three corpora (word sets). Each reference set contained
200 generated nonsense term pairs. The corpora contained the top 800, 5,000 and 50,000
most frequently used English words (excluding stopwords). For each reference set,
the co-occurrences were computed separately. We used three draws instead of one
because we expected that term co-occurrence may also depend on term usage
frequency. The distribution of the measured values for all three sets is plotted
in Fig. 4.5. We can observe how term frequency affects the distribution: while the
broadest set (50,000 words), full of specific terms, scarcely crosses the 0.10
co-occurrence value, the narrowest (800) reaches as high as 0.5.
For the final comparison with the LSG network co-occurrences, we used the medium-sized
(5,000-word) set (covering all of the terms in the LSG network). The noise for this
set starts to take effect from 0.35 co-occurrence. About 40% of LSG term network
relationships fall below this threshold, rendering them “hidden” in the noise.
We can therefore conclude that the LSG is able to help discover these relationships.
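The directional co-occurrence ratio and the comparison against a noise threshold described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text: that i counts the documents containing both terms, and that p_s and p_t count the documents containing the source and the target term. The function names, the toy documents, the placeholder vocabulary and the sample ratios are all hypothetical and serve only to show the arithmetic, not the measured data.

```python
import random


def directional_ratios(documents, source, target):
    """Return (r_s, r_t) for one term pair.

    Assumption: i is the number of documents containing both terms,
    p_s and p_t the numbers containing the source and the target term.
    """
    p_s = sum(source in doc for doc in documents)
    p_t = sum(target in doc for doc in documents)
    i = sum(source in doc and target in doc for doc in documents)
    r_s = i / p_s if p_s else 0.0  # source-to-target
    r_t = i / p_t if p_t else 0.0  # target-to-source
    return r_s, r_t


def nonsense_pairs(vocabulary, n_pairs, rng):
    """Draw n_pairs random pairs of two distinct terms from a word list."""
    return [tuple(rng.sample(vocabulary, 2)) for _ in range(n_pairs)]


def share_below(ratios, threshold):
    """Fraction of ratios lying strictly below the noise threshold."""
    return sum(r < threshold for r in ratios) / len(ratios)


# Toy documents, each reduced to its set of terms (made-up data).
documents = [
    {"cat", "dog"},
    {"cat", "mouse"},
    {"dog", "bone"},
    {"cat", "dog", "bone"},
]
r_s, r_t = directional_ratios(documents, "bone", "dog")

# One reference set of 200 nonsense pairs from a placeholder vocabulary.
vocabulary = [f"word{k}" for k in range(5000)]
reference_set = nonsense_pairs(vocabulary, 200, random.Random(0))

# Share of (made-up) relationship ratios hidden below a 0.35 threshold.
hidden = share_below([0.10, 0.20, 0.35, 0.50, 0.90], 0.35)
print(r_s, r_t, len(reference_set), hidden)
```

In the toy data, "bone" always appears together with "dog", while "dog" also appears without "bone", so the two directions of the same pair yield different ratios. This is why the ratio is defined separately for each direction.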
4.3.4 Network Relationship Types
The Little Search Game term network consists of terms and their untyped associations.
Such a structure may serve as a base for upgrading to more “heavy” structures: the
 