Little Search Game: Lightweight Domain Modeling - Semantic Acquisition Games: Harnessing Manpower for Creating Semantics


counts. Mass opinion on relevance was set to yes or no if one of the respective weighted vote counts was at least twice as large as the other. The remaining pairs were marked as controversial and removed from further evaluation.
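The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `mass_opinion`, its parameters, and the treatment of a pair with no votes at all as controversial are assumptions, not details given in the text.

```python
def mass_opinion(yes_weight: float, no_weight: float, factor: float = 2.0) -> str:
    """Classify one term pair from its weighted vote counts.

    Returns "yes" or "no" when one side is at least `factor` times
    as large as the other, and "controversial" otherwise.
    Assumption: a pair with no votes at all counts as controversial.
    """
    if yes_weight > 0 and yes_weight >= factor * no_weight:
        return "yes"
    if no_weight > 0 and no_weight >= factor * yes_weight:
        return "no"
    return "controversial"


# Hypothetical vote weights, for illustration only.
print(mass_opinion(4.0, 2.0))  # exactly twice as large: "yes"
print(mass_opinion(3.0, 2.0))  # less than twice as large: "controversial"
```

Pairs classified as controversial would then be dropped before computing the share of correct relationships.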

Results. The results showed that nearly 91% of the relationships in the term network were correct, which encouraged us to further investigate the properties of the Little Search Game and of the network it created.

4.3.3 Ability to Retrieve “Hidden” Relationships

When considering the purpose of the Little Search Game, one may question the necessity of a human-computation approach to acquiring term relationships when we could simply infer relatedness from term co-occurrence (let us define the co-occurrence of term A to B as the ratio of the number of documents containing both A and B to the number of documents containing A). Unfortunately, statistical co-occurrence of terms does not necessarily reflect their true semantic relatedness. For example, the terms “brain” and “tumor”, which are arguably relevant to each other, have ten times lower co-occurrence than the nonsense pair “substance-argument” (in the same corpus, the Web). Many automated approaches to semantics acquisition suffer from some level of noise, which needs to be corrected manually. In the case of co-occurrence, this noise renders a subset of valid (semantically sound) term relationships “hidden”, i.e. indistinguishable from invalid ones.
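To make the definition of co-occurrence concrete, here is a minimal Python sketch that evaluates it on a tiny in-memory corpus. The function name `cooccurrence`, the representation of documents as sets of terms, and the toy documents are all assumptions made for illustration; the text itself works with Web-scale result counts rather than a local corpus.

```python
from typing import Iterable, Set


def cooccurrence(docs: Iterable[Set[str]], a: str, b: str) -> float:
    """Co-occurrence of term `a` to term `b`:
    (documents containing both a and b) / (documents containing a).
    Returns 0.0 when no document contains `a` (an assumed convention).
    """
    with_a = [d for d in docs if a in d]
    if not with_a:
        return 0.0
    both = sum(1 for d in with_a if b in d)
    return both / len(with_a)


# Toy corpus, invented for illustration only.
docs = [
    {"brain", "tumor"},
    {"brain", "surgery"},
    {"brain"},
    {"tumor"},
]
print(cooccurrence(docs, "brain", "tumor"))  # 1 of 3 "brain" documents
print(cooccurrence(docs, "tumor", "brain"))  # 1 of 2 "tumor" documents
```

Note that the measure is asymmetric: the co-occurrence of A to B generally differs from that of B to A, because the denominator counts only documents containing the first term.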

Fortunately, the mechanics of the Little Search Game make it possible to uncover even these “hidden” term relationships (even though the scoring of the game itself depends on the “imprecise” co-occurrence measurement). The key force behind this is the way a regular player thinks: although he aims to come up with negative search terms that have high co-occurrence with the task term, he makes his guesses through the prism of true semantic relatedness. Therefore, he sometimes enters terms he considers related to the task term, later realizes that they had no effect on the result count, and no longer uses them in subsequent attempts. However, once he has used them, they remain in the game's logs and can eventually make it through post-hoc filtering.

To confirm this hypothesis, we conducted an experiment examining the term co-occurrence of the relationships present in the LSG term network acquired earlier. Assuming that these relationships are correct, we aimed to determine how many of them are “hidden”, i.e. indistinguishable from nonsense relationships by their co-occurrence in a corpus (here, the whole Web as indexed by the Bing search engine). More precisely, we asked how many have a co-occurrence lower than the “noise level” of the corpus (a co-occurrence value that a significant number of nonsense term relationships reach) (Fig. 4.3).
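The notion of a “hidden” relationship can be illustrated with a short Python sketch. It takes the noise level as a given parameter, since how that value is derived is not specified at this point; the function name `hidden_relationships`, the dictionary layout, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def hidden_relationships(cooc_by_pair: Dict[Pair, float],
                         noise_level: float) -> List[Pair]:
    """Return the pairs whose co-occurrence lies below the noise level,
    i.e. pairs that co-occurrence alone cannot tell apart from nonsense."""
    return [pair for pair, cooc in cooc_by_pair.items() if cooc < noise_level]


# Hypothetical co-occurrence values and noise level, for illustration only.
network = {
    ("brain", "tumor"): 0.002,
    ("car", "engine"): 0.150,
    ("water", "river"): 0.080,
}
hidden = hidden_relationships(network, noise_level=0.01)
print(hidden)
print(len(hidden) / len(network))  # share of relationships that are hidden
```

The share printed on the last line corresponds to the quantity the experiment sets out to measure for the LSG term network.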

We first used the search engine to compute co-occurrence ratios for all term pairs in the LSG term network. We queried for the number of results p_s containing the source term (set A), then the number of results p_t containing the target term (set B), and then the number of results i containing both terms (the intersection of A and B). Then, the co-occurrence
