number of documents covered, universal versus domain-specific applicability) and
quality (tolerance to bias and errors, degree of structure) of the information and knowl-
edge they are able to acquire. Using these perspectives, we observe that expert-based
approaches generally deliver high-quality but low-quantity results, since they are bound to
limited manpower. On the other hand, automated approaches deliver semantics in high
quantities but with uncertain quality, since they are prone to unusual situations stemming
from the heterogeneity of the spaces they aim to cover. Crowd-based approaches lie
somewhere in between, operating with a numerous yet lay mass of human contribu-
tors. They have potential for both quality and quantity, but are limited by the specificity
of the task they aim to fulfill. They also need to motivate the contributors in the right
way, which is a further limitation. These issues (but not only these) make the field of
crowdsourcing a target for researchers.
Some researchers argue that there is no other way to create accurate domain models
and annotations than to utilize human labor; others argue that virtually any piece of
knowledge is already on the Web, probably with great redundancy, and that it is only
a matter of developing the ultimate harvesting algorithm to collect it [18].
For now, the best way toward the acquisition of semantics lies in combining approach
families to exploit their strong points and neutralize their weaknesses. As an example
of approach chaining, we can imagine an ontology engineering project where experts
first set the top layers of the taxonomy within the ontology, set up the axioms and
the entity and relationship types, and seed the examples. After this, an automated method
is deployed over the corresponding text resource corpus and extracts entities and
relationships according to the patterns previously set by the experts. Lastly, the crowd comes
in to validate the acquired entities and relationships using a simple true/false question-
answering interface. As another example of symbiosis, we can consider a crowd that
prepares image tags for images prior to the training of an automated classifier.
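To make the chaining concrete, the following is a minimal, hypothetical sketch in Python of the first example: experts seed relationship types and extraction patterns, an automated stage applies them over a text corpus, and the extracted candidate triples are turned into true/false questions for the crowd. All names (ExpertSeed, extract_relations, CrowdTask) and the regular-expression pattern are illustrative assumptions, not an existing framework.

```python
# Hypothetical sketch of expert -> automated -> crowd approach chaining.
import re
from dataclasses import dataclass

@dataclass
class ExpertSeed:
    """Relationship types and lexical patterns set by experts."""
    relation_patterns: dict  # relation type -> regex with two capture groups

@dataclass
class CrowdTask:
    """A dichotomous (true/false) validation question for the crowd."""
    question: str
    candidate: tuple  # (subject, relation, object)

def extract_relations(corpus, seed):
    """Automated stage: apply expert-defined patterns over the text corpus."""
    candidates = []
    for text in corpus:
        for relation, pattern in seed.relation_patterns.items():
            for subj, obj in re.findall(pattern, text):
                candidates.append((subj, relation, obj))
    return candidates

def to_crowd_tasks(candidates):
    """Crowd stage: turn each extracted triple into a true/false question."""
    return [CrowdTask(f"Is it true that '{s}' {r.replace('_', ' ')} '{o}'?", (s, r, o))
            for s, r, o in candidates]

# Usage: experts seed one pattern, the extractor runs, the crowd validates.
seed = ExpertSeed(relation_patterns={"is_a": r"(\w+) is a (\w+)"})
corpus = ["A sparrow is a bird.", "A violin is a instrument."]
for task in to_crowd_tasks(extract_relations(corpus, seed)):
    print(task.question)
```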
Considering this, we come to two possible roles of the crowd: semantics creation
or semantics validation. Whether the crowd is supposed to carry out the former or the latter
greatly influences the options the method designer has. Naturally, “validation”
crowdsourcing always depends on an existing metadata set that it aims to improve. On
the other hand, it has a great advantage regarding the design of the contributor's
interface with the crowdsourcing platform: validating something is in general more
ergonomic than creating it (both syntactically and semantically). In the context of
the first example, dichotomous question answering about the validity of a typed
relationship between two terms is syntactically easier than selecting the type from a
long list. This somewhat advocates the use of crowdsourcing for semantic validation
rather than creation, especially if the automated method that creates the metadata is able
to state its confidence (support) in its output, limiting the metadata set that needs
to be validated to only the “unsure” cases.
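As a minimal sketch of this confidence-based filtering (assuming the automated method attaches a support score to each candidate triple; the function name and the 0.9 threshold are illustrative assumptions), high-confidence candidates could be accepted automatically while only the “unsure” remainder is queued for crowd validation:

```python
# Hypothetical routing of extracted triples by extractor confidence.
def route_for_validation(candidates, auto_accept_threshold=0.9):
    """Accept high-confidence triples automatically; send the 'unsure'
    rest to the crowd as true/false validation questions."""
    accepted, crowd_queue = [], []
    for triple, confidence in candidates:
        if confidence >= auto_accept_threshold:
            accepted.append(triple)
        else:
            crowd_queue.append(triple)
    return accepted, crowd_queue

# Usage: only the low-confidence triple is queued for crowd validation.
candidates = [(("sparrow", "is_a", "bird"), 0.97),
              (("violin", "is_a", "fruit"), 0.42)]
accepted, crowd_queue = route_for_validation(candidates)
print(accepted)     # [('sparrow', 'is_a', 'bird')]
print(crowd_queue)  # [('violin', 'is_a', 'fruit')]
```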
The type of resource for which the semantics is created also indicates the
potential outcome of the acquisition method. For structured and unstructured texts,
automated approaches function better if only lightweight structures are demanded
(e.g., keywords), whereas experts or crowds are needed if the semantics (especially
domain models) is required at a higher quality grade. With multimedia, human
work is even more in demand for semantics creation. For our research presented in