constraints or literal attributes absent in its structures. This naturally reduces the
options for its utilization. On the other hand, its upkeep becomes much cheaper.
Both Cyc and WordNet are examples of originally “old” (1980s) domain modeling
initiatives that have survived to this day. They were created before the birth of
the Web. When the idea of the Semantic Web emerged, it first pleaded for the creation
of yet another all-covering (web) world model. However, it soon became apparent
that such a knowledge base could not be maintained centrally.
This problem was answered by the Linked Data initiative. Linked (Open) Data
represent a system of interlinked resources, facts and vocabularies grouped into
ontologies, each specialized to a specific domain [11]. Linked Data are, in general,
lightweight: their common knowledge representation framework is RDF. This is
one of the reasons for the proliferation of Linked Data: contributing knowledge to such
a structure is easier than with heavyweight ontologies. Smaller specialized domain
models are easier to maintain. The individual ontologies of the Linked Data overlap,
which yields a plethora of equivalence relationships between them. Linked Data have also
incorporated older knowledge bases and reached almost universal recognition in the
community as the de facto central entity of today's Semantic Web.
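The interlinking described above can be sketched with plain RDF-style triples. The sketch below models a few facts as subject–predicate–object tuples and collects the `owl:sameAs` links that bridge overlapping ontologies; the URIs and predicates are illustrative placeholders, not a complete or authoritative dataset.

```python
# Minimal sketch of Linked Data style RDF triples as Python tuples.
# The URIs below are illustrative placeholders for resources in two
# different datasets; a real system would use an RDF library.

SAME_AS = "owl:sameAs"

triples = [
    ("http://dbpedia.org/resource/Berlin", "rdf:type", "dbo:City"),
    ("http://dbpedia.org/resource/Berlin", "dbo:country",
     "http://dbpedia.org/resource/Germany"),
    # An equivalence link connecting two overlapping ontologies:
    ("http://dbpedia.org/resource/Berlin", SAME_AS,
     "http://sws.geonames.org/2950159/"),
]

def same_as_links(triples):
    """Collect pairs of resources declared equivalent across datasets."""
    return [(s, o) for (s, p, o) in triples if p == SAME_AS]
```

Following the `owl:sameAs` links in this way is what lets a consumer merge facts about one real-world entity that are scattered across several specialized domain models.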
2.4 Automated Approaches
Automated approaches to semantics acquisition rely on the extraction of facts from
existing (electronic) human-readable knowledge bases. They have been the subject
of much research, mainly because they do not rely on cooperation with
human contributors, who are difficult to motivate (by cutting them out, these
approaches provide much better scalability). Automated approaches can be characterized from
several points of view:
• The source corpora and domain. The corpus that is mined can be the whole
Web, or a subset of it. It can also be a closed repository of documents (usually related
to some domain). Generally, reducing the input corpus naturally influences the
quantity (and also quality) of acquired facts, helps in dealing with the heterogeneity
of the resources, and brings possibilities to exploit repetitive structures established
within the corpus (e.g., reducing the corpus to Wikipedia brings the advantage of
infoboxes, which contain structured data).
• The degree of supervision, or the amount of expert knowledge needed to fuel the
process. A typical example of a supervised approach is a text mining algorithm
looking for occurrences of certain predefined phrase patterns (e.g., “such as”). On
the other hand, an example of an unsupervised approach is latent semantic analysis
of texts (mining frequent term collocations). In general, supervised approaches
usually provide better precision, while unsupervised ones may process more
heterogeneous inputs with unexpected situations.
• The type of job they do. In ontology building, approaches focus on concept
identification, concept instance discovery, or on relationship discovery, which is
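The exploitation of Wikipedia infoboxes mentioned above can be sketched as a simple parse of structured key–value fields. The markup below is a simplified, illustrative fragment, not full Wikipedia template syntax, and the field names are assumptions for the example.

```python
# Hedged sketch: extracting structured facts from wiki-style infobox
# markup. Real infobox extraction (as in DBpedia) handles nested
# templates and typed values; this only parses flat '| key = value' lines.

INFOBOX = """
| name       = Ada Lovelace
| birth_date = 1815
| occupation = Mathematician
"""

def parse_infobox(text):
    """Extract (attribute, value) pairs from '| key = value' lines."""
    facts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            facts[key.strip()] = value.strip()
    return facts
```

Each extracted pair can then be emitted as a triple about the article's subject, which is why restricting the corpus to Wikipedia makes structured facts so much cheaper to acquire.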
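The supervised “such as” pattern mentioned above can be sketched with a single regular expression in the spirit of Hearst-style hypernym extraction. Real systems use many patterns over parsed text; this sketch handles only one pattern over a simple comma-separated list, and the example sentence is invented for illustration.

```python
import re

# One lexico-syntactic pattern: "<hypernym> such as <hyponym>, <hyponym>
# and <hyponym>". The single word before "such as" is taken as the
# hypernym; this is a deliberate simplification.
PATTERN = re.compile(
    r"(?P<hypernym>\w+) such as (?P<hyponyms>\w+(?:, \w+)*(?:,? and \w+)?)"
)

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    m = PATTERN.search(sentence)
    if not m:
        return []
    hyponyms = re.split(r",\s*|\s+and\s+", m.group("hyponyms"))
    return [(h, m.group("hypernym")) for h in hyponyms if h]
```

This illustrates the precision/coverage trade-off from the text: the pattern fires only on sentences phrased exactly this way (high precision, low recall), whereas unsupervised collocation mining would also process text the pattern never matches.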