State-of-the-Art: Semantics Acquisition and Crowdsourcing - Semantic Acquisition Games: Harnessing Manpower for Creating Semantics

candidate instances for a class, also given on input. In more detail, OntoSyphon works as follows:

1. The textual representation of the input class is retrieved from the ontology.

2. Textual phrases and sentences in which the input class is used are retrieved from the ontology (if present). Alternatively, the ontology neighbors of the input class (parent classes, siblings in the hierarchy, other related classes) are retrieved to form artificial phrases.

3. These phrases are used as queries for a keyword search engine operating over the given document corpus (here, the whole Web can easily be used).

4. The search engine retrieves a set of documents to be mined. Because phrases are used, only documents with the proper term meanings are retrieved. Without the phrase search, i.e., with only a keyword search on the textual representation of the input class, the system might encounter polysemy problems. If, for example, there was a class “sea”, querying it would retrieve documents about the sea as a part of the ocean, but also documents about the SEA information system. But if the phrase is derived from the existing ontology (e.g., “sea ship”) and used as a query, a much more coherent set of documents with the proper term meaning is retrieved.

5. Finally, using a predefined set of sentence templates (e.g., “A is a B” or “A such as B, C, D”), OntoSyphon searches the texts of the retrieved documents for expressions of hierarchical subordination between named entities, with the input class being the superior entity. The other participating entities are afterward written to an instance candidate list.
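Steps 2, 3 and 5 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not OntoSyphon's actual implementation: the function names, the two regular-expression templates, and the rule that instance names are sequences of capitalized words are all hypothetical, and the retrieved documents are passed in as plain strings instead of coming from a real search engine.

```python
import re


def build_phrase_queries(class_name, neighbors):
    """Form quoted phrase queries from a class and its ontology neighbors (assumed scheme)."""
    return [f'"{class_name} {neighbor}"' for neighbor in neighbors]


def build_patterns(class_name):
    """Compile two hypothetical sentence templates for the given input class."""
    cls = re.escape(class_name)
    # Assumption: an instance name is one or more capitalized words.
    name = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"
    return [
        # Template "A is a B", where B is the input class.
        re.compile(rf"(?P<items>{name}) is an? {cls}\b"),
        # Template "B(s) such as A, C and D".
        re.compile(
            rf"\b{cls}s? such as "
            rf"(?P<items>{name}(?:, {name})*(?:,? (?:and|or) {name})?)"
        ),
    ]


def extract_candidates(class_name, texts):
    """Collect instance candidates for class_name from already retrieved texts."""
    candidates = []
    for text in texts:
        for pattern in build_patterns(class_name):
            for match in pattern.finditer(text):
                # Split an enumeration such as "A, C and D" into single names.
                for item in re.split(r",? (?:and|or) |, ", match.group("items")):
                    if item and item not in candidates:
                        candidates.append(item)
    return candidates


documents = [
    "Rivers flow into seas such as Baltic Sea, Black Sea and Red Sea.",
    "Sargasso Sea is a sea with no coastline.",
]
print(build_phrase_queries("sea", ["ship"]))
print(extract_candidates("sea", documents))
```

A real system would need far more robust templates (plural forms, determiners, noun-phrase chunking) and would typically rank the candidates, for example by how often they are matched.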

2.4.2 Relationship Discovery and Naming

Another group of automated semantics acquisition approaches focuses on the discovery of relationships between entities. The entities can be anything from simple terms to refined ontology concepts. In all cases, textual representations of the entities are sought in the textual resources and their relationships are subsequently mined. Factual statements are often contained within a single sentence as subject, object (nouns, adjectives) and predicate (verbs), so many approaches focus on mining sentences for term relationships [51, 60, 71]. Others try to exploit structures like tables and lists to access the relationships expressed through them [15].

An example of relationship harvesting was presented by Pantel and Pennacchiotti [51]. Their approach implements a bootstrapping technique which, when supplied with a few examples, is able to harvest quality relationships from a natural language text corpus, even the whole Web. The approach is predicate-oriented: it primarily looks for occurrences of the relationship (predicate) in the corpus and only afterward attaches the subjects and objects to it. The method works as follows:

At start-up, a small set of seed expressions of the same relationship is chosen, e.g., “part of”, “consists of”, “comprises”. A generic pattern is then created to cover variations of these expressions, e.g., “X of Y”.
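The predicate-first idea of this start-up step can be illustrated with a small Python sketch. It is only an assumption-laden illustration, not the method of [51]: the function name, the regular expression, and the restriction to single-word subjects and objects are hypothetical, and the sketch covers only locating seed expressions and attaching the neighboring terms to them.

```python
import re

# Seed expressions taken from the example in the text.
SEEDS = ["part of", "consists of", "comprises"]


def find_relation_instances(seeds, sentences):
    """Locate seed predicates first, then attach the adjacent terms as X and Y."""
    # Longer seeds first, so the alternation prefers the most specific match.
    alternation = "|".join(
        re.escape(seed) for seed in sorted(seeds, key=len, reverse=True)
    )
    # Assumption: X and Y are single words, optionally preceded by a copula or article.
    pattern = re.compile(
        rf"\b(?P<x>\w+) (?:is |are )?(?:a |an |the )?"
        rf"(?P<rel>{alternation}) (?:a |an |the )?(?P<y>\w+)"
    )
    triples = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            triples.append((match.group("x"), match.group("rel"), match.group("y")))
    return triples


sentences = [
    "The wheel is part of the car.",
    "A molecule consists of atoms.",
    "The team comprises engineers.",
]
for triple in find_relation_instances(SEEDS, sentences):
    print(triple)
```

Note that the sketch returns the terms in sentence order only; which of them is the whole and which is the part depends on the seed expression and is not resolved here.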
