Introduction to Linked Data and Its Lifecycle on the Web - Reasoning Web

When training examples are available, the methods of choice are borrowed from supervised machine learning. Approaches such as Hidden Markov Models [168], Maximum Entropy Models [35] and Conditional Random Fields [45] have been applied to the NER task. Because the large training corpora required by machine learning approaches are scarce, semi-supervised [125,105] and unsupervised machine learning approaches [107,41] have also been used for extracting named entities from text. [105] gives an exhaustive overview of approaches for NER.

Keyphrase Extraction. Keyphrases/keywords are multi-word units (MWUs) which capture the main topics of a document. The automatic detection of such MWUs has been an important task of NLP for decades but, due to the very ambiguous definition of what an appropriate keyword should be, current approaches to the extraction of keyphrases still display low F-scores [75]. From the point of view of the Semantic Web, the extraction of keyphrases is a very similar task to that of finding tags for a given document. Several categories of approaches have been adapted to enable KE, of which some originate from research areas such as summarization and information retrieval (IR). Still, according to [74], the majority of the approaches to KE implement combinations of statistical, rule-based or heuristic methods [48,120] on mostly document [97], keyphrase [149] or term cohesion features [124]. [75] gives an overview of current tools for KE.
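To make the combination of statistical and heuristic methods concrete, the following is a minimal sketch and not a reimplementation of any of the cited approaches. It assumes a tiny hand-picked stopword list, treats word n-grams as candidates, applies one heuristic rule (no stopword at either end of a candidate) and ranks candidates by frequency weighted by length. The names STOPWORDS, keyphrase_candidates and top_keyphrases are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a curated resource.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "on", "to", "with"}

def keyphrase_candidates(text, max_len=3):
    """Count word n-grams (n <= max_len) that pass a simple heuristic filter."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Heuristic rule: a candidate may not start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts

def top_keyphrases(text, k=3, max_len=3):
    """Rank candidates by frequency multiplied by their length in words."""
    counts = keyphrase_candidates(text, max_len)
    scored = {phrase: c * len(phrase.split()) for phrase, c in counts.items()}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scored, key=lambda p: (-scored[p], p))[:k]

text = "Linked data is published on the web. Linked data links data on the web."
best = top_keyphrases(text, k=1)
```

On this toy input the bigram "linked data" outranks the more frequent unigram "data" because of the length weighting. Real systems replace both the filter and the score with the richer document, keyphrase and term cohesion features mentioned above.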

Relation Extraction. The extraction of relations from unstructured data builds upon work for NER and KE to determine the entities between which relations might exist. Most tools for RE rely on pattern-based approaches. Some early work on pattern extraction relied on supervised machine learning [51]. Yet, such approaches demanded large amounts of training data, making them difficult to adapt to new relations. The subsequent generation of approaches to RE aimed at bootstrapping patterns based on a small number of input patterns and instances. For example, [28] presents the Dual Iterative Pattern Relation Expansion (DIPRE) and applies it to the detection of relations between authors and titles of books. This approach relies on a small set of seed patterns to maximize the precision of the patterns for a given relation while minimizing their error rate. Snowball [3] extends DIPRE by a new approach to the generation of seed tuples. Newer approaches aim either to collect redundancy information from the whole Web [123] or Wikipedia [158,164] in an unsupervised manner or to use linguistic analysis [53,119] to harvest generic patterns for relations.
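The bootstrapping idea behind this generation of approaches can be sketched as a loop that alternates between learning patterns from known pairs and harvesting new pairs with those patterns. The code below is a strongly simplified illustration and not the algorithm of [28]: a pattern is reduced to the literal text between the two arguments, entity mentions are approximated by runs of capitalised words, and no pattern scoring is performed. The corpus, the seed pair and all function names are assumptions made for this example.

```python
import re

# Assumption: an entity mention is a run of capitalised words.
NAME = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def find_patterns(corpus, pairs):
    """Collect the literal text occurring between known (author, title) pairs."""
    patterns = set()
    for author, title in pairs:
        for sentence in corpus:
            # Only the order "author ... title" is considered in this sketch.
            match = re.search(re.escape(author) + r"(.+?)" + re.escape(title), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns

def apply_patterns(corpus, patterns):
    """Use each collected middle string to harvest new (author, title) pairs."""
    found = set()
    for middle in patterns:
        regex = re.compile(NAME + re.escape(middle) + NAME)
        for sentence in corpus:
            found.update(regex.findall(sentence))
    return found

def bootstrap(corpus, seeds, rounds=2):
    """Alternate between pattern induction and pair extraction."""
    pairs = set(seeds)
    for _ in range(rounds):
        patterns = find_patterns(corpus, pairs)
        pairs |= apply_patterns(corpus, patterns)
    return pairs

corpus = [
    "Isaac Asimov wrote Foundation in 1951.",
    "Frank Herbert wrote Dune in 1965.",
    "Frank Herbert, author of Dune, was born in Tacoma.",
    "Mary Shelley, author of Frankenstein, lived in London.",
]
seeds = {("Isaac Asimov", "Foundation")}
result = bootstrap(corpus, seeds)
```

Starting from a single seed pair, the first round learns the pattern " wrote " and finds a second pair. That pair yields the pattern ", author of " in the second round, which in turn finds a third pair. Without any control of pattern precision, such a loop quickly accumulates errors on real text, which is why the approaches above put so much weight on pattern quality.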

URI Disambiguation. One important problem for the integration of NER tools for Linked Data is the retrieval of IRIs for the entities to be manipulated. In most cases, the URIs can be extracted from generic knowledge bases such as DBpedia [104,83] by comparing the label found in the input data with the rdfs:label or dc:title of the entities found in the knowledge base. Furthermore, information such as the type of NEs can be used to filter the retrieved IRIs via a comparison of the rdfs:label of the rdf:type of the URIs with the name of the class of the NEs. Still, in many cases (e.g., Leipzig, Paris), several entities might bear the same label.
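The label-based lookup with type filtering can be illustrated with a small in-memory stand-in for a knowledge base. This is a sketch under stated assumptions: the entries, the example.org IRIs and the function candidate_iris are hypothetical, and a real system would query a knowledge base such as DBpedia instead of a Python list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    iri: str
    label: str       # stands in for the rdfs:label or dc:title of the entity
    type_label: str  # stands in for the rdfs:label of the entity's rdf:type

# Assumption: a toy knowledge base with made-up IRIs.
KB = [
    Entity("http://example.org/resource/Paris", "Paris", "City"),
    Entity("http://example.org/resource/Paris_Texas", "Paris", "City"),
    Entity("http://example.org/resource/Paris_mythology", "Paris", "Person"),
    Entity("http://example.org/resource/Leipzig", "Leipzig", "City"),
]

def candidate_iris(mention, ne_class=None, kb=KB):
    """Return IRIs whose label equals the mention, optionally filtered by NE class."""
    hits = [e for e in kb if e.label.casefold() == mention.casefold()]
    if ne_class is not None:
        # Compare the type label of each candidate with the class name of the NE.
        hits = [e for e in hits if e.type_label.casefold() == ne_class.casefold()]
    return [e.iri for e in hits]

all_paris = candidate_iris("Paris")
paris_cities = candidate_iris("Paris", ne_class="City")
```

In this toy setting the type filter reduces the candidates for "Paris" from three to two, but two cities still share the label, so the filter alone does not resolve the ambiguity.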
