Fig. 1. A small fragment of the web of data. DBpedia is a de-facto hub of the linked open data cloud.
2.3 NELL
The Never-Ending Language Learning (NELL) project [62,13] aims to create and maintain a large-scale machine learning system that learns to extract structured information from unstructured web pages. NELL distinguishes itself from YAGO and DBpedia in that its extraction algorithms operate on a large corpus of more than 500 million web pages 1 and not solely on the set of Wikipedia articles. The NELL system was bootstrapped with a small set of classes and relations and, for each of those, 10-15 positive and negative instances. The guiding principle of NELL is to build several semi-supervised machine learning [14] components that accumulate instances of the classes and relations, re-train the machine learning algorithms with these instances as training data, and re-apply them to extract novel instances. This process is repeated indefinitely; each re-training and extraction phase is called an iteration. Since numerous extraction components work in parallel and extract facts with different degrees of confidence in their correctness, one of the most important aspects of NELL is its ability to combine these different extraction algorithms into one coherent model. This is accomplished with relatively simple linear machine learning models that weight the different components based on their past accuracy.
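To make this loop concrete, the following is a minimal Python sketch of such a bootstrapping scheme, not NELL's actual implementation: the Extractor class, the combine function, and the run loop are illustrative names, and the placeholder extract method stands in for the real pattern learners and classifiers. The sketch only mirrors the structure described above: components propose candidate facts with confidences, a simple linear model weights each component by its past accuracy, and facts promoted to the knowledge base are fed back as training data for the next iteration.

from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object) triple

@dataclass
class Extractor:
    """One extraction component; the real system has several running in parallel."""
    name: str
    accuracy: float = 0.5                              # running estimate of past precision
    training: Set[Fact] = field(default_factory=set)   # instances it was (re-)trained on

    def retrain(self, promoted: Set[Fact]) -> None:
        # A real component would re-fit its internal model; here we only
        # grow the training set it conditions on.
        self.training |= promoted

    def extract(self, corpus: List[str]) -> Dict[Fact, float]:
        # Placeholder: a real component would run its patterns or classifiers
        # over the corpus and return candidate facts with confidence scores.
        return {}

def combine(proposals: List[Dict[Fact, float]],
            extractors: List[Extractor]) -> Dict[Fact, float]:
    # Simple linear model: each component's vote is weighted by its past accuracy.
    total = sum(e.accuracy for e in extractors) or 1.0
    scores: Dict[Fact, float] = {}
    for ext, candidates in zip(extractors, proposals):
        for fact, conf in candidates.items():
            scores[fact] = scores.get(fact, 0.0) + ext.accuracy * conf / total
    return scores

def run(corpus: List[str], extractors: List[Extractor],
        kb: Set[Fact], iterations: int, threshold: float = 0.8) -> Set[Fact]:
    for _ in range(iterations):                        # one pass = one "iteration"
        proposals = [e.extract(corpus) for e in extractors]
        scores = combine(proposals, extractors)
        promoted = {f for f, s in scores.items() if s >= threshold and f not in kb}
        kb |= promoted                                 # promote high-confidence facts
        for e in extractors:
            e.retrain(promoted)                        # promoted facts become training data
    return kb

Because promoted facts become training data for the next iteration, errors that slip past the combiner can compound over time, which is the long-term behavior discussed next.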
NELL's algorithms have been running since 2010, initially fully automated and without any human supervision. Because some of its relations and classes have experienced concept drift, that is, increasingly poor extraction performance over time, NELL is now given occasional corrections by humans to avoid this long-term behavior.
1 http://lemurproject.org/clueweb09/