Fig. 1. A small fragment of the web of data. DBpedia is a de-facto hub of the linked open data cloud.
2.3 NELL
The Never-Ending Language Learning (NELL) project [62,13] aims to create and maintain a large-scale machine learning system that learns to extract structured information from unstructured web pages. NELL distinguishes itself from YAGO and DBpedia in that its extraction algorithms operate on a large corpus of more than 500 million web pages 1 and not solely on the set of Wikipedia articles. The NELL system was bootstrapped with a small set of classes and relations and, for each of those, 10-15 positive and negative instances. The guiding principle of NELL is to build several semi-supervised machine learning [14] components that accumulate instances of the classes and relations, re-train the machine learning algorithms with these instances as training data, and re-apply them to extract novel instances. This process is repeated indefinitely; each re-training and extraction phase is called an iteration. Since numerous extraction components work in parallel and extract facts with different degrees of confidence in their correctness, one of the most important aspects of NELL is its ability to combine these different extraction algorithms into one coherent model. This is accomplished with relatively simple linear machine learning models that weight the different components based on their past accuracy.
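To make this loop concrete, the following is a minimal Python sketch of such a bootstrapping scheme, not NELL's actual implementation: the Extractor class, the combine function, and the run loop are illustrative names, and the placeholder extract method stands in for the real pattern learners and classifiers. The sketch only mirrors the structure described above: components propose candidate facts with confidences, a simple linear model weights each component by its past accuracy, and facts promoted to the knowledge base are fed back as training data for the next iteration.

from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object) triple

@dataclass
class Extractor:
    """One extraction component; the real system has several running in parallel."""
    name: str
    accuracy: float = 0.5                              # running estimate of past precision
    training: Set[Fact] = field(default_factory=set)   # instances it was (re-)trained on

    def retrain(self, promoted: Set[Fact]) -> None:
        # A real component would re-fit its internal model; here we only
        # grow the training set it conditions on.
        self.training |= promoted

    def extract(self, corpus: List[str]) -> Dict[Fact, float]:
        # Placeholder: a real component would run its patterns or classifiers
        # over the corpus and return candidate facts with confidence scores.
        return {}

def combine(proposals: List[Dict[Fact, float]],
            extractors: List[Extractor]) -> Dict[Fact, float]:
    # Simple linear model: each component's vote is weighted by its past accuracy.
    total = sum(e.accuracy for e in extractors) or 1.0
    scores: Dict[Fact, float] = {}
    for ext, candidates in zip(extractors, proposals):
        for fact, conf in candidates.items():
            scores[fact] = scores.get(fact, 0.0) + ext.accuracy * conf / total
    return scores

def run(corpus: List[str], extractors: List[Extractor],
        kb: Set[Fact], iterations: int, threshold: float = 0.8) -> Set[Fact]:
    for _ in range(iterations):                        # one pass = one "iteration"
        proposals = [e.extract(corpus) for e in extractors]
        scores = combine(proposals, extractors)
        promoted = {f for f, s in scores.items() if s >= threshold and f not in kb}
        kb |= promoted                                 # promote high-confidence facts
        for e in extractors:
            e.retrain(promoted)                        # promoted facts become training data
    return kb

Because promoted facts become training data for the next iteration, errors that slip past the combiner can compound over time, which is the long-term behavior discussed next.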
NELL's algorithms have been running since 2010, initially fully automated and without any human supervision. Because some of its relations and classes have experienced concept drift, that is, increasingly poor extraction performance over time, NELL is now given occasional corrections by humans to avoid this long-term behavior.
1 http://lemurproject.org/clueweb09/