integer linear programming. Finally, we discuss some recent advances in the
development of efficient algorithms for probabilistic inference.
2 Information Extraction - The State of the Art
There are numerous information extraction projects, each focusing on particular
subproblems of information extraction and knowledge base construction. We
selected several representative projects without making a claim of completeness.
Other IE projects that we are aware of but cannot cover here due to space
considerations are Freebase [10] and DeepDive [72].
The following descriptions of the information extraction projects demonstrate
that all of them use a combination of statistical and logical formalisms to
extract facts and to improve the quality of the derived knowledge. Hence,
information extraction projects are prime examples where statistical relational
learning and joint inference prove tremendously useful and are naturally
applicable. It is also interesting to observe that many of these projects have
strong commonalities despite their different objectives and premises. The main
motivation for presenting the various approaches to knowledge base extraction
is to demonstrate the importance of methods that combine probability and logic,
and to interest readers with a semantic web background in the data that these
projects continuously aggregate. There are numerous research directions for
young researchers to pursue.
2.1 YAGO
YAGO was introduced with the publication [87]. Each entity in YAGO corre-
sponds to an article in Wikipedia. Whenever Wikipedia's volunteer editors deem
an entity worthy of a Wikipedia article, YAGO will create the corresponding en-
tity in its knowledge base. The taxonomic backbone of YAGO is based on a
hierarchy of user-created Wikipedia categories. YAGO establishes links between
Wikipedia categories and synsets in WordNet [28].
YAGO has roughly 100 manually defined relations, such as locatedIn and
hasPopulation. YAGO extracts instances of these relations from Wikipedia in-
foboxes (meta-data boxes). These instances are commonly denoted as facts:
triples of an entity (the subject), a relation (the predicate), and another en-
tity (the object). YAGO utilizes a set of manually created patterns that map
categories and infobox attributes to fact templates. YAGO contains more than
80 million facts involving more than 9 million entities [36].
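As a minimal sketch of how such patterns might map infobox attributes to fact
triples, consider the following Python fragment. The attribute names, the
pattern table, the function extract_facts, and the example entity are all
hypothetical and chosen only for illustration; they are not taken from YAGO's
actual pattern set or implementation.

```python
from typing import Dict, List, NamedTuple


class Fact(NamedTuple):
    """A fact triple: subject entity, relation (predicate), object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical patterns: infobox attribute name -> relation name.
# Only the relation names (hasPopulation, locatedIn) appear in the text;
# the attribute names on the left are assumptions.
ATTRIBUTE_PATTERNS: Dict[str, str] = {
    "population_total": "hasPopulation",
    "subdivision_name": "locatedIn",
}


def extract_facts(entity: str,
                  infobox: Dict[str, str],
                  patterns: Dict[str, str] = ATTRIBUTE_PATTERNS) -> List[Fact]:
    """Instantiate one fact for every infobox attribute that has a pattern."""
    facts = []
    for attribute, value in infobox.items():
        relation = patterns.get(attribute)
        if relation is not None:  # attributes without a pattern are skipped
            facts.append(Fact(entity, relation, value))
    return facts


# Made-up infobox for a made-up entity.
example_infobox = {
    "population_total": "12345",
    "subdivision_name": "ExampleCountry",
    "area_code": "000",  # no pattern defined, so no fact is produced
}
example_facts = extract_facts("ExampleCity", example_infobox)
```

In this sketch, an attribute contributes a fact only if a manually defined
pattern exists for it, which mirrors the description above of a small, curated
set of relations rather than open-ended extraction.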
The YAGO knowledge base also utilizes a set of deterministic and probabilistic
rules. These declarative rules are used to ensure that facts do not contradict each
other in certain ways. For instance, some of these declarative rules specify the
domains and ranges of the relations and the definition of the classes of the YAGO
concept hierarchy. The rules can, for instance, be used to enforce that instances
 