This skepticism manifests itself in a recent surge of information extraction
projects such as the open information extraction [23] (OIE) and the never-ending
language learning [13] (NELL) projects. Indeed, the OIE project explicitly defines
itself as open, meaning that it does not leverage ontologies or relational schemas.
The major argument supporting this position is that a relational schema or
ontology unnecessarily constrains what can be extracted from large web corpora.
The NELL project leverages a type system and a fixed set of relations, even
though recent work has moved towards (semi-)automatically extending the set
of relations. However, insight and expertise accumulated in the semantic web
community over the last 10 years are largely ignored. For instance, the NELL project
does not employ canonical labels for its entities ('Argentina' refers to both the
national soccer team and the country itself) and makes no use of existing knowledge
representation formalisms even though it actually uses notions such as
range and domain restrictions implicitly. While this could be explained by the
specific applications the creators have in mind (improved keyword search and
natural language question answering, for instance), there are some reasonable
arguments in favor of not completely ignoring the existing body of work and
experience of the semantic web community. Other information extraction projects
such as DBpedia [4,59] and YAGO [87,36] are more in line with semantic web
technologies as they use unique canonical identifiers for entities (derived from
the URIs of the corresponding Wikipedia articles) and notions such as range
and domain restrictions that closely resemble the RDF standard. The advantage
of using these standardized RDF formalisms is that they enable the creation
of links across heterogeneous data sets and a unifying syntactic and semantic
framework for knowledge bases. DBpedia, for instance, has established itself as a
linking hub for the linked open data cloud. The existence of a relational schema
or ontology also facilitates relational query processing and the use of statistical
relational approaches such as Markov logic [80].
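
To make the role of canonical identifiers and of range and domain restrictions concrete, the following is a minimal sketch, not the mechanism used by any of the projects above. All identifiers (the `ex:` prefix, the entity and relation names, the helper `is_well_typed`) are hypothetical and chosen for illustration only. The sketch also treats domain and range as hard constraints for filtering extracted facts, which is a simplification: in RDF Schema they are used to infer types rather than to reject triples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation with a declared domain (subject type) and range (object type)."""
    name: str
    domain_type: str
    range_type: str


# Hypothetical toy data. Each entity has one canonical identifier, so the
# ambiguous surface label 'Argentina' maps to two distinct entities.
ENTITY_TYPES = {
    "ex:Argentina": "Country",
    "ex:Argentina_national_football_team": "SportsTeam",
    "ex:Some_Player": "Person",
}

RELATIONS = {
    "playsFor": Relation("playsFor", domain_type="Person", range_type="SportsTeam"),
    "bornIn": Relation("bornIn", domain_type="Person", range_type="Country"),
}


def is_well_typed(subject: str, predicate: str, obj: str) -> bool:
    """Return True if the triple respects the relation's domain and range."""
    relation = RELATIONS.get(predicate)
    if relation is None:
        # Unknown relation: a fixed schema cannot validate it.
        return False
    return (
        ENTITY_TYPES.get(subject) == relation.domain_type
        and ENTITY_TYPES.get(obj) == relation.range_type
    )


# Candidate extractions, already resolved to canonical identifiers.
triples = [
    ("ex:Some_Player", "playsFor", "ex:Argentina_national_football_team"),
    ("ex:Some_Player", "playsFor", "ex:Argentina"),  # object is a Country
    ("ex:Some_Player", "bornIn", "ex:Argentina"),
]

accepted = [t for t in triples if is_well_typed(*t)]
rejected = [t for t in triples if not is_well_typed(*t)]
```

In this toy setting the second triple is rejected because its object is typed as a country rather than a sports team. Without canonical identifiers, the bare label 'Argentina' would not allow that distinction to be drawn.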
The present lecture notes provide a brief overview of existing information
extraction projects, ranging from those with a predetermined ontology (that is,
a relational schema), high-precision extractions, and limited coverage to those
without any kind of schema, low-precision extractions, and broader coverage.
We do not take sides and instead focus on possible synergies that arise when we
consider each of the projects as disparate and heterogeneous knowledge bases
whose integration would not only broaden the amount of extracted knowledge
but also increase the extraction quality and provide relational schemas for facts
that were previously schema-less. We provide an overview of the problem areas of
ontology matching and object reconciliation from a semantic web perspective.
We then show how both the relational schema and the data can be jointly
modeled with statistical relational formalisms.
Ontology matching, or ontology alignment, is the problem of determining
correspondences between concepts, properties, and individuals of two or more
different formal ontologies [26]. The alignment of ontologies allows semantic
applications to exchange and enrich the data expressed in the respective
ontologies. An important result of the yearly ontology alignment evaluation initiative
 