Statistical Relational Data Integration
for Information Extraction
Mathias Niepert
Department of Computer Science & Engineering
University of Washington, Seattle, WA, USA
mniepert@cs.washington.edu
Abstract. These lecture notes provide a brief overview of some
state-of-the-art, large-scale information extraction projects. These
projects are then related to current research activities in the semantic web
community. The majority of the learning algorithms developed for these
information extraction projects are based on the lexical and syntactical
processing of Wikipedia and large web corpora. Due to the size of the
processed data and the resulting intractability of the associated inference
problems, existing knowledge representation formalisms are often inadequate
for the task. We will present recent advances in combining tractable
logical and probabilistic models that bring statistical language processing
and rule-based approaches closer together. With these lecture notes
we hope to convince the attendees that there are numerous synergies
and research agendas that can arise when uncertainty-based data-driven
research meets rule-based schema-driven research. We also describe cer-
tain theoretical and practical advances in making probabilistic inference
scale to very large problems.
1 Introduction
Historically, semantic web research has focused on problems concerned with the
logical form of the schema, that is, the meta-level descriptions of classes and
roles that comprise the structure of the knowledge base. It comes as no surprise,
therefore, that the highly popular research areas of ontology learning, ontology
matching, and knowledge engineering have mostly concentrated on the termino-
logical structure, that is, the set of axioms involving class and role descriptions.
While meaningful progress has been made and the logical, computational, and
empirical understanding of these problems is deeper than ever before, this has
come at the cost of largely ignoring the data, that is, assertions of the
aforementioned classes and roles. Instead of building knowledge representations around
existing data, more often than not, ontologies were designed and constructed in
a data vacuum. It is probably not far-fetched to assume that this is the main
reason for the skepticism (if not outright rejection) other research communities
have demonstrated towards the semantic web endeavor.
These lecture notes are based on several previous publications of the author and his
colleagues in conference proceedings such as AAAI, UAI, IJCAI, and ESWC.
 