Statistical Relational Data Integration for Information Extraction - Reasoning Web

Statistical Relational Data Integration for Information Extraction

Mathias Niepert

Department of Computer Science & Engineering

University of Washington, Seattle, WA, USA

mniepert@cs.washington.edu

Abstract. These lecture notes provide a brief overview of some
state-of-the-art, large-scale information extraction projects. These
projects are related to current research activities in the semantic web
community. The majority of the learning algorithms developed for these
information extraction projects are based on the lexical and syntactic
processing of Wikipedia and large web corpora. Due to the size of the
processed data and the resulting intractability of the associated inference
problems, existing knowledge representation formalisms are often inadequate
for the task. We will present recent advances in combining tractable
logical and probabilistic models that bring statistical language processing
and rule-based approaches closer together. With these lecture notes
we hope to convince the attendees that there are numerous synergies
and research agendas that can arise when uncertainty-based, data-driven
research meets rule-based, schema-driven research. We also describe certain
theoretical and practical advances in making probabilistic inference
scale to very large problems.

1 Introduction

Historically, semantic web research has focused on problems concerned with the
logical form of the schema, that is, the meta-level descriptions of classes and
roles that comprise the structure of the knowledge base. It comes as no surprise,
therefore, that the highly popular research areas of ontology learning, ontology
matching, and knowledge engineering have mostly concentrated on the terminological
structure, that is, the set of axioms involving class and role descriptions.
While meaningful progress has been made and the logical, computational, and
empirical understanding of these problems is deeper than ever before, this has
come at the cost of largely ignoring the data, that is, assertions of the
aforementioned classes and roles. Instead of building knowledge representations
around existing data, more often than not, ontologies were designed and
constructed in a data vacuum. It is probably not far-fetched to assume that this
is the main reason for the skepticism (if not outright rejection) other research
communities have demonstrated towards the semantic web endeavor.

These lecture notes are based on several previous publications of the author and his
colleagues in conference proceedings such as AAAI, UAI, IJCAI, and ESWC.
