Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

CHAPTER

Data and Query Model

Traditionally, database management systems are designed to deal with information that is completely

known. In reality, however, information is often incomplete or uncertain. An incomplete database is a

database that allows its instance to be in one of multiple states (worlds); a probabilistic database is an

incomplete database that, furthermore, assigns a probability distribution to the possible worlds. This

chapter introduces incomplete and probabilistic databases, and discusses some popular representation

methods. We start with a brief review of the relational data model and queries.

2.1 BACKGROUND OF THE RELATIONAL DATA MODEL

In this topic, we consider only relational data. A relational schema

R = R 1 ,...,R k

consists of k

relation names, and each relation R j has an associated arity r j

A relational database instance , also called world W , consists of k relations W

≥

= R 1 ,...,R k

where R j

is a finite relation of arity r j over some fixed, infinite universe U , R j

U r j . We often

blur the distinction between the relation name R j and the relation instance R j , and write R j for both.

When given n possible worlds W 1 ,...,W n , we abbreviate R j for R W j . We write interchangeably

either W or D for a database instance; in the latter case, the relation instances are denoted by

R 1 ,...,R k .

The queries that we will study are expressed in the Relational Calculus , or RC. Equivalently,

these are First Order Logic expressions over the vocabulary

⊆

R . A query has the form

{ x | Q }

, also

written as Q(

x are the free variables, also called head variables , in the relational formula

Q . The formula Q is given by the following grammar:

x) , where

::=

u = v | R(x) |∃ x.Q 1 | Q 1 ∧ Q 2 | Q 1 ∨ Q 2 |¬ Q

(2.1)

Here u = v is an equality predicate (where u, v are variables or constants), R(x) is a relational

atom with variables and/or constants, whose relation symbol R is from the vocabulary

R . The

connectives

have standard interpretation. We will often blur

the distinction between the formula Q and the query

∧

∨

and the existential quantifier

∃

, and write simply Q( x) for the query.

A query Q without free variables is called a Boolean query . Given a database instance D ,we

write D |= Q whenever Q is true in D ; we omit the standard definition of D |= Q , which can be

found in any textbook on logic, model theory, or database theory, e.g., [ Abiteboul et al. , 1995 ]. If a

query Q has head variables

{ x | Q }

x , then it defines a function from database instances to relations of arity

| x |

: Q(D) ={ a | D |=

Q [ a/x ]}

. Here Q [ a/z ]

means the query expression Q where all variables

are substituted with the constants

a .

Probabilistic Databases

Search WWH ::

Custom Search

Home