Database Reference
In-Depth Information
CHAPTER
2
Data and Query Model
Traditionally, database management systems are designed to deal with information that is completely
known. In reality, however, information is often incomplete or uncertain. An incomplete database is a
database that allows its instance to be in one of multiple states (worlds); a probabilistic database is an
incomplete database that, furthermore, assigns a probability distribution to the possible worlds. This
chapter introduces incomplete and probabilistic databases, and discusses some popular representation
methods. We start with a brief review of the relational data model and queries.
2.1 BACKGROUND OF THE RELATIONAL DATA MODEL
In this topic, we consider only relational data. A relational schema
R = R 1 ,...,R k
consists of k
relation names, and each relation R j has an associated arity r j
0.
A relational database instance , also called world W , consists of k relations W
= R 1 ,...,R k
,
where R j
is a finite relation of arity r j over some fixed, infinite universe U , R j
U r j . We often
blur the distinction between the relation name R j and the relation instance R j , and write R j for both.
When given n possible worlds W 1 ,...,W n , we abbreviate R j for R W j . We write interchangeably
either W or D for a database instance; in the latter case, the relation instances are denoted by
R 1 ,...,R k .
The queries that we will study are expressed in the Relational Calculus , or RC. Equivalently,
these are First Order Logic expressions over the vocabulary
R . A query has the form
{ x | Q }
, also
written as Q(
x are the free variables, also called head variables , in the relational formula
Q . The formula Q is given by the following grammar:
x) , where
¯
¯
Q
::=
u = v | R(x) |∃ x.Q 1 | Q 1 Q 2 | Q 1 Q 2 Q
(2.1)
Here u = v is an equality predicate (where u, v are variables or constants), R(x) is a relational
atom with variables and/or constants, whose relation symbol R is from the vocabulary
R . The
connectives
have standard interpretation. We will often blur
the distinction between the formula Q and the query
,
,
¬
and the existential quantifier
, and write simply Q( x) for the query.
A query Q without free variables is called a Boolean query . Given a database instance D ,we
write D |= Q whenever Q is true in D ; we omit the standard definition of D |= Q , which can be
found in any textbook on logic, model theory, or database theory, e.g., [ Abiteboul et al. , 1995 ]. If a
query Q has head variables
{ x | Q }
x , then it defines a function from database instances to relations of arity
| x |
: Q(D) ={ a | D |=
Q [ a/x ]}
. Here Q [ a/z ]
means the query expression Q where all variables
z
are substituted with the constants
a .
Search WWH ::




Custom Search