XML-to-XML queries - Foundations of Data Exchange


language of ground conjunctive queries: that is, conjunctions of atoms R(ā), where R is a relation and ā is a tuple of constants. Then the intersection of all instances in Q(𝒟) represents the maximum knowledge we can extract from Q(𝒟) with respect to this language. This is the familiar notion of certain answers.

However, if we use the more expressive language of conjunctive queries, we can extract additional knowledge from Q(𝒟). For instance, suppose Q(𝒟) consists of two instances:

{R(a, a), R(b, c)} and {R(a, a), R(d, c)}.

Then the conjunctive query R(a, a) ∧ ∃x R(x, c) tells us more about Q(𝒟) than the intersection of the instances above, which is just R(a, a). This certain knowledge about Q(𝒟) would traditionally be presented as a naïve table: that is, a table with variables (nulls). The naïve table corresponding to R(a, a) ∧ ∃x R(x, c) is simply {R(a, a), R(x, c)}. Notice that the class of databases that satisfy R(a, a) ∧ ∃x R(x, c) is not equal to Q(𝒟): every valuation of x in {R(a, a), R(x, c)} gives such a database. Nevertheless, conjunctive queries are not able to extract any more knowledge about Q(𝒟), and thus cannot narrow down the set of described databases any further. The formula R(a, a) ∧ ∃x R(x, c) is as close to a definition of Q(𝒟) as possible within the language of conjunctive queries. Let us formalize these intuitions.
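The example above can be checked mechanically. The following is a minimal sketch, assuming facts of R are modelled as pairs of strings and the null x is modelled as the string "x"; the helper names ground_certain, satisfies_cq and valuate are illustrative and do not come from the text.

```python
# Facts of the binary relation R are modelled as pairs of constants (assumption).
inst1 = {("a", "a"), ("b", "c")}
inst2 = {("a", "a"), ("d", "c")}
answers = [inst1, inst2]

def ground_certain(instances):
    """Facts present in every instance, i.e. the intersection of the instances."""
    return set.intersection(*instances)

def satisfies_cq(instance):
    """Check the conjunctive query R(a, a) ∧ ∃x R(x, c) on a single instance."""
    has_aa = ("a", "a") in instance
    has_some_c = any(second == "c" for _, second in instance)
    return has_aa and has_some_c

def valuate(naive_table, valuation):
    """Replace each null by the constant the valuation assigns to it."""
    return {tuple(valuation.get(term, term) for term in fact) for fact in naive_table}

# The naive table {R(a, a), R(x, c)}; "x" stands for the null (assumption).
naive_table = {("a", "a"), ("x", "c")}
other = valuate(naive_table, {"x": "e"})

print(ground_certain(answers))                 # only the fact R(a, a) survives
print(all(satisfies_cq(i) for i in answers))   # the query holds in both instances
print(satisfies_cq(other), other in answers)   # a model of the query outside the two instances
```

The last line illustrates the remark that the databases satisfying the formula form a larger class than the two given instances: valuating x as a fresh constant e yields a database that satisfies the formula but is not one of them.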

We define the notion of a max-description of a set 𝒟 of databases that, in a given language, expresses the information we can infer with certainty from 𝒟. Certain answers to queries are then a special case of max-descriptions, applied to sets {Q(D)} as D ranges over a collection of databases.

To explain this notion, assume that ℒ is a logical formalism in which we express properties of databases from 𝒟 (e.g., conjunctive queries, or ground conjunctive queries such as R(a, a)). A set Φ of formulae of ℒ defines a set of databases, called models of Φ and denoted Mod(Φ), consisting of all databases satisfying each formula in Φ:

Mod(Φ) = { D | D ⊨ ϕ for every ϕ ∈ Φ }.

To describe 𝒟 fully in ℒ we would need its ℒ-definition: a finite set Φ of ℒ-formulae such that 𝒟 = Mod(Φ). This is not always achievable, so instead we settle for the next best thing, which is an ℒ-definition of the set of models of certain knowledge about 𝒟 expressed in ℒ. This certain ℒ-knowledge of the class 𝒟, called the ℒ-theory of 𝒟, is the set of all formulae from ℒ satisfied in all databases from 𝒟:

Th_ℒ(𝒟) = { ϕ ∈ ℒ | D ⊨ ϕ for every D ∈ 𝒟 }.

The most precise (finite) description of 𝒟 we can express in ℒ is a finite definition of Mod(Th_ℒ(𝒟)), i.e., a finite set Φ of formulae of ℒ such that

Mod(Φ) = Mod(Th_ℒ(𝒟)).
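To make the two operators concrete, here is a small sketch that computes a theory and its models in a deliberately finite toy setting. It assumes that the language is a finite dictionary of named predicates and that models are sought only within a finite list of candidate databases; both restrictions, and the names theory, models, language and universe, are assumptions made for illustration and are not part of the general definition.

```python
def theory(language, databases):
    """Formulas of the (finite) language that hold in every given database."""
    return {name: phi for name, phi in language.items()
            if all(phi(db) for db in databases)}

def models(formulas, universe):
    """Databases of the (finite) universe that satisfy every given formula."""
    return [db for db in universe if all(phi(db) for phi in formulas.values())]

# A toy language: each formula is a named predicate on a set of R-facts.
language = {
    "R(a,a)": lambda db: ("a", "a") in db,
    "R(b,c)": lambda db: ("b", "c") in db,
    "R(d,c)": lambda db: ("d", "c") in db,
    "exists x R(x,c)": lambda db: any(second == "c" for _, second in db),
}

# The set of databases to be described, taken from the earlier example.
described = [
    frozenset({("a", "a"), ("b", "c")}),
    frozenset({("a", "a"), ("d", "c")}),
]

# Candidate databases among which models are sought (assumption: finite).
universe = described + [
    frozenset({("a", "a"), ("e", "c")}),
    frozenset({("a", "a")}),
]

th = theory(language, described)
print(sorted(th))            # names of the formulas true in both databases
print(models(th, universe))  # every candidate satisfying all of those formulas
```

In this toy setting the theory keeps R(a,a) and the existential formula, and its models include the database containing R(e, c) alongside the two described ones, so the set of models is strictly larger than the set being described.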

Let us now apply this general definition to sets of XML trees. Since our setting always includes a schema, we will be making the following assumptions about such sets.
