SELECT name, phone, address
FROM People
The answer generated by our system with respect to $\mathbf{M}$ and $\mathbf{pM}$ is shown in Fig. 4.5c.
(As we describe in detail in the following sections, we allow users to compose
queries using any attribute in the source.) Compared with using one of $M_2$ to $M_5$
as a mediated schema, our method generates better query results in that (1) it treats
answers with home address and home phone and answers with office address and
office phone equally, and (2) it favors answers with the correct correlation between
address and phone number.
4.2 Probabilistic Mediated Schema
Consider a set of source schemas $\{S_1, \ldots, S_n\}$. We denote the attributes in schema
$S_i$, $i \in [1,n]$, by $\mathit{attr}(S_i)$, and the set of all source attributes as $\mathcal{A}$, that is,
$\mathcal{A} = \mathit{attr}(S_1) \cup \cdots \cup \mathit{attr}(S_n)$. We denote a mediated schema for the set of sources
$\{S_1, \ldots, S_n\}$ by $M = \{A_1, \ldots, A_m\}$, where each of the $A_i$'s is called a mediated
attribute. The mediated attributes are sets of attributes from the sources, i.e., $A_i \subseteq \mathcal{A}$.
Note that whereas in a traditional mediated schema an attribute has a name, we
do not deal with naming of an attribute in our mediated schema and allow users
to use any source attribute in their queries. (In practice, we can use the most
frequent source attribute to represent a mediated attribute when exposing the mediated
schema to users.) If a query contains an attribute $a \in A_i$, $i \in [1,m]$, then when
answering the query, we replace $a$ everywhere with $A_i$.
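The attribute replacement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system's implementation: it assumes a mediated schema is stored as a collection of frozensets of attribute names, and the function names, the attribute names `home_phone` and `office_phone`, and the alphabetical tie-breaking rule are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

MediatedAttribute = FrozenSet[str]  # a cluster of source attribute names


def mediated_attribute_of(a: str, schema: Iterable[MediatedAttribute]) -> MediatedAttribute:
    """Return the mediated attribute A_i that contains source attribute a."""
    for cluster in schema:
        if a in cluster:
            return cluster
    raise KeyError(f"attribute {a!r} is not covered by the mediated schema")


def rewrite_query_attributes(
    attrs: Iterable[str], schema: List[MediatedAttribute]
) -> List[MediatedAttribute]:
    """Replace every source attribute in a query by its mediated attribute."""
    return [mediated_attribute_of(a, schema) for a in attrs]


def representative(cluster: MediatedAttribute, frequency: Dict[str, int]) -> str:
    """Pick the most frequent source attribute as the display name.

    Ties are broken alphabetically; that rule is an assumption of this sketch.
    """
    return max(sorted(cluster), key=lambda a: frequency.get(a, 0))


# Hypothetical mediated schema for the People example.
EXAMPLE_SCHEMA: List[MediatedAttribute] = [
    frozenset({"name"}),
    frozenset({"phone", "home_phone"}),
    frozenset({"office_phone"}),
]

# The attributes of "SELECT name, phone FROM People" after replacement.
REWRITTEN = rewrite_query_attributes(["name", "phone"], EXAMPLE_SCHEMA)
```

Because mediated attributes within one schema do not overlap, the lookup returns at most one cluster per source attribute, so the replacement is unambiguous.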
A probabilistic mediated schema consists of a set of mediated schemas, each
with a probability indicating the likelihood that the schema correctly describes
the domain of the sources. Within each mediated schema, the mediated attributes are
pairwise disjoint: for each $i, j \in [1,m]$, $i \neq j \Rightarrow A_i \cap A_j = \emptyset$. We formally
define probabilistic mediated schemas as follows.
Definition 11 (Probabilistic Mediated Schema). Let $\{S_1, \ldots, S_n\}$ be a set of
schemas. A probabilistic mediated schema (p-med-schema) for $\{S_1, \ldots, S_n\}$ is a
set

$$\mathbf{M} = \{(M_1, Pr(M_1)), \ldots, (M_l, Pr(M_l))\},$$

where
• For each $i \in [1,l]$, $M_i$ is a mediated schema for $S_1, \ldots, S_n$, and for each
$i, j \in [1,l]$, $i \neq j$, $M_i$ and $M_j$ correspond to different clusterings of the source
attributes;
• $Pr(M_i) \in (0,1]$, and $\sum_{i=1}^{l} Pr(M_i) = 1$.
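To make the conditions of the definition concrete, the following Python sketch checks whether a candidate structure is a valid p-med-schema. It is a minimal sketch under stated assumptions: a mediated schema is modeled as a frozenset of frozensets of attribute names, a p-med-schema as a list of (schema, probability) pairs, and the function names, the floating-point tolerance, and the rejection of empty clusters are choices of this sketch rather than part of the definition.

```python
import math
from typing import AbstractSet, FrozenSet, List, Tuple

MediatedSchema = FrozenSet[FrozenSet[str]]       # one clustering of source attributes
PMedSchema = List[Tuple[MediatedSchema, float]]  # pairs (M_i, Pr(M_i))


def is_mediated_schema(m: MediatedSchema, source_attrs: AbstractSet[str]) -> bool:
    """Each A_i must be a subset of the source attributes, and the A_i's
    must be pairwise disjoint."""
    seen = set()
    for cluster in m:
        if not cluster:                  # assumption: empty clusters are rejected
            return False
        if not cluster <= source_attrs:  # A_i must contain only source attributes
            return False
        if seen & cluster:               # overlaps a previously visited cluster
            return False
        seen |= cluster
    return True


def is_p_med_schema(
    pm: PMedSchema, source_attrs: AbstractSet[str], tol: float = 1e-9
) -> bool:
    """Check the two conditions of the definition."""
    schemas = [m for m, _ in pm]
    # M_i and M_j must be different clusterings whenever i != j.
    if len(set(schemas)) != len(schemas):
        return False
    if not all(is_mediated_schema(m, source_attrs) for m in schemas):
        return False
    # Every probability lies in (0, 1], and the probabilities sum to 1.
    if not all(0.0 < p <= 1.0 for _, p in pm):
        return False
    return math.isclose(sum(p for _, p in pm), 1.0, abs_tol=tol)


# Hypothetical source attributes and two alternative clusterings.
SOURCE_ATTRS = frozenset({"name", "phone", "home_phone", "office_phone"})
M_A: MediatedSchema = frozenset(
    {frozenset({"name"}), frozenset({"phone", "home_phone"}), frozenset({"office_phone"})}
)
M_B: MediatedSchema = frozenset(
    {frozenset({"name"}), frozenset({"phone", "office_phone"}), frozenset({"home_phone"})}
)
EXAMPLE_PM: PMedSchema = [(M_A, 0.5), (M_B, 0.5)]
```

Storing each clustering as a frozenset of frozensets makes schemas hashable, so the requirement that distinct entries correspond to different clusterings reduces to a duplicate check.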
Semantics of queries: Next, we define the semantics of query answering with respect
to a p-med-schema and a set of p-mappings for each mediated schema in the
p-med-schema. Answering queries with respect to p-mappings returns a set of