Fig. 4.1 Architecture of a data-integration system that handles uncertainty
[Figure not reproduced. Labels: a query Q enters Keyword Reformulation, which produces Q1,...,Qm over the Mediated Schema; Query Reformulation produces Q11,...,Q1n,...,Qk1,...,Qkn; the Query Processor sends Q11,...,Q1n through Qk1,...,Qkn to the data sources D1,...,Dk.]
query on the mediated schema or a keyword query, the system returns a set of answer
tuples, each with a probability. If Q is a keyword query, the system first performs
keyword reformulation to translate it into a set of candidate structured queries on
the mediated schema. Otherwise, the candidate query is Q itself.
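The branching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the type names, the `toy_reformulator` function, and its probabilities are hypothetical, and assigning probability 1.0 to a query that is already structured is our assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class StructuredQuery:
    """A query posed directly against the mediated schema (hypothetical type)."""
    text: str


@dataclass(frozen=True)
class KeywordQuery:
    """A bag of keywords with no schema information (hypothetical type)."""
    keywords: Tuple[str, ...]


Candidate = Tuple[StructuredQuery, float]
Reformulator = Callable[[KeywordQuery], List[Candidate]]


def candidate_queries(
    q: Union[StructuredQuery, KeywordQuery],
    reformulate: Reformulator,
) -> List[Candidate]:
    """Return candidate structured queries, each paired with a probability."""
    if isinstance(q, KeywordQuery):
        # Keyword reformulation: translate into candidate structured queries.
        return reformulate(q)
    # Otherwise the only candidate is Q itself (probability 1.0 is an assumption).
    return [(q, 1.0)]


def toy_reformulator(q: KeywordQuery) -> List[Candidate]:
    """Hypothetical reformulator with made-up candidates and probabilities."""
    joined = " ".join(q.keywords)
    return [
        (StructuredQuery(f"SELECT * FROM person WHERE name = '{joined}'"), 0.7),
        (StructuredQuery(f"SELECT * FROM person WHERE address = '{joined}'"), 0.3),
    ]
```

A structured query passes through unchanged, while a keyword query fans out into several weighted candidates that the later stages of the architecture would each reformulate and execute.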
2.3 Source of Probabilities
A critical issue in any system that manages uncertainty is whether we have a reliable
source of probabilities. Whereas obtaining reliable probabilities for such a system is
one of the most interesting areas for future research, there is quite a bit to build
on. For keyword reformulation, it is possible to train and test reformulators on
large numbers of queries such that each reformulation result is given a probability
based on its performance statistics. For information extraction, current techniques
are often based on statistical machine learning methods and can be extended to
compute probabilities of each extraction result. Finally, in the case of schema matching,
it is standard practice for schema matchers to also associate numbers with the
candidates they propose (e.g., Berlin and Motro 2002; Dhamankar et al. 2004;
Do and Rahm 2002; Doan et al. 2002; He and Chang 2003; Kang and Naughton 2003;
Rahm and Bernstein 2001; Wang et al. 2004). The issue here is that the numbers are meant
only as a ranking mechanism rather than true probabilities. However, as schema
matching techniques start looking at a larger number of schemas, one can imagine
ascribing probabilities (or estimations thereof) to their measures.
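To make the last point concrete, here is one naive way to turn matcher scores into numbers that behave like probabilities: divide each non-negative score by the total. This is a sketch under our own assumptions, not a method the text prescribes. The function name and the candidate labels are hypothetical, and because matcher scores are intended only for ranking, the result is at best a rough estimate rather than a calibrated probability.

```python
from typing import Dict


def scores_to_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize non-negative matcher scores so that they sum to 1.

    Proportional normalization is an illustrative assumption; it preserves
    the matcher's ranking but does not calibrate the values.
    """
    if not scores:
        raise ValueError("at least one candidate is required")
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")

    total = sum(scores.values())
    if total == 0:
        # No evidence for any candidate: fall back to a uniform distribution.
        uniform = 1.0 / len(scores)
        return {candidate: uniform for candidate in scores}
    return {candidate: s / total for candidate, s in scores.items()}


# Hypothetical candidate matches for one source attribute, with made-up scores.
example = scores_to_probabilities(
    {"phone->home_phone": 0.6, "phone->office_phone": 0.2, "phone->mobile": 0.2}
)
```

The ordering of candidates is unchanged by the normalization, so the output remains usable as a ranking while also giving downstream components a distribution to reason over.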