[Figure components: input query Q; Keyword Reformulation; Mediated Schema; candidate queries Q1,...,Qm; Query Reformulation; reformulated queries Q11,...,Q1n,...,Qk1,...,Qkn; Query Processor; data sources D1,...,Dk]
Fig. 4.1 Architecture of a data-integration system that handles uncertainty
query on the mediated schema or a keyword query, the system returns a set of answer
tuples, each with a probability. If Q is a keyword query, the system first performs
keyword reformulation to translate it into a set of candidate structured queries on
the mediated schema. Otherwise, the candidate query is Q itself.
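The dispatch step described above can be sketched in Python. This is only an illustrative sketch, not the system's actual implementation: the names `CandidateQuery`, `candidate_queries`, and `toy_reformulator` are hypothetical, the toy reformulator's queries and probabilities are made up, and assigning probability 1.0 to a structured query is an assumption (the text only says the candidate query is Q itself).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CandidateQuery:
    """A structured query on the mediated schema with its probability."""
    text: str
    probability: float


def candidate_queries(
    q: str,
    is_keyword: bool,
    reformulate: Callable[[str], List[CandidateQuery]],
) -> List[CandidateQuery]:
    """Return the candidate structured queries for an input query Q."""
    if is_keyword:
        # Keyword query: translate it into a set of candidate structured queries.
        return reformulate(q)
    # Structured query: the only candidate is Q itself.
    # Assumption: a single certain candidate gets probability 1.0.
    return [CandidateQuery(q, 1.0)]


def toy_reformulator(keywords: str) -> List[CandidateQuery]:
    """Hypothetical reformulator with made-up candidates and probabilities."""
    return [
        CandidateQuery(f"SELECT * FROM person WHERE name = '{keywords}'", 0.7),
        CandidateQuery(f"SELECT * FROM person WHERE city = '{keywords}'", 0.3),
    ]
```

For example, `candidate_queries("Alice", True, toy_reformulator)` yields the two hypothetical candidates, while a structured query passed with `is_keyword=False` comes back unchanged as the single candidate.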
2.3 Source of Probabilities
A critical issue in any system that manages uncertainty is whether we have a reliable
source of probabilities. Whereas obtaining reliable probabilities for such a system is
one of the most interesting areas for future research, there is quite a bit to build
on. For keyword reformulation, it is possible to train and test reformulators on
large numbers of queries such that each reformulation result is given a probability
based on its performance statistics. For information extraction, current techniques
are often based on statistical machine learning methods and can be extended to compute
probabilities of each extraction result. Finally, in the case of schema matching,
it is standard practice for schema matchers to also associate numbers with the candidates
they propose (e.g., Berlin and Motro 2002; Dhamankar et al. 2004; Do and
Rahm 2002; Doan et al. 2002; He and Chang 2003; Kang and Naughton 2003; Rahm
and Bernstein 2001; Wang et al. 2004). The issue here is that the numbers are meant
only as a ranking mechanism rather than true probabilities. However, as schema
matching techniques start looking at a larger number of schemas, one can imagine
ascribing probabilities (or estimations thereof) to their measures.
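As a rough illustration of two of the ideas above, the following Python sketch estimates a reformulation probability from performance statistics and rescales matcher scores so they sum to one. Both functions are hypothetical helpers, not methods taken from the cited work. In particular, dividing scores by their total is only one simple assumption about how ranking scores might be turned into probability estimates, and it does not by itself make them calibrated.

```python
from typing import Dict


def empirical_probability(correct: int, total: int) -> float:
    """Estimate a probability as the fraction of test queries handled correctly.

    Assumption: 'performance statistics' are simple counts of correct results.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


def normalize_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative matcher scores so that they sum to 1.

    Assumption: the candidates are mutually exclusive and exhaustive, so the
    rescaled scores can be read as rough probability estimates.
    """
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {candidate: s / total for candidate, s in scores.items()}
```

For instance, a reformulator that was correct on 30 of 40 test queries would receive an estimate of 0.75, and two candidate matches scored 0.6 and 0.2 would be rescaled to roughly 0.75 and 0.25.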
 