Database Reference
In-Depth Information
Computing the exact answers to top- k typicality queries is quadratic in nature.
We develop three efficient approximation algorithms, a randomized tournament
method, a local typicality approximation algorithm that approximates the typi-
cality score of an instance o using a small subset of instances close to o , and a
hybrid method that combines the tournament mechanism and the local typicality
approximation.
We develop probabilistic ranking queries and reverse probabilistic ranking
queries on uncertain data by considering the two dimensions: the ranks of un-
certain instances or objects and their probabilities of achieving certain ranks.
-
Given a rank parameter k , we can rank instances or objects according to their
top- k probabilities (that is, the probabilities of being ranked top- k ). A prob-
abilistic ranking query finds the uncertain instances or objects whose top- k
probabilities are the largest.
-
Given a probability threshold p , the p -rank of an instance o is the minimum k
such that Pr k
(
o
)
p . Similarly, the p -rank of an object O is the minimum k
such that Pr k
p .A reverse probabilistic ranking query finds the uncer-
tain instances or objects whose p -ranks are the smallest.
(
O
)
Probabilistic ranking queries and reverse probabilistic ranking queries provide
complement views of the ranking characterizes of uncertain data. Moreover, we
develop three efficient query evaluation methods, an exact algorithm, a sampling
method, and a Poisson approximation based method. Last, we devise an effec-
tive and compact index for probabilistic ranking queries and reverse probabilistic
ranking queries, which helps efficient online query answering.
We study the problem of continuous probabilistic top- k query (continuous PT- k
query for short) on uncertain streams, which reports, for each time instant t , the
set of uncertain streams satisfying the query in the sliding window. This problem
extends the ranking queries on static data to dynamic data. To tackle this prob-
lem, we develop an exact algorithm that involves several important stream spe-
cific techniques and a sampling method. Moreover, we devise the space efficient
versions of the exact algorithm and the sampling method that utilize approximate
quantiles to maintain the summary of the data streams.
We study ranking queries on the probabilistic linkage model, which is an exten-
sion of the basic uncertain object model by considering the dependencies among
objects. Given two sets of tuples A and B , a linkage matches a tuple in A to a tuple
in B , meaning that the two tuples refer to the same real world entity. A probability
is associated with each linkage, representing the confidence of the match. Each
tuple can be associated with multiple linkages, but only one linkage can be true
due to the constraint that one tuple can only match at most one tuple in another
data set in the ground truth.
We can consider each tuple t A
A as an uncertain object. A tuple t B
B can be
considered as an instance of t A if there is a probabilistic linkage l
=(
t A ,
t B ) ∈L
such that Pr
(
l
) >
0. The membership probability of instance t B with respect to
object t A is Pr
. Different from the basic uncertain object model where each
instance only belongs to one object, in the probabilistic linkage model, a tuple
(
l
)
Search WWH ::




Custom Search