Database Reference
In-Depth Information
source-level uncertainty (SLU) and event-level uncertainty (ELU) .InSLU,the
“source” attribute of each tuple is uncertain: each tuple contains a probability
distribution over possible sources ( attribute-level uncertainty [15]). As noted in
[11], this formulation applies to scenarios where it is known that some customer
made a specific retail transaction, but the identity of the customer who made
that transaction is uncertain. This could happen because of incomplete cus-
tomer details, or because the customer database itself is probabilistic as a result
of “deduplication” or cleaning [7]. In ELU, the source of the tuple is certain,
but the events are uncertain. This applies to several scenarios involving sensors
in fixed locations detecting events using noisy techniques. In this case, either
the existence of an event is uncertain (for example tracking endangered animals
using sensors [8] or aggregating unreliable observations into events [9]) or the
event's existence is essentially certain, but its content is uncertain (for example,
reading a numberplate using automated numberplate recognition or sequences
of search terms that may have ambiguous meanings [11]).
Our contributions. In [12], ecient algorithms were proposed for the SPM prob-
lem in SLU probabilistic databases, under the expected support measure. These
algorithms were based on the candidate generate-and-test approach, which is
known to be relatively inecient for classical SPM. In [12], it was noted that it
is not straightforward to adapt the pattern-growth approach [13], which is usually
the best for classical SPM, to the probabilistic case. In this paper, we overcome
the obstacles mentioned in [12], and propose a pattern-growth method for the
SPM problem in probabilistic databases. In classical SPM, pattern-growth works
by performing L 1 computation on a projected database. The key contributions of
this work are to formulate the analogue of a projected database in probabilistic
settings, and to identify the appropriate L 1 computation to perform on the pro-
jected database. Unlike the deterministic case, it appears that pattern-growth for
probabilistic SPM appears to require additional memory. We have implemented
the new pattern-growth algorithm and present an experimental evaluation of
the pattern-growth algorithm against the ones presented in [12]; this evaluation
shows that although pattern-growth is superior to candidate generation in the
deterministic settings, the picture is not as clear as for the probabilistic case.
2 Problem Statement
Classical SPM [14,2]. Let
I
=
{i 1 ,i 2 ,...,i q }
be a set of items and
S
=
{
1 ,...,m}
be a set of sources .An event e ⊆I
is a collection of items. A database
D =
is an ordered list of records such that each r i ∈ D is of the
form ( eid i ,e i i ), where eid i is a unique event-id, including a time-stamp (events
are ordered by this time-stamp), e i is an event and σ i is a source.
A sequence s =
r 1 ,r 2 ,...,r n
is an ordered list of events. The events s i in the
sequence are called its elements .The length of a sequence s is the total number of
s 1 ,s 2 ,...,s a
items in it, i.e. j =1 |s j |
; for any integer k ,a k -sequence is a sequence of length k .
Let s =
be two sequences. We say that s is a
subsequence of t , denoted s t , if there exist integers 1
s 1 ,s 2 ,...,s q
and t =
t 1 ,t 2 ,...,t r
≤ i 1 <i 2 < ···<i q ≤ r
Search WWH ::




Custom Search