Database Reference
In-Depth Information
Table 14.7
Streaming
uncertain data
D
3
Batch ID
Set of items with existential probability
B
1
{
t
1
={
f
:0.7},
t
2
={
f
:0.7,
g
:0.9,
h
:0.7,
i
:0.5},
t
3
={
f
:0.7,
g
:0.9,
h
:0.8}}
B
2
{
t
4
={
f
:0.7,
g
:0.9},
t
5
={
g
:0.6},
t
6
={
g
:0.6}}
B
3
{
t
7
={
f
:0.7,
g
:0.9,
h
:0.7,
i
:0.5},
t
8
={
f
:0.7,
g
:0.9,
h
:0.8},
t
9
={
f
:0.7,
g
:0.9,
i
:0.5}}
and
reduce:
{
x
}
,
{
x
}
-proj. DB
→
list of
frequent
k
-itemset
X
,
expSup
(
X
,
D
)
.
(14.9)
To recap, by using the above two sets of “map” and “reduce” functions, the MR-
growth (i) first finds all frequent 1-itemsets with their expected support and (ii) then
buids appropriate local UF-trees (for projected databases) to find all frequent
k
-
itemsets (for
k
≥
2) with their expected support.
8
Streaming Uncertain Frequent Pattern Mining
In addition to
static
probabilistic datasets of uncertain data,
dynamic streams
of
uncertain data can also be generated (e.g., by wireless sensors) in many real-life
applications (e.g., environment surveillance). This leads to stream mining [
22
].
8.1
SUF-growth
To mine frequent patterns from streams of uncertain data, Leung and Hao [
36
]
extended the UF-growth algorithm (ref. Sect. 5.1) to produce an exact mining al-
gorithm called
SUF-growth
. During the mining process, the (i) sliding window
model, (ii) time-fading model, or (iii) landmark model is commonly used in pro-
cessing batches of transactions in the data streams. When using a sliding window
model, SUF-growth captures only the contents of streaming data in batches of trans-
actions belonging to the current window (that captures the recent
w
batches) in a tree
structure called
SUF-tree
. When the window slides, SUF-growth removes from the
SUF-tree those data belonging to older batches and adds to the SUF-tree those data
belonging to newer batches. Hence, each tree node in the SUF-tree consists of three
components: (i) an item, (ii) its existential probability, and (iii) a list of its
w
occur-
rence counts in the path. By doing so, when the window slides, the oldest occurrence
counts (representing the oldest streaming data) are replaced by the newest occurrence