Database Reference
In-Depth Information
3.4
Mining Data Streams with Uncertain Data
One of the interesting variations of the frequent itemset mining problem is the situ-
ation in which there is uncertainty about the presence of the data. We can model this
by assigning each item in a transaction an existential probability. For an item x and
transaction T j , there is a probability p ( x , T j ) that x in fact is present in the transaction.
Because the existence of the data is not certain, we can only compute an expected
support . The expected support for a single item x is the sum of its probabilities over
all transactions seem so far:
n
ˆ
s ( x )
=
p ( x , T j ) .
(9.8)
j =
1
For an itemset I
={
x 1 , x 2 , ... , x m }
,
multiply the probabilities within each
transaction:
n
ˆ
s ( I )
=
p ( x i , T j ) .
(9.9)
j
=
1
x i
I
There have been a small but important set of works investigating streaming uncertain
data. Leung and Hao [ 46 ] extend the UF-GROWTH algorithm for static uncertain data
to develop SUF-GROWTH, an exact mining algorithm. Like most exact algorithms,
SUF-GROWTH employs the sliding window model to limit the memory needs. As
each new transaction arrives, its probabilities are added to the pool, and those of
the oldest transaction in the window are removed. The same work also presents
an approximation algorithm UF-STREAMING which can produce quicker albeit less
accurate results. Algorithms for the damped window and landmark window have
also been developed [ 34 , 47 , 48 ].
A full discussion of algorithms and issues for mining uncertain data are provided in
Chap. 14. In particular, we refer the reader to Sect. 8 for a presentation of algorithms
for mining uncertain streaming data.
4
Mining Patterns Other than Itemsets
While the majority of research in frequent pattern mining in data streams has focused
on itemsets, valuable and interesting contributions have been made for the discovery
of other types of patterns. In this section, we discuss other types of patterns and
introduce methods for mining them. These other patterns fall into three groups:
subsequences, subtrees from trees, and subtrees and subgraphs from graphs. Selected
works from each of these three categories are listed in Table 9.4 .
Search WWH ::




Custom Search