Database Reference
In-Depth Information
patterns that are supersets of p-FPs and deriving p-FPs in a top-down manner (i.e.,
descending cardinality of p-FPs).
To accelerate probabilistic frequent pattern mining, Wang et al. [ 51 ] applied
a model-based approach that supports both tuple uncertainty (as in the TODIS
algorithm [ 49 ]) and attribute uncertainty (as in the aforementioned dynamic compu-
tation technique [ 15 ]). Specifically, they represented the support pmf of a p-FP as
some existing probability models (e.g., Poisson binomial distribution model, normal
distribution). By doing so, Wang et al. quickly found two types of p-FPs: (i) threshold-
based p-FP (i.e., P ( sup ( X )
minsup )
minP rob ) and (ii) rank-based p-FP (e.g.,
top- k p-FP).
In addition to finding p-FPs from static datasets of uncertain data, there are also
algorithms that find p-FPs from dynamic streaming uncertain data. For instance,
Akbarinia and Masseglia [ 14 ] proposed an exact algorithm called FMU for fast
mining of streaming uncertain data with the sliding window model.
12
Conclusions
Frequent pattern mining is an important data mining task. It helps discover implicit,
previously unknown and potentially useful knowledge; it also helps reveal sets of
frequently co-occurring items in numerous real-life applications (e.g., bundles of
topics that are frequently bought together, collections of courses that are taken in
the same academic terms, events that are often co-located, groups of individuals that
have common interests). Here, it has drawn the attention of many researchers over
the past two decades. The research problem of frequent pattern mining was origi-
nally proposed to analyze shoppers' market basket transaction databases containing
precise data, in which the contents of transactions in the databases are known. Such
a research problem also plays an important role in other data mining tasks, such
as the mining of interesting or unexpected patterns, sequential mining, associative
classification, as well as outlier detection, in various real-life applications. As we
are living in an uncertain world, data in many real-life applications are uncertain.
Recently, researchers have paid more attention to the mining of frequent patterns
from probabilistic datasets of uncertain data.
In this chapter, we presented some recent works on mining frequent patterns
from probabilistic datasets of uncertain data. These include candidate generate-and-
test based, hyperlinked structure based, tree-based, as well as vertical uncertain
frequent pattern mining algorithms. Among them, the U-Apriori algorithm generates
candidate patterns and tests if their expected support meets or exceeds a user-specified
threshold. To avoid such a candidate generate-and-test approach, both UH-mine
and UF-growth algorithms use a pattern-growth mining approach. The UH-mine
algorithm keeps a UH-struct, from which frequent patterns are mined; the UF-growth
algorithm constructs a UF-tree, from which frequent patterns are mined. The UFP-
growth algorithm applies clustering to help reduce the number of nodes in a UFP-tree.
The PUF-growth and CUF-growth algorithms respectively construct a PUF-tree and
Search WWH ::




Custom Search