Database Reference
In-Depth Information
errors—such as (i) inherited measurement inaccuracies, (ii) sampling frequency of
the sensors, (iii) deviation caused by a rapid change (e.g., drift, noise) of the measured
property over time, (iv) wireless transmission errors, or (v) network latencies—also
introduce uncertainty into the data reported by these sensors. Moreover, there is also
uncertainty in survey data (e.g., number “0” vs. symbol “
” vs. letter “O” or “o”;
similarly, number “1” vs. upper case letter “I” vs. lower case letter “l”) and uncer-
tainty due to data granularity (e.g., city, province) in taxonomy. Disguised missing
data, which are not explicitly represented as such but instead appear as potentially
valid data values, also introduce uncertainty. Furthermore, in privacy-preserving
applications [ 10 ], sensitive data may be intentionally blurred via aggregation or per-
turbation so as to preserve data anonymity. All these sources of uncertainty lead to
huge amounts of uncertain data in real-life applications [ 48 , 52 ].
Many key models and algorithms have been developed over the past few years for
various uncertain data mining tasks [ 3 , 11 ]. These include (i) clustering uncertain
data [ 2 , 7 , 25 ], (ii) classifying uncertain data [ 47 , 53 ] and (iii) detecting outliers
from uncertain data [ 5 , 9 ]. In this chapter, we examine another data mining task—
namely, uncertain frequent pattern mining . To mine frequent patterns from uncertain
data, different methodologies (e.g., fuzzy set theory, rough set theory) can be appli-
cable. Among them, probability theory is more popular and widely used by many
researchers.
In this chapter, we focus on uncertain frequent pattern mining in a probabilistic set-
ting. The remainder of this chapter is organized as follows. The next section describes
a key model for uncertain frequent pattern mining: the probabilistic model. Then, we
present those key uncertain frequent pattern mining algorithms based on (i) the can-
didate generate-and-test paradigm, (ii) the frequent pattern growth paradigm with
hyperlinked structures, and (iii) the frequent pattern growth para-digm with tree
structures in Sects. 3, 4 and 5, respectively. Sections 6, 7 and 8 describe key algo-
rithms for uncertain frequent pattern mining (i) with constraints, (ii) from Big data,
and (iii) from data streams, respectively. Section 9 examines key algorithms for min-
ing uncertain data that are in vertical representation. We briefly discuss and compare
these algorithms in Sect. 10. While Sects. 2 to 10 focus on (expected support based)
frequent patterns, Sect. 11 focuses on the probabilistic frequent patterns. Finally,
Sect. 12 gives conclusions.
2
The Probabilistic Model for Mining Expected Support-Based
Frequent Patterns from Uncertain Data
As a building block for association rule mining [ 12 ] (which helps reveal associative
relationships embedded in data), frequent pattern mining aims to discover frequently
occurring sets of items, objects or events (e.g., frequently purchased merchandise
items in shopper market baskets, bundles of popular topics, popular courses taken
by students, events that are frequently collocated). In the early days, most frequent
Search WWH ::




Custom Search