Uncertain Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

errors—such as (i) inherited measurement inaccuracies, (ii) sampling frequency of

the sensors, (iii) deviation caused by a rapid change (e.g., drift, noise) of the measured

property over time, (iv) wireless transmission errors, or (v) network latencies—also

introduce uncertainty into the data reported by these sensors. Moreover, there is also

uncertainty in survey data (e.g., number “0” vs. symbol “

” vs. letter “O” or “o”;

similarly, number “1” vs. upper case letter “I” vs. lower case letter “l”) and uncer-

tainty due to data granularity (e.g., city, province) in taxonomy. Disguised missing

data, which are not explicitly represented as such but instead appear as potentially

valid data values, also introduce uncertainty. Furthermore, in privacy-preserving

applications [ 10 ], sensitive data may be intentionally blurred via aggregation or per-

turbation so as to preserve data anonymity. All these sources of uncertainty lead to

huge amounts of uncertain data in real-life applications [ 48 , 52 ].

Many key models and algorithms have been developed over the past few years for

various uncertain data mining tasks [ 3 , 11 ]. These include (i) clustering uncertain

data [ 2 , 7 , 25 ], (ii) classifying uncertain data [ 47 , 53 ] and (iii) detecting outliers

from uncertain data [ 5 , 9 ]. In this chapter, we examine another data mining task—

namely, uncertain frequent pattern mining . To mine frequent patterns from uncertain

data, different methodologies (e.g., fuzzy set theory, rough set theory) can be appli-

cable. Among them, probability theory is more popular and widely used by many

researchers.

In this chapter, we focus on uncertain frequent pattern mining in a probabilistic set-

ting. The remainder of this chapter is organized as follows. The next section describes

a key model for uncertain frequent pattern mining: the probabilistic model. Then, we

present those key uncertain frequent pattern mining algorithms based on (i) the can-

didate generate-and-test paradigm, (ii) the frequent pattern growth paradigm with

hyperlinked structures, and (iii) the frequent pattern growth para-digm with tree

structures in Sects. 3, 4 and 5, respectively. Sections 6, 7 and 8 describe key algo-

rithms for uncertain frequent pattern mining (i) with constraints, (ii) from Big data,

and (iii) from data streams, respectively. Section 9 examines key algorithms for min-

ing uncertain data that are in vertical representation. We briefly discuss and compare

these algorithms in Sect. 10. While Sects. 2 to 10 focus on (expected support based)

frequent patterns, Sect. 11 focuses on the probabilistic frequent patterns. Finally,

Sect. 12 gives conclusions.

◦

2

The Probabilistic Model for Mining Expected Support-Based

Frequent Patterns from Uncertain Data

As a building block for association rule mining [ 12 ] (which helps reveal associative

relationships embedded in data), frequent pattern mining aims to discover frequently

occurring sets of items, objects or events (e.g., frequently purchased merchandise

items in shopper market baskets, bundles of popular topics, popular courses taken

by students, events that are frequently collocated). In the early days, most frequent

Frequent Pattern Mining

Search WWH ::

Custom Search

Home