An Introduction to Frequent Pattern Mining - Frequent Pattern Mining


proposed in the literature for uncertain frequent pattern mining [15], and a
computational evaluation of the different techniques is provided in [64]. Many
algorithms such as FP-growth are harder to generalize to uncertain data [15]
because of the difficulty in storing probability information with the FP-Tree.
Nevertheless, as the work in [15] shows, other related methods such as H-mine [59]
can be generalized easily to the case of uncertain data. Uncertain frequent pattern
mining methods have also been extended to the case of graph data [76]. A variant of
uncertain graph pattern mining discovers highly reliable subgraphs [40]. Highly
reliable subgraphs are subgraphs that are hard to disconnect in spite of the
uncertainty associated with the edges. A discussion of the different methods for
frequent pattern mining with uncertain data is provided in Chap. 14.
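To make the notion of support over uncertain data concrete, the following is a minimal sketch of one common formulation, expected support. It assumes that each item in a transaction carries an independent existence probability; the function name `expected_support` and the toy data are hypothetical and are not taken from the methods cited above.

```python
from math import prod

def expected_support(transactions, itemset):
    """Expected number of transactions that contain every item of `itemset`.

    Each transaction is a dict mapping an item to its existence probability.
    ASSUMPTION: items occur independently, so the probability that a
    transaction contains the whole itemset is the product of item probabilities.
    """
    return sum(
        prod(t.get(item, 0.0) for item in itemset)  # a missing item has probability 0
        for t in transactions
    )

# Hypothetical uncertain database with three transactions.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "c": 1.0},
    {"a": 1.0, "b": 0.5, "c": 0.2},
]

print(expected_support(transactions, ["a", "b"]))
```

An itemset would then be reported as frequent when its expected support reaches a user-chosen threshold. This sketch says nothing about how FP-growth or H-mine store the probabilities internally.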

5 Privacy Issues

Privacy has increasingly become a topic of concern in recent years because of the
wide availability of personal data about individuals [7]. This has often led to a
reluctance to share data, or to sharing it only in a constrained way or in a
downgraded form. The additional constraints and downgrading make it harder to
discover frequent patterns. In the context of frequent pattern and association rule
mining, the primary challenges are as follows:

1. When privacy-preservation methods such as randomization are used, it becomes
a challenge to discover associations from the underlying data. This is because a
significant amount of noise has been added to the data, and it is often difficult to
discover the association rules in the presence of this noise. Therefore, one class
of association rule mining methods [30] proposes effective methods to perturb
the data, so that meaningful patterns may be discovered while retaining privacy
of the perturbed data.

2. In some cases, the output of a privacy-preserving data mining algorithm can
itself lead to a violation of privacy. This is because association rules can reveal
sensitive information about individuals when they relate sensitive attributes to
other kinds of attributes. Therefore, one class of methods focuses on the problem
of association rule hiding [65].

3. In many cases, the data to be mined is stored in a distributed way by competitors
who may wish to determine global insights without, at the same time, revealing
their local insights. This problem is referred to as that of distributed privacy
preservation [25]. The data may be either horizontally partitioned (across rows,
that is, different records) or vertically partitioned (across attributes). Each of
these forms of partitioning requires different methods for distributed mining.
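The first challenge above can be illustrated with a small sketch of randomized perturbation for a single item. It assumes a simple bit-flipping scheme in which each 0/1 value is kept with a known probability `keep_prob` and flipped otherwise; this is an illustrative assumption, not necessarily the perturbation scheme of [30], and the names `randomize_bits` and `estimate_support` are hypothetical.

```python
import random

def randomize_bits(bits, keep_prob, rng):
    """Keep each 0/1 value with probability keep_prob; otherwise flip it."""
    return [b if rng.random() < keep_prob else 1 - b for b in bits]

def estimate_support(observed_fraction, keep_prob):
    """Recover the true support s from the observed fraction o of ones.

    Under the assumed scheme, o = p*s + (1 - p)*(1 - s) with p = keep_prob,
    so s = (o - (1 - p)) / (2p - 1).
    """
    if keep_prob == 0.5:
        raise ValueError("keep_prob = 0.5 removes all information about the data")
    return (observed_fraction - (1 - keep_prob)) / (2 * keep_prob - 1)

rng = random.Random(0)                       # fixed seed for repeatability
true_bits = [1] * 70_000 + [0] * 30_000      # hypothetical item with true support 0.7
noisy_bits = randomize_bits(true_bits, keep_prob=0.8, rng=rng)

observed = sum(noisy_bits) / len(noisy_bits)
estimate = estimate_support(observed, keep_prob=0.8)
print(f"observed={observed:.3f}, estimated support={estimate:.3f}")
```

The miner sees only `noisy_bits`, yet the aggregate support can still be estimated because the noise distribution is known. For itemsets with several items the inversion becomes more involved and the variance of the estimate grows, which is one reason pattern discovery under randomization is difficult.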

Methods for privacy-preserving association rule mining are addressed in Chap. 15.
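For the horizontally partitioned case in the third challenge, the global support of an itemset is the sum of the local support counts divided by the total number of records. The sketch below simulates, in a single process, a ring-based secure sum in which the initiating site masks the running total with a random value so that no site sees another site's count directly. This protocol is an assumption introduced for illustration and is not stated in the text; the function `secure_sum` and all numbers are hypothetical, and a real deployment would need actual network communication and protection against colluding sites.

```python
import secrets

def secure_sum(local_values, modulus):
    """Simulate a ring-based secure sum over the sites' private values.

    ASSUMPTION: modulus is larger than the true sum, and sites do not collude.
    """
    mask = secrets.randbelow(modulus)        # known only to the initiating site
    running = mask
    for value in local_values:               # each site adds its own private value
        running = (running + value) % modulus
    return (running - mask) % modulus        # initiator removes the mask

# Hypothetical local support counts of one itemset at three sites,
# and the number of records held by each site.
local_counts = [120, 45, 300]
local_sizes = [1000, 500, 1500]

modulus = 10**6
global_count = secure_sum(local_counts, modulus)
global_size = secure_sum(local_sizes, modulus)
print(global_count / global_size)
```

Vertically partitioned data needs a different approach, since the items of a single itemset may be held by different sites and no site can compute a local count on its own.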
