proposed in the literature for uncertain frequent pattern mining [ 15 ], and a computational
evaluation of the different techniques is provided in [ 64 ]. Many algorithms
such as FP-growth are harder to generalize to uncertain data [ 15 ] because of the
difficulty in storing probability information with the FP-Tree. Nevertheless, as the work
in [ 15 ] shows, other related methods such as H-mine [ 59 ] can be generalized easily to
the case of uncertain data. Uncertain frequent pattern mining methods have also been
extended to the case of graph data [ 76 ]. A variant of uncertain graph pattern mining
discovers highly reliable subgraphs [ 40 ], which are subgraphs that remain connected
with high probability in spite of the uncertainty associated with their edges. A
discussion of the different methods for frequent pattern mining with uncertain data
is provided in Chap. 14.
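To make the idea of mining uncertain data concrete, the sketch below assumes one common formulation: each item in a transaction carries an existential probability, item occurrences are independent, and an itemset is frequent when its expected support (the sum over transactions of the product of its items' probabilities) reaches a threshold. This model, the names `expected_support` and `frequent_itemsets`, and the level-wise search are illustrative assumptions, not necessarily the exact method of any work cited above. Because adding items can only multiply in further factors no larger than 1, expected support is anti-monotone, which is what justifies the Apriori-style pruning in the join step.

```python
from itertools import combinations
from math import prod

# Hypothetical representation: each transaction maps item -> existential probability.
UncertainDB = list[dict[str, float]]


def expected_support(db: UncertainDB, itemset: frozenset[str]) -> float:
    """Sum over transactions of P(itemset present), assuming independent items."""
    total = 0.0
    for txn in db:
        if all(item in txn for item in itemset):
            total += prod(txn[item] for item in itemset)
    return total


def frequent_itemsets(db: UncertainDB, min_esup: float, max_size: int = 3) -> dict[frozenset[str], float]:
    """Level-wise (Apriori-style) search for itemsets with expected support >= min_esup."""
    items = sorted({i for txn in db for i in txn})
    frequent: dict[frozenset[str], float] = {}
    level = [frozenset([i]) for i in items]
    k = 1
    while level and k <= max_size:
        survivors = []
        for cand in level:
            esup = expected_support(db, cand)
            if esup >= min_esup:
                frequent[cand] = esup
                survivors.append(cand)
        # Join step: build (k+1)-candidates whose k-subsets are all frequent.
        # Valid because expected support never increases when items are added.
        nxt = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                nxt.add(union)
        level = list(nxt)
        k += 1
    return frequent
```

For instance, with the three transactions `{"a": 0.9, "b": 0.8}`, `{"a": 0.5, "c": 1.0}` and `{"a": 0.6, "b": 0.5, "c": 0.2}`, the itemset `{a, b}` has expected support 0.9 × 0.8 + 0.6 × 0.5 = 1.02, so it survives a threshold of 1.0 while `{a, c}` (expected support 0.62) does not.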
5 Privacy Issues
Privacy has increasingly become a topic of concern in recent years because of the wide
availability of personal data about individuals [ 7 ]. This has often led to a reluctance to
share data at all, or to sharing it only in a constrained or downgraded form.
These constraints and this downgrading make it harder to discover
frequent patterns. In the context of frequent pattern and association rule mining, the
primary challenges are as follows:
1. When privacy-preservation methods such as randomization are used, it becomes
difficult to discover associations from the underlying data, because a
significant amount of noise has been added to the data and this noise obscures
the association rules. Therefore, one class of association rule mining methods [ 30 ]
proposes ways to perturb the data so that meaningful patterns can still be
discovered while the privacy of the underlying records is preserved.
2. In some cases, the output of a privacy-preserving data mining algorithm can itself lead
to a violation of privacy, because association rules can reveal sensitive
information about individuals when they relate sensitive attributes to other kinds of
attributes. Therefore, one class of methods focuses on the problem of association
rule hiding [ 65 ].
3. In many cases, the data to be mined is stored in a distributed way by competitors
who may wish to determine global insights without, at the same time, revealing
their local insights. This problem is referred to as distributed privacy
preservation [ 25 ]. The data may be either horizontally partitioned (across records)
or vertically partitioned (across attributes). Each of these forms
of partitioning requires different methods for distributed mining.
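For the horizontally partitioned case in item 3, a basic building block is computing the global support count of a candidate itemset without any site disclosing its local count. The sketch below uses a ring-based secure sum, a generic textbook protocol chosen here for illustration rather than the specific method of [ 25 ]; it assumes semi-honest, non-colluding sites, a modulus larger than the true total, and publicly known database sizes. The names `secure_sum` and `globally_frequent` are hypothetical.

```python
import random


def secure_sum(local_counts: list[int], modulus: int, rng: random.Random) -> int:
    """Ring-based secure sum over sites holding horizontal partitions.

    Site 0 masks its count with a uniformly random value R, every later site
    adds its own count modulo `modulus` and passes the running total on, so
    each message a site receives is uniformly distributed. Site 0 finally
    subtracts R. Assumes semi-honest, non-colluding sites and modulus > total.
    """
    mask = rng.randrange(modulus)
    running = (mask + local_counts[0]) % modulus
    for count in local_counts[1:]:
        running = (running + count) % modulus
    return (running - mask) % modulus


def globally_frequent(
    local_supports: list[int],
    local_sizes: list[int],
    min_frac: float,
    rng: random.Random,
    modulus: int = 2**61 - 1,
) -> bool:
    """Hypothetical helper: is the itemset frequent over the union of all sites?

    Only the aggregate support is revealed; local database sizes are assumed public.
    """
    global_support = secure_sum(local_supports, modulus, rng)
    return global_support >= min_frac * sum(local_sizes)
```

The design trades strength for simplicity: if the two neighbors of a site in the ring collude, they can recover that site's count, which is why practical protocols for this setting add further safeguards.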
Methods for privacy-preserving association rule mining are addressed in Chap. 15.
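As a concrete illustration of the randomization setting in item 1, the sketch below applies a simple randomized-response scheme to market-basket data: the presence bit of each item in each transaction is flipped independently with probability `p`. The miner then inverts the known noise process to obtain an unbiased estimate of an item's true support. This bit-flipping scheme and the names `perturb` and `estimate_support` are illustrative assumptions, not the specific perturbation operators proposed in [ 30 ]. Only single items are handled; reconstructing the support of a k-itemset requires solving a larger system over all 2^k presence patterns.

```python
import random


def perturb(db: list[set[str]], items: list[str], p: float, rng: random.Random) -> list[set[str]]:
    """Randomized response: flip every item's presence bit with probability p."""
    noisy_db = []
    for txn in db:
        noisy = set()
        for item in items:
            present = item in txn
            if rng.random() < p:
                present = not present
            if present:
                noisy.add(item)
        noisy_db.append(noisy)
    return noisy_db


def estimate_support(noisy_db: list[set[str]], item: str, p: float) -> float:
    """Unbiased estimate of an item's true support fraction s.

    A true 1 survives with probability 1 - p and a true 0 becomes 1 with
    probability p, so E[observed] = s(1 - p) + (1 - s)p, which gives
    s = (observed - p) / (1 - 2p). Requires p != 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information about the data")
    observed = sum(item in txn for txn in noisy_db) / len(noisy_db)
    return (observed - p) / (1 - 2 * p)
```

With many transactions the estimate concentrates around the true support, but its variance grows as `p` approaches 0.5: more noise gives more privacy and less accurate patterns, which is exactly the tension described in item 1.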