The privacy metric used in this paper is rather different: the level of privacy is evaluated by measuring the probability that an "unwanted" item is included in the perturbed transaction. An "unwanted" item is defined as one that does not contribute to association rule mining, in the sense that it does not appear in any frequent itemset. An implication is that privacy estimates can be conditional on the choice of ARM parameters (sup_min, con_min). This may encourage the miner to experiment with a variety of values in order to maximize the breach of privacy.
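The flavor of such a metric can be illustrated with a small simulation. The sketch below assumes a simple Bernoulli bit-retention scheme (each item's presence bit is kept with probability p and flipped otherwise); this generic scheme, the parameter value, and the item names are illustrative stand-ins, not the exact perturbation of the paper.

```python
import random

random.seed(1)
p = 0.9   # probability of keeping a presence bit unchanged (hypothetical value)

def perturb(transaction, universe, p):
    """Return a perturbed transaction: each item's presence bit is
    retained with probability p and flipped with probability 1 - p."""
    out = set()
    for item in universe:
        present = item in transaction
        keep = present if random.random() < p else not present
        if keep:
            out.add(item)
    return out

# Suppose item 'z' is "unwanted": it appears in no frequent itemset.
universe = {'a', 'b', 'c', 'z'}
t = {'a', 'b'}          # the true transaction does not contain 'z'

trials = 100_000
hits = sum('z' in perturb(t, universe, p) for _ in range(trials))

# Estimated probability that the unwanted item shows up anyway (≈ 1 - p):
print(hits / trials)
```

Under this scheme the inclusion probability of an absent unwanted item is simply 1 - p, which is exactly the kind of quantity the metric conditions on the ARM parameters: which items count as "unwanted" depends on which itemsets are frequent at the chosen thresholds.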
Frameworks A common trend in the input data privacy literature has been to propose specific perturbation techniques, which are then analyzed for their privacy and accuracy properties. Recently, in [4], the problem was approached from a different perspective: a generalized matrix-theoretic framework, called FRAPP, was proposed that facilitates a systematic approach to the design of random perturbation schemes for privacy-preserving mining. This framework supports amplification-based privacy, and its execution and memory overheads are comparable to those of classical mining on the true database. The distinguishing feature of FRAPP is its quantitative characterization of the sources of error in the random data perturbation and model reconstruction processes.
In fact, although it uses dependent attribute perturbation, it is fully decompos-
able into the perturbation of individual attributes, and hence has the same run-time
complexity as any independent perturbation method. Through the framework, many
of the earlier techniques are cast as special instances of the FRAPP perturbation
matrix. More importantly, it was shown that through appropriate choices of matrix
elements, new perturbation techniques can be constructed that provide highly accurate mining results even under strict amplification-based [18] privacy guarantees. Notably, a perturbation matrix with provably minimal condition number (in the class of symmetric positive-definite matrices) was identified, substantially improving the
accuracy under the given constraints. Finally, an efficient integration of this optimal
matrix with the association mining process was outlined.
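To make the matrix-theoretic view concrete, here is a minimal sketch of matrix-based perturbation and reconstruction. The "gamma-diagonal" stochastic matrix below is in the spirit of FRAPP, but the parameter values, category counts, and variable names are all illustrative assumptions, not taken from [4]: each record's category is randomized according to a column of the perturbation matrix A, and the true distribution is then estimated by solving the linear system relating observed and original counts.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 4          # number of categories of the perturbed attribute (hypothetical)
gamma = 19.0   # amplification parameter (hypothetical value)

# "Gamma-diagonal" perturbation matrix: diagonal entries gamma * x,
# off-diagonal entries x, with x chosen so every column sums to 1
# (i.e., A is column-stochastic).
x = 1.0 / (gamma + k - 1)
A = np.full((k, k), x)
np.fill_diagonal(A, gamma * x)

# True data: n records, each holding one of the k category values.
n = 50_000
true_vals = rng.choice(k, size=n, p=[0.5, 0.3, 0.15, 0.05])

# Perturb: a record with true value u is reported as v with probability A[v, u].
pert_vals = np.array([rng.choice(k, p=A[:, u]) for u in true_vals])

# Reconstruction: the observed counts y satisfy E[y] = A @ x_true, so the
# true counts are estimated by solving the linear system A x = y.
y = np.bincount(pert_vals, minlength=k).astype(float)
x_est = np.linalg.solve(A, y)

print(np.round(x_est / n, 3))                 # close to the true distribution
print(np.bincount(true_vals, minlength=k) / n)
```

The condition number of A governs how much the sampling noise in y is amplified during reconstruction, which is why the framework's search for a minimal-condition-number matrix translates directly into more accurate mining results.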
3 Output Privacy
In this section, we present an overview of a specific class of methods in the knowledge hiding area, known as frequent itemset and association rule hiding (ARH), which is applied to offer output privacy. Other classes of methods in the same area include classification rule hiding [35, 36] and sequential pattern hiding [1, 20]. "Association rule hiding" (a term used for brevity instead of the longer title "frequent itemset and association rule hiding") was first mentioned in 1999, in a workshop paper by Atallah et al. [11]. The authors of [11] tried to apply general ideas regarding the implications of data mining for the security and privacy of information, first presented by Clifton and Marks in [14], to the association rule mining [5] framework. Clifton and Marks, following the suggestions of D.E. O'Leary [38], who was the very first to point out the security and privacy breaches that originate from data