The privacy metric used in this paper is rather different: the level of privacy is evaluated by measuring the probability that an "unwanted" item is included in the perturbed transaction. An "unwanted" item is defined as one that does not contribute to association rule mining, in the sense that it does not appear in any frequent itemset. An implication is that privacy estimates can be conditional on the choice of ARM parameters (sup_min, con_min). This may encourage the miner to experiment with a variety of values in order to maximize the breach of privacy.
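The flavor of such a metric can be illustrated with a small simulation. The sketch below assumes a simple Bernoulli bit-retention scheme (each item's presence bit is kept with probability p and flipped otherwise); this generic scheme, the parameter value, and the item names are illustrative stand-ins, not the exact perturbation of the paper.

```python
import random

random.seed(1)
p = 0.9   # probability of keeping a presence bit unchanged (hypothetical value)

def perturb(transaction, universe, p):
    """Return a perturbed transaction: each item's presence bit is
    retained with probability p and flipped with probability 1 - p."""
    out = set()
    for item in universe:
        present = item in transaction
        keep = present if random.random() < p else not present
        if keep:
            out.add(item)
    return out

# Suppose item 'z' is "unwanted": it appears in no frequent itemset.
universe = {'a', 'b', 'c', 'z'}
t = {'a', 'b'}          # the true transaction does not contain 'z'

trials = 100_000
hits = sum('z' in perturb(t, universe, p) for _ in range(trials))

# Estimated probability that the unwanted item shows up anyway (≈ 1 - p):
print(hits / trials)
```

Under this scheme the inclusion probability of an absent unwanted item is simply 1 - p, which is exactly the kind of quantity the metric conditions on the ARM parameters: which items count as "unwanted" depends on which itemsets are frequent at the chosen thresholds.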
Frameworks A common trend in the input data privacy literature has been to propose specific perturbation techniques, which are then analyzed for their privacy and accuracy properties. Recently, in [4], the problem was approached from a different perspective: a generalized matrix-theoretic framework, called FRAPP, was proposed that facilitates a systematic approach to the design of random perturbation schemes for privacy-preserving mining. This framework supports amplification-based privacy, and its execution and memory overheads are comparable to those of classical mining on the true database. The distinguishing feature of FRAPP is its quantitative characterization of the sources of error in the random data perturbation and model reconstruction processes.
In fact, although it uses dependent attribute perturbation, it is fully decompos-
able into the perturbation of individual attributes, and hence has the same run-time
complexity as any independent perturbation method. Through the framework, many
of the earlier techniques are cast as special instances of the FRAPP perturbation
matrix. More importantly, it was shown that through appropriate choices of matrix
elements, new perturbation techniques can be constructed that provide highly accurate mining results even under strict amplification-based [18] privacy guarantees. Notably, a perturbation matrix with provably minimal condition number (in the class of symmetric positive-definite matrices) was identified, substantially improving the
accuracy under the given constraints. Finally, an efficient integration of this optimal
matrix with the association mining process was outlined.
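To make the matrix-theoretic view concrete, here is a minimal sketch of matrix-based perturbation and reconstruction. The "gamma-diagonal" stochastic matrix below is in the spirit of FRAPP, but the parameter values, category counts, and variable names are all illustrative assumptions, not taken from [4]: each record's category is randomized according to a column of the perturbation matrix A, and the true distribution is then estimated by solving the linear system relating observed and original counts.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 4          # number of categories of the perturbed attribute (hypothetical)
gamma = 19.0   # amplification parameter (hypothetical value)

# "Gamma-diagonal" perturbation matrix: diagonal entries gamma * x,
# off-diagonal entries x, with x chosen so every column sums to 1
# (i.e., A is column-stochastic).
x = 1.0 / (gamma + k - 1)
A = np.full((k, k), x)
np.fill_diagonal(A, gamma * x)

# True data: n records, each holding one of the k category values.
n = 50_000
true_vals = rng.choice(k, size=n, p=[0.5, 0.3, 0.15, 0.05])

# Perturb: a record with true value u is reported as v with probability A[v, u].
pert_vals = np.array([rng.choice(k, p=A[:, u]) for u in true_vals])

# Reconstruction: the observed counts y satisfy E[y] = A @ x_true, so the
# true counts are estimated by solving the linear system A x = y.
y = np.bincount(pert_vals, minlength=k).astype(float)
x_est = np.linalg.solve(A, y)

print(np.round(x_est / n, 3))                 # close to the true distribution
print(np.bincount(true_vals, minlength=k) / n)
```

The condition number of A governs how much the sampling noise in y is amplified during reconstruction, which is why the framework's search for a minimal-condition-number matrix translates directly into more accurate mining results.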
3 Output Privacy
In this section, we present an overview of a specific class of methods in the knowledge hiding area, known as frequent itemset and association rule hiding (ARH), which is applied to offer output privacy. Other classes of methods in the same area include classification rule hiding [35, 36] and sequential pattern hiding [1, 20]. "Association rule hiding" (a term used for brevity instead of the longer title "frequent itemset and association rule hiding") was first mentioned in 1999, in a workshop paper by Atallah et al. [11]. The authors of [11] tried to apply general ideas regarding the implications of data mining for the security and privacy of information, first presented by Clifton and Marks in [14], to the association rule mining [5] framework. Clifton and Marks, following the suggestions of D.E. O'Leary [38], who was the very first to point out the security and privacy breaches that originate from data