Input Privacy The first category relates to the data per se and is known as data
hiding or input privacy. Specifically, data hiding tries to obfuscate the disclosed
data in order to prevent the miner from reliably extracting confidential or private
information. Input privacy methods address settings where users are unwilling to
provide their personal information to data recipients, or deliberately provide false
information, because they fear that their privacy may be violated. The goal of these
methods is to guarantee that such personal information can be released to (potentially
untrusted) data recipients in a privacy-preserving way that still allows the recipients
to build accurate data mining models from the released data. Several methods have
been proposed to provide input privacy (e.g., [6, 7, 17, 45]) by employing various
data transformation strategies.
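To make this concrete, a minimal sketch of one such transformation strategy follows: randomized response, in which each respondent locally perturbs a sensitive yes/no answer before disclosing it. The function names and the flip probability below are illustrative assumptions, not details of the cited methods.

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, otherwise flip it.

    Any individual response is plausibly deniable, so the data recipient
    never learns a respondent's true answer with certainty.
    """
    return true_value if random.random() < p_truth else not true_value

def estimate_true_proportion(responses: list[bool], p_truth: float = 0.75) -> float:
    """Invert the known perturbation to recover the population statistic.

    If pi is the true proportion of "yes" answers, the expected reported
    proportion is r = pi * p_truth + (1 - pi) * (1 - p_truth), so
    pi = (r - (1 - p_truth)) / (2 * p_truth - 1).
    """
    reported = sum(responses) / len(responses)
    return (reported - (1 - p_truth)) / (2 * p_truth - 1)
```

Although every disclosed record is unreliable in isolation, the aggregate statistic can still be estimated accurately, which is precisely the trade-off that input privacy methods exploit.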
Output Privacy The second category concerns the information, or knowledge,
that a data mining method may discover after analyzing the data, and is known as
knowledge hiding or output privacy. Specifically, it is concerned with the sanitization
of confidential knowledge patterns derived from the data. Output privacy methods
aim to prevent the disclosure of sensitive patterns mined from a dataset. If the dataset
were shared as-is, such patterns could easily lead to (a) the disclosure of sensitive
information, such as business or trade secrets whose exposure would give competitors
an advantage, or (b) discrimination, if they involve individuals in the input data who
have certain unique characteristics. Several methods have been proposed to offer
output privacy (e.g., [11, 15, 21, 39]) by eliminating sensitive patterns from the
released data in a way that minimizes data distortion and side-effects.
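The following sketch illustrates the sanitization idea in its simplest form: a sensitive itemset is hidden by deleting one of its items from just enough transactions to push its support below the mining threshold. The function name, the victim-item choice, and the support convention are assumptions made for illustration; the methods cited above select victims far more carefully, precisely to minimize distortion and side-effects.

```python
def hide_itemset(transactions: list[set], sensitive: set, min_support: float) -> list[set]:
    """Sanitize `transactions` in place so that `sensitive` is no longer frequent.

    A naive heuristic: delete one (arbitrarily chosen) item of the sensitive
    itemset from supporting transactions until the itemset's support count
    drops below min_support * |D|.
    """
    threshold = int(min_support * len(transactions))     # minimum frequent count
    supporting = [t for t in transactions if sensitive <= t]
    removals = max(0, len(supporting) - threshold + 1)   # how many to break
    victim = next(iter(sensitive))                       # naive victim choice
    for t in supporting[:removals]:
        t.discard(victim)                                # t no longer supports the itemset
    return transactions

# Example: hide {a, b} under a 50% support threshold (count >= 2 of 4)
db = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
hide_itemset(db, {"a", "b"}, min_support=0.5)            # {a, b} now supported once
```

After sanitization, only one of the four transactions still contains both items, so a miner at the stated threshold cannot derive the sensitive rules; the price is that two transactions lost an item, which is exactly the distortion that real hiding algorithms try to minimize.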
Owner Privacy Finally, a third line of research involves protocols that enable a
group of data owners to collectively mine their data, in a distributed fashion, without
allowing any party to reliably learn the data (or sensitive information about the data)
held by the other owners, that is, by the sources of the data. For this purpose, several
cryptographic methods have recently been proposed to facilitate the privacy-preserving
distributed mining of data residing in different data warehouses (e.g., [26, 51, 52]).
These methods assume that the data are either horizontally or vertically partitioned
among the different sites, and that sensitive disclosures must be limited during the
mining process.
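The flavor of these protocols is captured by the classic secure-sum computation over horizontally partitioned data, in which the sites jointly compute a global count (for instance, the total support of an itemset) without revealing their local counts. The sketch below simulates the ring-based protocol on a single machine and assumes semi-honest, non-colluding parties; it is an illustration, not a reconstruction of the cited protocols.

```python
import random

def secure_sum(local_values: list[int], modulus: int = 2**32) -> int:
    """Simulate the secure-sum ring protocol among len(local_values) sites.

    The initiating site masks its value with a random offset known only to
    itself; each subsequent site adds its own value to the masked running
    total. Every intermediate value a site observes is uniformly random,
    so no site learns another site's contribution.
    """
    mask = random.randrange(modulus)                # secret of the initiating site
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:                  # total travels around the ring
        running = (running + value) % modulus
    return (running - mask) % modulus               # initiator removes its mask

# Example: three sites jointly compute a global support count of 49
print(secure_sum([12, 30, 7]))
```

Because each intermediate total is offset by the initiator's secret mask, the parties learn only the final sum, which is the minimal disclosure needed to continue the distributed mining computation.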
The rest of this chapter is organized as follows: Sect. 2 elaborates on input
privacy methods that enable the safe discovery of association rules from large
historical databases. Section 3 provides a taxonomy, along with a systematic review
of related literature, on techniques for hiding sensitive association rules. Section 4
highlights important cryptographic protocols that facilitate preserving owner privacy
in distributed data mining. Finally, Sect. 5 concludes the chapter.
2 Input Privacy
The knowledge models produced through data mining techniques are only as good as
the accuracy of their input data. One source of data inaccuracy is when users
deliberately provide false information. This is especially common with regard to customers