Input Privacy The first category relates to the data per se and is known as data
hiding or input privacy. Specifically, data hiding tries to obfuscate the disclosed
data in order to prevent the miner from reliably extracting confidential or private
information. Input privacy methods address settings where users are unwilling to
provide their personal information to data recipients, or deliberately provide false
information, because they fear that their privacy may be violated. The goal of these
methods is to guarantee that such personal information can be released to (potentially
untrusted) data recipients in a privacy-preserving way that still allows the recipients
to build accurate data mining models from the released data. Several methods have
been proposed to provide input privacy (e.g., [6, 7, 17, 45]) by employing various
data transformation strategies.
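To make this concrete, a minimal sketch of one such transformation strategy follows: randomized response, in which each respondent locally perturbs a sensitive yes/no answer before disclosing it. The function names and the flip probability below are illustrative assumptions, not details of the cited methods.

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, otherwise flip it.

    Any individual response is plausibly deniable, so the data recipient
    never learns a respondent's true answer with certainty.
    """
    return true_value if random.random() < p_truth else not true_value

def estimate_true_proportion(responses: list[bool], p_truth: float = 0.75) -> float:
    """Invert the known perturbation to recover the population statistic.

    If pi is the true proportion of "yes" answers, the expected reported
    proportion is r = pi * p_truth + (1 - pi) * (1 - p_truth), so
    pi = (r - (1 - p_truth)) / (2 * p_truth - 1).
    """
    reported = sum(responses) / len(responses)
    return (reported - (1 - p_truth)) / (2 * p_truth - 1)
```

Although every disclosed record is unreliable in isolation, the aggregate statistic can still be estimated accurately, which is precisely the trade-off that input privacy methods exploit.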
Output Privacy The second category concerns the information, or knowledge,
that a data mining method may discover after analyzing the data, and is known as
knowledge hiding or output privacy. Specifically, it is concerned with the sanitization
of confidential knowledge patterns derived from the data. Output privacy methods
aim to prevent the disclosure of sensitive patterns mined from a dataset. If the dataset
were shared as-is, such patterns could easily lead to (a) the disclosure of sensitive
information, such as business or trade secrets whose exposure would give competitors
an advantage, or (b) discrimination, if they involve individuals in the input data who
have certain unique characteristics. Several methods have been proposed to offer
output privacy (e.g., [11, 15, 21, 39]) by eliminating sensitive patterns from the
released data in a way that minimizes data distortion and side-effects.
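The following sketch illustrates the sanitization idea in its simplest form: a sensitive itemset is hidden by deleting one of its items from just enough transactions to push its support below the mining threshold. The function name, the victim-item choice, and the support convention are assumptions made for illustration; the methods cited above select victims far more carefully, precisely to minimize distortion and side-effects.

```python
def hide_itemset(transactions: list[set], sensitive: set, min_support: float) -> list[set]:
    """Sanitize `transactions` in place so that `sensitive` is no longer frequent.

    A naive heuristic: delete one (arbitrarily chosen) item of the sensitive
    itemset from supporting transactions until the itemset's support count
    drops below min_support * |D|.
    """
    threshold = int(min_support * len(transactions))     # minimum frequent count
    supporting = [t for t in transactions if sensitive <= t]
    removals = max(0, len(supporting) - threshold + 1)   # how many to break
    victim = next(iter(sensitive))                       # naive victim choice
    for t in supporting[:removals]:
        t.discard(victim)                                # t no longer supports the itemset
    return transactions

# Example: hide {a, b} under a 50% support threshold (count >= 2 of 4)
db = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
hide_itemset(db, {"a", "b"}, min_support=0.5)            # {a, b} now supported once
```

After sanitization, only one of the four transactions still contains both items, so a miner at the stated threshold cannot derive the sensitive rules; the price is that two transactions lost an item, which is exactly the distortion that real hiding algorithms try to minimize.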
Owner Privacy Finally, a third line of research involves protocols that enable a
group of data owners to collectively mine their data, in a distributed fashion, without
allowing any party to reliably learn the data (or sensitive information about the data)
held by the other owners, that is, by the sources of the data. For this purpose, several
cryptographic methods have recently been proposed to facilitate the privacy-preserving
distributed mining of data residing in different data warehouses (e.g., [26, 51, 52]).
These methods assume that the data are either horizontally or vertically partitioned
among the different sites, and that sensitive disclosures must be limited during the
mining process.
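The flavor of these protocols is captured by the classic secure-sum computation over horizontally partitioned data, in which the sites jointly compute a global count (for instance, the total support of an itemset) without revealing their local counts. The sketch below simulates the ring-based protocol on a single machine and assumes semi-honest, non-colluding parties; it is an illustration, not a reconstruction of the cited protocols.

```python
import random

def secure_sum(local_values: list[int], modulus: int = 2**32) -> int:
    """Simulate the secure-sum ring protocol among len(local_values) sites.

    The initiating site masks its value with a random offset known only to
    itself; each subsequent site adds its own value to the masked running
    total. Every intermediate value a site observes is uniformly random,
    so no site learns another site's contribution.
    """
    mask = random.randrange(modulus)                # secret of the initiating site
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:                  # total travels around the ring
        running = (running + value) % modulus
    return (running - mask) % modulus               # initiator removes its mask

# Example: three sites jointly compute a global support count of 49
print(secure_sum([12, 30, 7]))
```

Because each intermediate total is offset by the initiator's secret mask, the parties learn only the final sum, which is the minimal disclosure needed to continue the distributed mining computation.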
The rest of this chapter is organized as follows: Sect. 2 elaborates on input
privacy methods that enable the safe discovery of association rules from large
historical databases. Section 3 provides a taxonomy, along with a systematic review
of related literature, on techniques for hiding sensitive association rules. Section 4
highlights important cryptographic protocols that facilitate preserving owner privacy
in distributed data mining. Finally, Sect. 5 concludes the chapter.
2 Input Privacy
The knowledge models produced through data mining techniques are only as good as
the accuracy of their input data. One source of data inaccuracy is when users
deliberately provide false information. This is especially common with regard to customers