large so that individual record values cannot be recovered. Therefore, techniques are designed to derive aggregate distributions from the perturbed records. Subsequently, data mining techniques can be developed in order to work with these aggregate distributions. We will describe the randomization technique in greater detail in a later section.
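As a rough illustration of the randomization idea (not the specific scheme of any cited work), the following sketch perturbs a numeric attribute with additive zero-mean noise whose distribution is publicly known, and then recovers the aggregate mean and variance from the perturbed data alone; the attribute, sample size, and noise scale are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sensitive attribute (e.g., ages); values are illustrative only.
original = rng.normal(loc=40.0, scale=10.0, size=10_000)

# Randomization: publish x_i + r_i, where r_i is drawn from a known,
# zero-mean noise distribution large enough to mask individual values.
noise_std = 25.0
perturbed = original + rng.normal(loc=0.0, scale=noise_std, size=original.shape)

# Individual perturbed records reveal little, but aggregate statistics can
# still be estimated because the noise distribution is public:
#   E[X + R]   = E[X]                (zero-mean noise)
#   Var(X + R) = Var(X) + Var(R)     (independence)
est_mean = perturbed.mean()
est_std = (perturbed.var() - noise_std**2) ** 0.5

print(f"estimated mean ~ {est_mean:.2f} (true {original.mean():.2f})")
print(f"estimated std  ~ {est_std:.2f} (true {original.std():.2f})")
```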
The k-anonymity model and l-diversity: The k-anonymity model was developed because of the possibility of indirect identification of records from public databases, since combinations of record attributes can be used to exactly identify individual records. In the k-anonymity method, we reduce the granularity of data representation with the use of techniques such as generalization and suppression. This granularity is reduced sufficiently that any given record cannot be distinguished from at least k-1 other records in the data. The l-diversity model was designed to handle some weaknesses in the k-anonymity model, since protecting identities to the level of k individuals is not the same as protecting the corresponding sensitive values, especially when there is homogeneity of sensitive values within a group. To address this, the concept of intra-group diversity of sensitive values is promoted within the anonymization scheme [77].
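To make these two definitions concrete, the following sketch checks whether a toy, already generalized table satisfies k-anonymity and the simplest, distinct-values form of l-diversity: every quasi-identifier group must contain at least k records and at least l distinct sensitive values. The table and attribute names are hypothetical.

```python
from collections import defaultdict

def check_k_anonymity_l_diversity(rows, quasi_identifiers, sensitive, k, l):
    """Return (k_ok, l_ok) for a list of dict-valued records that have
    already been generalized on their quasi-identifier attributes."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in quasi_identifiers)
        groups[key].append(row[sensitive])

    k_ok = all(len(vals) >= k for vals in groups.values())       # group size
    l_ok = all(len(set(vals)) >= l for vals in groups.values())  # distinct sensitive values
    return k_ok, l_ok

# Toy table, already generalized on ZIP code and age range.
table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "hepatitis"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "cancer"},
]

print(check_k_anonymity_l_diversity(table, ["zip", "age"], "disease", k=3, l=2))
# -> (True, True)
```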
Distributed privacy preservation: In many cases, individual entities may wish to derive aggregate results from data sets which are partitioned across these entities. Such partitioning may be horizontal (when the records are distributed across multiple entities) or vertical (when the attributes are distributed across multiple entities). While the individual entities may not desire to share their entire data sets, they may consent to limited information sharing with the use of a variety of protocols. The overall effect of such methods is to maintain privacy for each individual entity, while deriving aggregate results over the entire data.
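One classical example of such a protocol is the secure-sum computation over horizontally partitioned data, sketched below in a single process for readability: the initiating party masks the running total with a random offset known only to itself, so the parties learn the global aggregate but not each other's local contributions (assuming no collusion). The function name and values are illustrative.

```python
import random

def secure_sum(local_values, modulus=2**61 - 1):
    """Simulate a ring-based secure-sum protocol: party 0 adds a random mask,
    every party adds its local value to the running (masked) total, and
    party 0 finally removes the mask to obtain the true sum."""
    mask = random.randrange(modulus)              # known only to party 0
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:                # total passed from party to party
        running = (running + value) % modulus
    return (running - mask) % modulus             # party 0 removes the mask

# Each entry is one party's local aggregate (e.g., its local record count).
print(secure_sum([120, 340, 95, 210]))            # -> 765
```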
Downgrading application effectiveness: In many cases, even though the data may not be available, the output of applications such as association rule mining, classification, or query processing may result in violations of privacy. This has led to research in downgrading the effectiveness of applications by either data or application modifications. Some examples of such techniques include association rule hiding [106], classifier downgrading [83], and query auditing [1].
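As a toy illustration of the query-auditing idea (a simplified stand-in, not the method of the cited work), the sketch below refuses an aggregate query when the queried set is too small, or when it differs from an already answered query by exactly one record, since answering both would expose that record's value.

```python
def audit_query(requested_ids, answered_queries, min_size=2):
    """Decide whether a SUM/COUNT query over `requested_ids` may be answered,
    given the (mutable) list of previously answered query sets."""
    requested = frozenset(requested_ids)
    if len(requested) < min_size:                 # query set itself too small
        return False
    for previous in answered_queries:
        if len(requested ^ previous) == 1:        # difference isolates one record
            return False
    answered_queries.append(requested)
    return True

answered = []
print(audit_query({1, 2, 3, 4}, answered))        # True: safe to answer
print(audit_query({1, 2, 3}, answered))           # False: would reveal record 4
```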
In this paper, we will provide a broad overview of the different techniques for privacy-preserving data mining. We will provide a review of the major algorithms available for each method, and the variations on the different techniques. We will also discuss a number of combinations of different concepts, such as k-anonymous mining over vertically- or horizontally-partitioned data. Finally, we will discuss a number of unique challenges associated with privacy-preserving data mining in the high-dimensional case.
This paper is organized as follows. In section 2, we will introduce the randomization method for privacy-preserving data mining. In section 3, we will discuss the k-anonymization method along with its different variations. In sec-