3.2 Utility Based Privacy Preservation
The process of privacy-preservation leads to a loss of information, which can equivalently be viewed as a loss of utility
for data mining purposes. Since some negative results [7] on the curse of
dimensionality suggest that a lot of attributes may need to be suppressed
in order to preserve anonymity, it is extremely important to do this care-
fully in order to preserve utility. We note that many anonymization methods
[16, 46, 77, 108] use cost measures in order to measure the information loss
from the anonymization process. Examples of such utility measures include generalization height [16], size of anonymized group [77], discernibility measures of attribute values [16], and privacy information loss ratio [108]. In addition, a number of metrics, such as the classification metric [55], explicitly tailor the privacy-preservation process so that the results remain useful for specific applications such as classification.
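Two of the simplest cost measures mentioned above can be illustrated concretely. The sketch below (not taken from any of the cited papers; the data and function names are hypothetical) computes the discernibility cost, which charges each anonymized group the square of its size, and the average group size, for a released set of generalized quasi-identifier tuples:

```python
from collections import Counter

def discernibility(quasi_tuples):
    """Discernibility cost: every record is charged the size of the
    equivalence class it falls into, so a class of size s costs s * s.
    Lower values mean the released records are easier to tell apart."""
    groups = Counter(quasi_tuples)
    return sum(s * s for s in groups.values())

def average_group_size(quasi_tuples):
    """Average size of the anonymized groups; larger groups mean
    more privacy but coarser, less useful data."""
    groups = Counter(quasi_tuples)
    return len(quasi_tuples) / len(groups)

# Hypothetical generalized (zip-prefix, sex) tuples after anonymization
released = [("3*", "M"), ("3*", "M"), ("3*", "M"), ("4*", "F"), ("4*", "F")]
print(discernibility(released))      # 3*3 + 2*2 = 13
print(average_group_size(released))  # 5 records / 2 groups = 2.5
```

Comparing these costs across candidate generalizations is the basic mechanism by which cost-driven anonymization methods choose among them.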
The problem of utility-based privacy-preserving data mining was first stud-
ied formally in [65]. The broad idea in [65] is to ameliorate the curse of dimen-
sionality by separately publishing marginal tables containing attributes which
have utility, but are also problematic for privacy-preservation purposes. The
generalizations performed on the marginal tables and the original tables in
fact do not need to be the same. It has been shown that this broad approach
can preserve considerable utility of the data set without violating privacy.
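The idea of a marginal table can be sketched in a few lines. In this illustration (the schema, rows, and helper are hypothetical, not the construction of [65]), a high-utility attribute is projected out of the original table and released as counts, so that the full-dimensional table can then be generalized more aggressively without destroying that attribute's utility:

```python
from collections import Counter

def marginal(rows, attrs, keep):
    """Project the original rows onto a small subset of high-utility
    attributes and release only the counts of each combination."""
    idx = [attrs.index(a) for a in keep]
    return Counter(tuple(r[i] for i in idx) for r in rows)

attrs = ["age", "zip", "disease"]  # hypothetical schema
rows = [(25, "47677", "flu"), (27, "47602", "flu"), (25, "47678", "cold")]

# Exact disease counts survive in the marginal even if age and zip
# are heavily generalized in the separately published base table.
print(marginal(rows, attrs, ["disease"]))
```

As the text notes, the generalizations applied to the marginal and to the base table need not coincide; the privacy analysis must of course consider what an adversary can infer from the two releases jointly.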
A method for utility-based data mining using local recoding was proposed
in [116]. The approach is based on the fact that different attributes have dif-
ferent utility from an application point of view. Most anonymization methods
are global, in which a particular tuple value is mapped to the same generalized value everywhere it occurs. In local recoding, the data space is partitioned into a number of regions, and the mapping of a tuple to its generalized value is local to
that region. Clearly, this kind of approach has greater flexibility, since it can
tailor the generalization process to a particular region of the data set. In [116],
it has been shown that this method can perform quite effectively because of
its local recoding strategy.
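The contrast between the two strategies can be made concrete on a single numeric attribute. The following sketch (a simplification under assumed data, not the algorithm of [116]) applies one fixed interval width globally, versus cutting the sorted values into small groups and generalizing each group only as far as its own span requires:

```python
def global_recode(ages, width=20):
    """Global recoding: one generalization rule maps every value to the
    same fixed-width interval, regardless of where it lies."""
    return [f"[{(a // width) * width}-{(a // width + 1) * width})" for a in ages]

def local_recode(ages, k=2):
    """Local recoding sketch: sort the values, cut them into groups of
    at least k records, and generalize each group to its own range.
    Dense regions therefore keep much finer intervals."""
    s = sorted(ages)
    out = []
    for i in range(0, len(s), k):
        group = s[i:i + k]
        if len(group) < k and out:
            # Fold an undersized tail into the previous group so every
            # released group still contains at least k records.
            lo = out.pop().strip("[]").split("-")[0]
            out.append(f"[{lo}-{group[-1]}]")
        else:
            out.append(f"[{group[0]}-{group[-1]}]")
    return out

ages = [21, 22, 23, 45, 47]
print(global_recode(ages))  # every age collapses to [20-40) or [40-60)
print(local_recode(ages))   # the cluster near 21-22 keeps a tight range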
Another indirect approach to utility based anonymization is to make the
privacy-preservation algorithms more aware of the workload [72]. Typically,
data recipients may request only a subset of the data in many cases, and the
union of these different requested parts of the data set is referred to as the
workload. Clearly, a workload in which some records are used more frequently
than others tends to suggest a different anonymization than one which is based
on the entire data set. In [72], an effective and efficient algorithm has been
proposed for workload aware anonymization.
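The intuition behind workload awareness can be sketched with a toy cost function (the partitions, weights, and function below are hypothetical illustrations, not the algorithm of [72]): the information loss of each anonymized group is weighted by how often the workload touches it, which pushes heavily queried regions toward finer generalizations.

```python
def workload_cost(groups, weights):
    """Workload-aware cost sketch: information loss of each group,
    measured as its value span, weighted by query frequency."""
    return sum(w * (hi - lo) for (lo, hi), w in zip(groups, weights))

# Two candidate partitions of an age range 20-60; the workload queries
# the 20-30 region four times as often as the rest (assumed weights).
fine_on_hot   = [(20, 25), (25, 30), (30, 60)]
coarse_on_hot = [(20, 40), (40, 50), (50, 60)]
hot_weights   = [4, 4, 1]

print(workload_cost(fine_on_hot, hot_weights))    # 4*5 + 4*5 + 1*30 = 70
print(workload_cost(coarse_on_hot, hot_weights))  # 4*20 + 4*10 + 1*10 = 130
```

Under a uniform workload the two partitions would cost the same, but the skewed weights make the partition that is fine on the hot region clearly preferable, which is exactly the effect described above.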
3.3 Sequential Releases
Privacy-preserving data mining poses unique problems for dynamic applica-
tions such as data streams because in such cases, the data is released sequen-
tially. In other cases, different views of the table may be released sequentially.