3.2 Utility Based Privacy Preservation
The process of privacy-preservation leads to a loss of information, which can equivalently be viewed as a loss of utility
for data mining purposes. Since some negative results [7] on the curse of
dimensionality suggest that a lot of attributes may need to be suppressed
in order to preserve anonymity, it is extremely important to do this care-
fully in order to preserve utility. We note that many anonymization methods
[16, 46, 77, 108] use cost measures in order to measure the information loss
from the anonymization process. Examples of such utility measures include generalization height [16], size of anonymized group [77], discernibility measures of attribute values [16], and privacy information loss ratio [108]. In addition, a number of metrics, such as the classification metric [55], explicitly tailor the privacy-preservation process so that the results remain useful for specific applications such as classification.
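Two of the simplest cost measures mentioned above can be illustrated concretely. The sketch below (not taken from any of the cited papers; the data and function names are hypothetical) computes the discernibility cost, which charges each anonymized group the square of its size, and the average group size, for a released set of generalized quasi-identifier tuples:

```python
from collections import Counter

def discernibility(quasi_tuples):
    """Discernibility cost: every record is charged the size of the
    equivalence class it falls into, so a class of size s costs s * s.
    Lower values mean the released records are easier to tell apart."""
    groups = Counter(quasi_tuples)
    return sum(s * s for s in groups.values())

def average_group_size(quasi_tuples):
    """Average size of the anonymized groups; larger groups mean
    more privacy but coarser, less useful data."""
    groups = Counter(quasi_tuples)
    return len(quasi_tuples) / len(groups)

# Hypothetical generalized (zip-prefix, sex) tuples after anonymization
released = [("3*", "M"), ("3*", "M"), ("3*", "M"), ("4*", "F"), ("4*", "F")]
print(discernibility(released))      # 3*3 + 2*2 = 13
print(average_group_size(released))  # 5 records / 2 groups = 2.5
```

Comparing these costs across candidate generalizations is the basic mechanism by which cost-driven anonymization methods choose among them.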
The problem of utility-based privacy-preserving data mining was first stud-
ied formally in [65]. The broad idea in [65] is to ameliorate the curse of dimen-
sionality by separately publishing marginal tables containing attributes which
have utility, but are also problematic for privacy-preservation purposes. The
generalizations performed on the marginal tables and the original tables in
fact do not need to be the same. It has been shown that this broad approach
can preserve considerable utility of the data set without violating privacy.
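The idea of a marginal table can be sketched in a few lines. In this illustration (the schema, rows, and helper are hypothetical, not the construction of [65]), a high-utility attribute is projected out of the original table and released as counts, so that the full-dimensional table can then be generalized more aggressively without destroying that attribute's utility:

```python
from collections import Counter

def marginal(rows, attrs, keep):
    """Project the original rows onto a small subset of high-utility
    attributes and release only the counts of each combination."""
    idx = [attrs.index(a) for a in keep]
    return Counter(tuple(r[i] for i in idx) for r in rows)

attrs = ["age", "zip", "disease"]  # hypothetical schema
rows = [(25, "47677", "flu"), (27, "47602", "flu"), (25, "47678", "cold")]

# Exact disease counts survive in the marginal even if age and zip
# are heavily generalized in the separately published base table.
print(marginal(rows, attrs, ["disease"]))
```

As the text notes, the generalizations applied to the marginal and to the base table need not coincide; the privacy analysis must of course consider what an adversary can infer from the two releases jointly.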
A method for utility-based data mining using local recoding was proposed
in [116]. The approach is based on the fact that different attributes have dif-
ferent utility from an application point of view. Most anonymization methods
are global, in which a particular tuple value is mapped to the same generalized value everywhere it occurs. In local recoding, the data space is partitioned into a number of regions, and the mapping of a tuple to its generalized value is local to
that region. Clearly, this kind of approach has greater flexibility, since it can
tailor the generalization process to a particular region of the data set. In [116],
it has been shown that this method can perform quite effectively because of
its local recoding strategy.
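The contrast between the two strategies can be made concrete on a single numeric attribute. The following sketch (a simplification under assumed data, not the algorithm of [116]) applies one fixed interval width globally, versus cutting the sorted values into small groups and generalizing each group only as far as its own span requires:

```python
def global_recode(ages, width=20):
    """Global recoding: one generalization rule maps every value to the
    same fixed-width interval, regardless of where it lies."""
    return [f"[{(a // width) * width}-{(a // width + 1) * width})" for a in ages]

def local_recode(ages, k=2):
    """Local recoding sketch: sort the values, cut them into groups of
    at least k records, and generalize each group to its own range.
    Dense regions therefore keep much finer intervals."""
    s = sorted(ages)
    out = []
    for i in range(0, len(s), k):
        group = s[i:i + k]
        if len(group) < k and out:
            # Fold an undersized tail into the previous group so every
            # released group still contains at least k records.
            lo = out.pop().strip("[]").split("-")[0]
            out.append(f"[{lo}-{group[-1]}]")
        else:
            out.append(f"[{group[0]}-{group[-1]}]")
    return out

ages = [21, 22, 23, 45, 47]
print(global_recode(ages))  # every age collapses to [20-40) or [40-60)
print(local_recode(ages))   # the cluster near 21-22 keeps a tight range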
Another indirect approach to utility based anonymization is to make the
privacy-preservation algorithms more aware of the workload [72]. Typically,
data recipients may request only a subset of the data in many cases, and the
union of these different requested parts of the data set is referred to as the
workload. Clearly, a workload in which some records are used more frequently
than others tends to suggest a different anonymization than one which is based
on the entire data set. In [72], an effective and efficient algorithm has been
proposed for workload aware anonymization.
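The intuition behind workload awareness can be sketched with a toy cost function (the partitions, weights, and function below are hypothetical illustrations, not the algorithm of [72]): the information loss of each anonymized group is weighted by how often the workload touches it, which pushes heavily queried regions toward finer generalizations.

```python
def workload_cost(groups, weights):
    """Workload-aware cost sketch: information loss of each group,
    measured as its value span, weighted by query frequency."""
    return sum(w * (hi - lo) for (lo, hi), w in zip(groups, weights))

# Two candidate partitions of an age range 20-60; the workload queries
# the 20-30 region four times as often as the rest (assumed weights).
fine_on_hot   = [(20, 25), (25, 30), (30, 60)]
coarse_on_hot = [(20, 40), (40, 50), (50, 60)]
hot_weights   = [4, 4, 1]

print(workload_cost(fine_on_hot, hot_weights))    # 4*5 + 4*5 + 1*30 = 70
print(workload_cost(coarse_on_hot, hot_weights))  # 4*20 + 4*10 + 1*10 = 130
```

Under a uniform workload the two partitions would cost the same, but the skewed weights make the partition that is fine on the hot region clearly preferable, which is exactly the effect described above.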
3.3 Sequential Releases
Privacy-preserving data mining poses unique problems for dynamic applica-
tions such as data streams because in such cases, the data is released sequen-
tially. In other cases, different views of the table may be released sequentially.