In many cases, associations between pseudo-identifiers and sensitive attributes can be protected by using multiple views, such that the pseudo-identifiers and the sensitive attributes occur in different views of the table. Only a small subset of the selected views may then be made available. It may be possible to achieve k-anonymity because of the lossy nature of the join across the two views; if the join is not lossy enough, however, k-anonymity is violated. The problem of violating k-anonymity through multiple views has been studied in [121], where it is shown to be NP-hard in general. It is also shown in [121] that a polynomial-time algorithm is possible if functional dependencies exist between the different views.
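To make the lossy-join effect concrete, the following sketch (with a hypothetical schema and toy data) joins two released views on a shared generalized attribute and checks whether every pseudo-identifier combination in the join matches at least k tuples:

```python
from collections import defaultdict

# Hypothetical released views: view1 carries the pseudo-identifiers,
# view2 carries the sensitive attribute; both share the generalized 'zip'.
view1 = [
    {"zip": "100**", "age": "20-29"},
    {"zip": "100**", "age": "30-39"},
    {"zip": "200**", "age": "20-29"},
]
view2 = [
    {"zip": "100**", "disease": "flu"},
    {"zip": "100**", "disease": "cold"},
    {"zip": "200**", "disease": "flu"},
]

def lossy_join(v1, v2, key):
    """Natural join of the two views on the shared attribute `key`."""
    return [{**r1, **r2} for r1 in v1 for r2 in v2 if r1[key] == r2[key]]

def satisfies_k_anonymity(rows, quasi_ids, k):
    """True if every quasi-identifier combination matches >= k joined tuples."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[a] for a in quasi_ids)] += 1
    return all(count >= k for count in groups.values())

joined = lossy_join(view1, view2, "zip")
print(satisfies_k_anonymity(joined, ("zip", "age"), k=2))  # False
```

Here the zip code 100** joins two tuples from each view, so the association with the disease is blurred, but 200** joins exactly one tuple from each view; the join is not lossy there, and the check fails for k = 2.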
An interesting analysis of the safety of k-anonymization methods is discussed in [68]. It models the effectiveness of a k-anonymous representation, given that the attacker has some prior knowledge about the data, such as a sample of the original data. Clearly, the more similar the sample is to the true data, the greater the risk. The technique in [68] uses this fact to construct a model that calculates the expected number of records identified. This kind of analysis is useful when deciding whether anonymization should be the technique of choice for a particular situation.
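As a rough illustration of this style of analysis (a deliberate simplification, not the exact model of [68]): if a record in the attacker's sample falls into an equivalence class of n released records, one may credit a 1/n chance of identifying it, and sum these chances over the sample to obtain an expected count.

```python
from collections import Counter

def expected_identifications(released_qids, sample_qids):
    """Expected number of attacker-sample records re-identified, assuming a
    record matching an equivalence class of size n is pinned down with
    probability 1/n (a simplification of the analysis in [68])."""
    class_sizes = Counter(released_qids)
    expected = 0.0
    for qid in sample_qids:
        n = class_sizes.get(qid, 0)
        if n > 0:
            expected += 1.0 / n
    return expected

# Hypothetical generalized quasi-identifiers in the released table, and the
# quasi-identifiers of the records the attacker already holds as a sample.
released = ["100**", "100**", "100**", "200**", "200**"]
sample = ["100**", "200**"]
print(expected_identifications(released, sample))  # 1/3 + 1/2
```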
3.1 Personalized Privacy Preservation
Not all individuals or entities are equally concerned about their privacy. For
example, a corporation may have very different constraints on the privacy of
its records as compared to an individual. This leads to the natural problem
that we may wish to treat the records in a given data set very differently
for anonymization purposes. From a technical point of view, this means that
the value of k for anonymization is not fixed but may vary with the record.
A condensation-based approach [9] has been proposed for privacy-preserving
data mining in the presence of variable constraints on the privacy of the data
records. This technique constructs groups of non-homogeneous size from the data, such that each record is guaranteed to lie in a group whose size is at least equal to its anonymity level. Subsequently, pseudo-data is generated
from each group so as to create a synthetic data set with the same aggregate
distribution as the original data.
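A minimal sketch of this idea, assuming numeric records and a greedy grouping pass (the actual algorithm in [9] grows groups around records using nearest neighbors and condensed first- and second-order statistics; the grouping below is only illustrative):

```python
import random
import statistics

def condense(records, ks):
    """Greedy grouping sketch: process records in decreasing order of their
    personal anonymity level k(i), and close each group once it contains at
    least as many records as the largest k inside it."""
    order = sorted(range(len(records)), key=lambda i: -ks[i])
    groups, current, need = [], [], 0
    for i in order:
        current.append(records[i])
        need = max(need, ks[i])
        if len(current) >= need:
            groups.append(current)
            current, need = [], 0
    if current:               # leftovers join the last group
        groups[-1].extend(current)
    return groups

def pseudo_data(group):
    """Draw one synthetic record per original record, using the group's
    per-attribute mean and standard deviation (first/second order stats)."""
    dims = list(zip(*group))
    mus = [statistics.mean(d) for d in dims]
    sds = [statistics.pstdev(d) for d in dims]
    return [[random.gauss(m, s) for m, s in zip(mus, sds)] for _ in group]

records = [[25, 50000], [31, 62000], [29, 58000], [47, 90000], [52, 99000]]
ks = [3, 2, 2, 3, 2]          # per-record anonymity requirements
synthetic = [row for g in condense(records, ks) for row in pseudo_data(g)]
print(synthetic)
```

Processing records in decreasing order of their requirement guarantees that each finished group is at least as large as the strictest anonymity level it contains.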
Another interesting model of personalized anonymity is discussed in [114], in which a person can specify the level of privacy for his or her sensitive values. This technique assumes that an individual can specify a node of the domain generalization hierarchy in order to decide the level of anonymity that he or she can work with. The approach has the advantage of providing more direct protection of the sensitive values of individuals than a vanilla k-anonymity method, which is susceptible to different kinds of attacks.
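As a small sketch of the hierarchy-based specification (the disease hierarchy and the publish-the-guarding-node rule are illustrative simplifications; in [114] the node chosen by the individual bounds what an adversary may infer, rather than being published directly):

```python
# Hypothetical domain generalization hierarchy: each value maps to its parent.
PARENT = {
    "gastric-ulcer": "stomach-disease",
    "gastritis": "stomach-disease",
    "flu": "respiratory-disease",
    "pneumonia": "respiratory-disease",
    "stomach-disease": "any-disease",
    "respiratory-disease": "any-disease",
}

def generalize(value, guarding_node):
    """Walk up the hierarchy from `value` and publish the individual's
    chosen guarding node instead of the exact sensitive value."""
    node = value
    while node != guarding_node:
        if node not in PARENT:
            raise ValueError("guarding node is not an ancestor of the value")
        node = PARENT[node]
    return node

# One person allows the exact value; another demands a coarser category.
print(generalize("flu", "flu"))                        # -> flu
print(generalize("gastric-ulcer", "stomach-disease"))  # -> stomach-disease
```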