In many cases, associations between pseudo-identifiers and sensitive attributes can be protected by using multiple views, such that the pseudo-identifiers and the sensitive attributes occur in different views of the table. Only a small subset of the selected views may then be made available. It may be possible to achieve k-anonymity because of the lossy nature of the join across the two views; if the join is not lossy enough, however, k-anonymity is violated. The problem of violating k-anonymity through multiple views has been studied in [121], where it is shown to be NP-hard in general. It is also shown in [121] that a polynomial-time algorithm is possible if functional dependencies exist between the different views.
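To make the lossy-join effect concrete, the following sketch (with a hypothetical schema and toy data) joins two released views on a shared generalized attribute and checks whether every pseudo-identifier combination in the join matches at least k tuples:

```python
from collections import defaultdict

# Hypothetical released views: view1 carries the pseudo-identifiers,
# view2 carries the sensitive attribute; both share the generalized 'zip'.
view1 = [
    {"zip": "100**", "age": "20-29"},
    {"zip": "100**", "age": "30-39"},
    {"zip": "200**", "age": "20-29"},
]
view2 = [
    {"zip": "100**", "disease": "flu"},
    {"zip": "100**", "disease": "cold"},
    {"zip": "200**", "disease": "flu"},
]

def lossy_join(v1, v2, key):
    """Natural join of the two views on the shared attribute `key`."""
    return [{**r1, **r2} for r1 in v1 for r2 in v2 if r1[key] == r2[key]]

def satisfies_k_anonymity(rows, quasi_ids, k):
    """True if every quasi-identifier combination matches >= k joined tuples."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[a] for a in quasi_ids)] += 1
    return all(count >= k for count in groups.values())

joined = lossy_join(view1, view2, "zip")
print(satisfies_k_anonymity(joined, ("zip", "age"), k=2))  # False
```

Here the zip code 100** joins two tuples from each view, so the association with the disease is blurred, but 200** joins exactly one tuple from each view; the join is not lossy there, and the check fails for k = 2.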
An interesting analysis of the safety of k-anonymization methods is discussed in [68]. It models the effectiveness of a k-anonymous representation, given that the attacker has some prior knowledge about the data, such as a sample of the original data. Clearly, the more similar the sample is to the true data, the greater the risk. The technique in [68] uses this fact to construct a model that calculates the expected number of records identified. This kind of analysis is useful when deciding whether anonymization should be the technique of choice for a particular situation.
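As a rough illustration of this style of analysis (a deliberate simplification, not the exact model of [68]): if a record in the attacker's sample falls into an equivalence class of n released records, one may credit a 1/n chance of identifying it, and sum these chances over the sample to obtain an expected count.

```python
from collections import Counter

def expected_identifications(released_qids, sample_qids):
    """Expected number of attacker-sample records re-identified, assuming a
    record matching an equivalence class of size n is pinned down with
    probability 1/n (a simplification of the analysis in [68])."""
    class_sizes = Counter(released_qids)
    expected = 0.0
    for qid in sample_qids:
        n = class_sizes.get(qid, 0)
        if n > 0:
            expected += 1.0 / n
    return expected

# Hypothetical generalized quasi-identifiers in the released table, and the
# quasi-identifiers of the records the attacker already holds as a sample.
released = ["100**", "100**", "100**", "200**", "200**"]
sample = ["100**", "200**"]
print(expected_identifications(released, sample))  # 1/3 + 1/2
```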
3.1 Personalized Privacy Preservation
Not all individuals or entities are equally concerned about their privacy. For
example, a corporation may have very different constraints on the privacy of
its records as compared to an individual. This leads to the natural problem
that we may wish to treat the records in a given data set very differently
for anonymization purposes. From a technical point of view, this means that
the value of k for anonymization is not fixed but may vary with the record.
A condensation-based approach [9] has been proposed for privacy-preserving
data mining in the presence of variable constraints on the privacy of the data
records. This technique constructs groups of non-homogeneous size from the data, such that each record is guaranteed to lie in a group whose size is at least equal to its anonymity level. Subsequently, pseudo-data is generated
from each group so as to create a synthetic data set with the same aggregate
distribution as the original data.
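A minimal sketch of this idea, assuming numeric records and a greedy grouping pass (the actual algorithm in [9] grows groups around records using nearest neighbors and condensed first- and second-order statistics; the grouping below is only illustrative):

```python
import random
import statistics

def condense(records, ks):
    """Greedy grouping sketch: process records in decreasing order of their
    personal anonymity level k(i), and close each group once it contains at
    least as many records as the largest k inside it."""
    order = sorted(range(len(records)), key=lambda i: -ks[i])
    groups, current, need = [], [], 0
    for i in order:
        current.append(records[i])
        need = max(need, ks[i])
        if len(current) >= need:
            groups.append(current)
            current, need = [], 0
    if current:               # leftovers join the last group
        groups[-1].extend(current)
    return groups

def pseudo_data(group):
    """Draw one synthetic record per original record, using the group's
    per-attribute mean and standard deviation (first/second order stats)."""
    dims = list(zip(*group))
    mus = [statistics.mean(d) for d in dims]
    sds = [statistics.pstdev(d) for d in dims]
    return [[random.gauss(m, s) for m, s in zip(mus, sds)] for _ in group]

records = [[25, 50000], [31, 62000], [29, 58000], [47, 90000], [52, 99000]]
ks = [3, 2, 2, 3, 2]          # per-record anonymity requirements
synthetic = [row for g in condense(records, ks) for row in pseudo_data(g)]
print(synthetic)
```

Processing records in decreasing order of their requirement guarantees that each finished group is at least as large as the strictest anonymity level it contains.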
Another interesting model of personalized anonymity is discussed in [114], in which a person can specify the level of privacy for his or her sensitive values. This technique assumes that an individual can specify a node of the domain generalization hierarchy in order to decide the level of anonymity that he or she can work with. The approach has the advantage of providing more direct protection of the sensitive values of individuals than a vanilla k-anonymity method, which is susceptible to different kinds of attacks.
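As a small sketch of the hierarchy-based specification (the disease hierarchy and the publish-the-guarding-node rule are illustrative simplifications; in [114] the node chosen by the individual bounds what an adversary may infer, rather than being published directly):

```python
# Hypothetical domain generalization hierarchy: each value maps to its parent.
PARENT = {
    "gastric-ulcer": "stomach-disease",
    "gastritis": "stomach-disease",
    "flu": "respiratory-disease",
    "pneumonia": "respiratory-disease",
    "stomach-disease": "any-disease",
    "respiratory-disease": "any-disease",
}

def generalize(value, guarding_node):
    """Walk up the hierarchy from `value` and publish the individual's
    chosen guarding node instead of the exact sensitive value."""
    node = value
    while node != guarding_node:
        if node not in PARENT:
            raise ValueError("guarding node is not an ancestor of the value")
        node = PARENT[node]
    return node

# One person allows the exact value; another demands a coarser category.
print(generalize("flu", "flu"))                        # -> flu
print(generalize("gastric-ulcer", "stomach-disease"))  # -> stomach-disease
```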