not provide a lower bound for finding optimal l-diverse anonymizations, they conjecture NP-hardness as well, and show how to adapt the Incognito Algorithm [12].
Sensitive Data Generalization. There are slight exceptions to assumption Util: an example occurs in [22]. In this work, sensitive data is not published in the clear, but is itself generalized using a function f. The generalization function f exploits a hierarchy among concepts in the sensitive domain, treating ancestor concepts as more general than descendant concepts. For instance, instead of displaying "pneumonia", the owner may release a more general concept such as "respiratory tract problems", which in turn is generalized by "antibiotic-curable ailment". Evidently, the objective in [22] is to minimize the information loss resulting from the generalization of both quasi-identifiers and sensitive attributes. We can capture this scenario as well in the GBP model, by simply adjusting assumption Util to state that the owner is willing to live with the attacker's belief after seeing the generalized sensitive values described by the view V_s(R) := f(Π_S(R)).
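As an illustration of such a generalization function f, the following minimal sketch walks a small concept hierarchy one level at a time. The hierarchy, function names, and example values are hypothetical and only illustrate the idea; they are not the construction used in [22].

```python
# Sketch of a sensitive-value generalization function f over a concept hierarchy.
# Parent links map each concept to its more general ancestor (hypothetical data).
PARENT = {
    "pneumonia": "respiratory tract problems",
    "bronchitis": "respiratory tract problems",
    "respiratory tract problems": "antibiotic-curable ailment",
}

def generalize(value: str, levels: int = 1) -> str:
    """Replace a sensitive value by an ancestor concept `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = PARENT.get(value, value)  # stop at the root if there is no parent
    return value

# V_s(R) := f(Π_S(R)): the owner releases generalized sensitive values.
sensitive_column = ["pneumonia", "bronchitis", "pneumonia"]
released = [generalize(v, levels=1) for v in sensitive_column]
print(released)  # ['respiratory tract problems', 'respiratory tract problems', 'respiratory tract problems']
```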
T-Closeness. One paper that explicitly states and exploits assumption Util is [14]. It considers the probability distribution p on the secrets {S_r}_{r∈R} after seeing the entire anonymized table A_g(R), and the probability distribution q of the sensitive values in R, i.e. in V_s(R). The authors introduce the privacy guarantee of t-closeness, which holds if the distance between distributions p and q is smaller than a threshold parameter t. The authors show shortcomings of standard metrics for comparing distributions and propose their own. They also show that the search for a t-close anonymization that maximizes utility (under a standard measure) can be performed by adapting efficient algorithms developed for k-anonymity. However, t-closeness does not subsume k-anonymity, and the authors suggest combining the two before releasing an anonymized table.
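The t-closeness condition itself is easy to sketch: compare the distribution of sensitive values within an anonymized group against the distribution over the whole table and test it against the threshold t. As noted above, [14] proposes its own distance metric; the sketch below uses total variation distance purely as a stand-in, and all names and data are hypothetical.

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of sensitive values as a dict value -> probability."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def is_t_close(group_sensitive, table_sensitive, t):
    """Check the t-closeness condition for one equivalence class."""
    p = distribution(group_sensitive)   # distribution within the anonymized group
    q = distribution(table_sensitive)   # distribution over V_s(R), the whole table
    return total_variation(p, q) <= t

# Hypothetical example: one equivalence class vs. the whole sensitive column.
table = ["flu", "flu", "cancer", "flu", "cancer", "flu"]
group = ["flu", "cancer"]
print(is_t_close(group, table, t=0.2))  # True: distance ≈ 0.17 <= 0.2
```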
An Alternative Bayesian Modeling. [17] compares the notion of l-diversity to a model called Bayesian Optimal Privacy (BOP). Just like the GBP model, the BOP model is based on belief revision. However, the authors conclude that there is a mismatch between l-diversity and the BOP model. As demonstrated in this section, the reason is not a fundamental mismatch between Bayesian privacy models and l-diversity. Rather, it stems from the particular modeling choice in [17], which ignores assumption Util: [17] considers that a priori the attacker sees V_id(R) but not V_s(R). The difficulty with this modeling (identified in [17] as well) is that to estimate the attacker's a priori belief about S_r, we require knowledge of the attacker's probability distribution on the domain of all sensitive values, which is an unrealistic expectation. The modeling we describe in this section surmounts this obstacle: under assumption Util, it need not model this distribution; it only considers belief revision starting from the attacker's adjusted belief after seeing V_s(R). We can estimate this belief (as in (5)), regardless of the belief before seeing V_s(R).
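The belief-revision view taken here can be illustrated with a small sketch. The estimate in (5) is not reproduced; instead, empirical relative frequencies stand in for the attacker's adjusted belief after seeing V_s(R) and for the revised belief once A_g(R) narrows the candidates to one group. All names and data are hypothetical.

```python
from collections import Counter

def empirical_belief(values):
    """Empirical distribution over sensitive values, used as a stand-in belief."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

# Adjusted belief: what the attacker holds after seeing V_s(R), the published
# multiset of sensitive values (no need to model beliefs prior to this point).
v_s = ["flu", "flu", "cancer", "flu", "hepatitis", "flu"]
adjusted_belief = empirical_belief(v_s)

# Revised belief about S_r after seeing A_g(R): the candidates shrink to the
# sensitive values appearing in the target record's anonymized group.
group_of_r = ["flu", "cancer"]
revised_belief = empirical_belief(group_of_r)

print(adjusted_belief)  # e.g. {'flu': 0.67, 'cancer': 0.17, 'hepatitis': 0.17}
print(revised_belief)   # e.g. {'flu': 0.5, 'cancer': 0.5}
```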