Privacy-Preserving Data Mining: A Survey - Database Security: Applications and Trends

5 Privacy-Preservation of Application Results

In many cases, the output of applications can be used by an adversary to make significant inferences about the behavior of the underlying data. In this section, we will discuss a number of miscellaneous methods for privacy-preserving data mining which tend to preserve the privacy of the end results of applications such as association rule mining and query processing. This problem is related to that of disclosure control [1] in statistical databases, though advances in data mining provide adversaries with increasingly sophisticated methods for making inferences about the behavior of the underlying data. In cases where commercial data needs to be shared, the association rules may represent sensitive information for target-marketing purposes, which needs to be protected from inference.

In this section, we will discuss the issue of disclosure control for a number of applications such as association rule mining, classification, and query processing. The key goal here is to prevent adversaries from making inferences from the end results of data mining and management applications. A broad discussion of the security and privacy implications of data mining is presented in [30]. We will discuss each of the applications below.

5.1 Association Rule Hiding

Recent years have seen tremendous advances in the ability to perform association rule mining effectively. Such rules often encode important target-marketing information about a business. Some of the earliest work on the challenges of association rule mining for database security may be found in [14]. Two broad approaches are used for association rule hiding:

• Distortion: In distortion [89], the entry for a given transaction is modified to a different value. Since we are typically dealing with binary transactional data sets, the entry value is flipped.

• Blocking: In blocking [96], the entry is not modified, but is left incomplete. Thus, unknown entry values are used to prevent discovery of association rules.
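
The two operations can be illustrated on a small binary transaction matrix. The following is a minimal sketch rather than the method of [89] or [96]: the function names `distort`, `block`, and `support_bounds`, the use of `None` for an unknown entry, and the idea of reporting a minimum and maximum support count under blocking are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A binary transactional data set: rows are transactions, columns are items.
# 1 = item present, 0 = item absent, None = unknown (produced only by blocking).
Entry = Optional[int]
Database = List[List[Entry]]


def distort(db: Database, row: int, col: int) -> Database:
    """Return a copy of db with the binary entry at (row, col) flipped.

    Assumes the entry is 0 or 1 (not an unknown).
    """
    out = [list(r) for r in db]
    out[row][col] = 1 - out[row][col]
    return out


def block(db: Database, row: int, col: int) -> Database:
    """Return a copy of db with the entry at (row, col) left unknown."""
    out = [list(r) for r in db]
    out[row][col] = None
    return out


def support_bounds(db: Database, items: List[int]) -> Tuple[int, int]:
    """Minimum and maximum possible support count of an itemset.

    A transaction definitely supports the itemset if every item is 1,
    and possibly supports it if every item is either 1 or unknown.
    """
    lo = sum(all(r[i] == 1 for i in items) for r in db)
    hi = sum(all(r[i] in (1, None) for i in items) for r in db)
    return lo, hi


# Hypothetical data: four transactions over three items.
db = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
distorted = distort(db, 0, 1)  # flip item 1 in transaction 0
blocked = block(db, 0, 1)      # hide item 1 in transaction 0

for name, d in [("original", db), ("distorted", distorted), ("blocked", blocked)]:
    print(name, support_bounds(d, [0, 1]))
```

Under distortion the support of the itemset drops to a definite lower value, whereas under blocking a miner sees only a range of possible support values, which is what obscures the sensitive rule in this sketch.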

We note that both the distortion and blocking processes have a number of side effects on the non-sensitive rules in the data. Some of the non-sensitive rules may be lost along with sensitive rules, and new ghost rules may be created because of the distortion or blocking process. Such side effects are undesirable since they reduce the utility of the data for mining purposes.
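
The side effects described above can be measured by mining rules before and after sanitization and comparing the two rule sets. The sketch below is an illustration under stated assumptions: `mine_rules` is a brute-force miner suitable only for toy data, and the names `side_effects`, `lost`, `ghost`, and `hidden_failures`, the thresholds, and the transactions are all hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def mine_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> Set[Rule]:
    """Brute-force association rule mining for tiny illustrative data sets."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t for t in transactions) / n

    rules: Set[Rule] = set()
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            # Try every non-empty proper subset as the antecedent.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    antecedent = frozenset(lhs)
                    if s / support(antecedent) >= min_confidence:
                        rules.add((antecedent, itemset - antecedent))
    return rules


def side_effects(original: Set[Rule], sanitized: Set[Rule],
                 sensitive: Set[Rule]):
    """Compare the rule sets mined before and after sanitization."""
    lost = (original - sensitive) - sanitized  # non-sensitive rules that vanished
    ghost = sanitized - original               # rules that did not exist before
    hidden_failures = sensitive & sanitized    # sensitive rules still minable
    return lost, ghost, hidden_failures


def rule(lhs: str, rhs: str) -> Rule:
    """Helper: rule('ac', 'b') stands for {a, c} -> {b}."""
    return frozenset(lhs), frozenset(rhs)


data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
sensitive = {rule("a", "b")}

# Two alternative distortions, each flipping one 1-value of item b to 0.
sanitized_a = [{"a", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
sanitized_b = [{"a", "b", "c"}, {"a", "b", "c"}, {"a"}, {"c"}]

original = mine_rules(data, 0.5, 0.7)
lost_a, ghost_a, fail_a = side_effects(original, mine_rules(sanitized_a, 0.5, 0.7), sensitive)
lost_b, ghost_b, fail_b = side_effects(original, mine_rules(sanitized_b, 0.5, 0.7), sensitive)
```

On this toy data both distortions hide the sensitive rule, but the first loses two non-sensitive rules while the second creates three ghost rules, which shows that the choice of which entry to flip determines the kind of side effect incurred.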

A formal proof of the NP-hardness of the distortion method for hiding association rules may be found in [14]. In [14], techniques are proposed for changing some of the 1-values to 0-values so that the support of the corresponding sensitive rules is appropriately lowered. The utility of the approach was defined by the number of non-sensitive rules whose support was also lowered by using such an approach. This approach was extended in [31] in which
