Distortion Schemes - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

Chapter 6

Distortion Schemes

In this chapter, we review some popular support-based and confidence-based dis-

tortion algorithms that have been proposed in the association rule hiding litera-

ture. Distortion-based approaches operate by selecting specific items to include to

(or exclude from) selected transactions of the original database in order to facili-

tate the hiding of the sensitive association rules. Two of the most commonly em-

ployed strategies for data distortion involve the swapping of values between trans-

actions [10, 20], as well as the deletion of specific items from the database [54]. In

the rest of this chapter, we present an overview of these approaches along with other

methodologies that also fit in the same class.

Atallah, et al. [10] were the first to propose an algorithm for the hiding of sensi-

tive association rules through the reduction in the support of their generating item-

sets. The authors propose the construction of a lattice-like graph [65] in the database.

Through this graph, the hiding of a large itemset, related to the existence of a sen-

sitive rule, is achieved by a greedy iterative traversal of its immediate subsets, se-

lection of the subset that has the maximum support among all candidates (therefore

is less probable to be hidden) and consideration of this itemset as the new candi-

date to be hidden. By iteratively following these steps, the algorithm identifies the

1-itemset ancestor of the initial sensitive itemset with the highest support. Then,

by identifying the supporting transactions for both the initial candidate and the cur-

rently identified 1-itemset, the algorithm removes the 1-itemset (item) from the

supporting transaction which affects the least number of 2-itemsets. In sequel, the

algorithm propagates the results of this action to the affected itemsets in the graph.

When hiding a set of sensitive rules, the algorithm first sorts the corresponding large

itemsets based on their support and then proceeds to hide them in a one-by-one fash-

ion, using the methodology presented above. One of the most significant contribu-

tions of this work is the proof that the authors provide regarding the NP-hardness

of finding an optimal sanitization of a dataset. On the negative side, the proposed

approach does not take into consideration the extend of loss in support for the large

itemsets, as long as they remain frequent in the sanitized database.

Dasseni, et al. [20] generalize the hiding problem in the sense that they consider

the hiding of both sensitive frequent itemsets and sensitive association rules. The

Search WWH ::

Custom Search

Home