Hybrid Algorithm - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

jeopardizing the hiding of the sensitive patterns. The result database D X contains

the 3 valid transactions and pertains to an exact hiding solution.

As one can notice in the presented examples, the solver adjusts the u qm variables

in a way that all the inequalities pertaining to an exact solution are satisfied, while

the number of zero entries in D X is maximized. Consider for instance the database

D X of Table 16.1. To hide itemset feg this itemset must not be supported by any

transaction inD X . Thus, all the respective variables become zero. Now, consider the

1-itemset ffg of the revised negative border. Since no other itemset in C supports

it, all its entries in D X can be safely turned into zero. Finally, itemset fabg has to

be supported by at least one transaction in D X to remain frequent in D. In any case,

since the optimization criterion of the CSP requires the adjustment of the minimum

number of u qm variables to '1', the solver ensures that the sanitized database D lies

as close as possible to D O , while providing an exact solution.

16.4 A Partitioning Approach to Improve Scalability

To achieve an exact hiding solution the hybrid algorithm has to regulate the values

of all items in every transaction of D X . This number of variables may, under certain

circumstances, be very large leading to a large increase in the runtime of the BIP

algorithm. Fortunately, the scalability of the hiding process can be improved without

relaxing the requirement for an exact hiding solution. To achieve this, first the hiding

process is properly decomposed into two parts of approximately equal size. Then,

each part is solved separately and, last, the individual solutions are coupled together.

16.4.1 Partitioning Methodology

Let I =I P [I P , be a decomposition of the universe of all items M of D O into two

disjoint subsets, such that I P contains all items that appear in the sensitive itemsets

along with possibly other items from I, and I P contains the rest of the items in I.

First, the revised borders for database D are identified, as in Section 16.2.3, and set

C is formulated (as in (16.7)). Each itemset in C, is then uniquely assigned to one of

the sets I P ;I P , based on the items it supports. For set I P all the assigned itemsets

from C must not support any items of I P . However, itemsets assigned to I P may

support items of I P . The ideal case scenario is a partitioning of the items in I P ;I P

such that (i) the two partitions are of approximately equal size, and (ii) as few as

possible itemsets assigned in I P support items from I P .

After formulating the two partitions and assigning the itemsets from C, relation

(16.1) and the safety margin are applied to decide on the size of D X . The goal is to

properly construct D X based on the principles presented earlier. Starting from set

I P , consider each transaction in D to be translated in this subspace; I P regulates

the assignment of the u qm variables for all transactions T q 2D X and all items m 2

Search WWH ::

Custom Search

Home