Hybrid Algorithm - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

16.2.4 Problem Size Reduction

To enforce the computed revised border and identify the exact hiding solution, a

mechanism is needed to regulate the status (frequent vs. infrequent) of all the item-

sets in D. Let C be the minimal set of border itemsets used to regulate the values of

the various u qm variables in D X . Moreover, suppose that I 2C is an itemset, whose

behavior we want to regulate in D. Then, itemset I will be frequent in D if and only

if sup(I;D O )+sup(I;D X ) mfreq(N +Q), or equivalently if:

Q

q=1

i m 2I u qm mfreq(N +Q)

sup(I;D O )+

(16.3)

and will be infrequent otherwise, when

Q

q=1

i m 2I u qm < mfreq(N +Q)

sup(I;D O )+

(16.4)

Inequality (16.3) corresponds to the minimum number of times that an itemset I

has to appear in the extension D X to remain frequent in D. On the other hand, in-

equality (16.4) provides the maximum number of times that an itemset I can appear

in D X in order to remain infrequent in database D.

To identify an exact solution to the hiding problem, every possible itemset in P,

according to its position in the lattice — with respect to the revised border — must

satisfy either (16.3) or (16.4). However, the complexity of solving the entire system

of the 2 M 1 inequalities is well known to be NP-hard [10, 47]. Therefore, one

should restrict the problem to capture only a small subset of these inequalities, thus

leading to a problem size that is computationally manageable. The problem formu-

lation proposed in [26] achieves this in a similar way as the inline algorithm [23],

i.e. by reducing the number of the participating inequalities that need to be satisfied.

Even more, by carefully selecting the itemsets in set C, the hiding algorithm ensures

that the exact same solution to the one of solving the entire system of inequalities,

is attained. This is accomplished by exploiting cover relations existing among the

itemsets in the lattice due to the monotonicity of support [7].

Definition 16.3. (Cover/Generalized cover) Given itemsets I; J; K 2P, itemset I

is defined as a cover of K if and only if I K and there exists no proper itemset

J such that I J K. Itemset I is a generalized cover of K if and only if I K.

Given two itemsets I; J 2P, J covers I if and only if J I, i.e. if itemset J is a

(generalized) cover of itemset I.

The basic premises for formulating set C are the following [26]. All frequent

itemsets in F 0 D should remain frequent in D, thus inequality (16.3) must hold. On

the contrary, all frequent itemsets in S min should satisfy inequality (16.4) in order

to be hidden in D. By definition, the hiding of the itemsets in S min will cause the

hiding of the itemsets in S max . Additionally, all infrequent itemsets in D O should

Association Rule Hiding for Data Mining

Search WWH ::

Custom Search

Home