Inline Algorithm - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

Algorithm 14.1 Relaxation Procedure in V .

1: procedure S ELECT R EMOVE (Constraints C R , V, D)

2:

C R maxlen S argmax i fjR i jg

. C R i $V i

3:

cr msup min C R maxlen ;i (sup(R i 2V; D))

. R i 2 V

4:

for each c 2C R maxlen do

5:

if sup(R i ; D) = cr msup then

6:

Remove (c)

. remove constraint from the CSP.

7: end if

8: end for

9: end procedure

will always have a solution. This is due to the fact that all inequalities are of the same

type, so in the worst case scenario zeroing out all the involved u nm variables will

hide the sensitive knowledge. However, such a solution is certainly not desirable.

To identify a good hiding solution, the inline algorithm removes a small portion

of the constraints imposed by the itemsets in V (see Figure 14.2) so that the CSP

of the remaining inequalities becomes solvable. The question that is now raised is

how to select the inequalities to remove. Algorithm 14.1 provides a simple heuristic

for selection and removal of inequalities from the CSP of Figure 14.2. It applies a

relaxation process by removing all constraints that correspond to maximal size and

minimum support itemsets in V . The rationale behind this heuristic is that the hidden

itemsets in D (due to the side-effects of hiding itemsets in S max ) will be the first that

would be hidden in D O if the support threshold was increased, since their current

support is also low. After removing a subset of the constraints that participate to the

CSP, the remaining problem will become solvable. Thus, the relaxation process is

iteratively applied to the CSP until a solution is attained. Last, it is important to note

that each repetition of the relaxation process may potentially lead to the removal of

more than one constraints from the original CSP.

14.2.3 An Example

To demonstrate the core approach excluding the relaxation process, we borrow the

example that is used in [23]. Consider the transactional database that is presented in

Table 14.1. Using a minimum frequency threshold mfreq= 0:2, the set of frequent

itemsets that are mined from this database is: F D O =fA; B;C; D; AB; AC; AD;CD;

ACDg. Now suppose that one wishes to hide the sensitive itemset fABg, therefore

S =fABg. As a first step, the inline algorithm computes the ideal set of frequent

itemsets forD by using Algorithm 9.4. That is:F 0 D =fA; B;C; D; AC; AD;CD; ACDg.

The revised positive border for D will then be: Bd + (F 0 D ) =fB; ACDg. To produce

the constraints for the CSP, the inline algorithm has to consider all itemsets appear-

ing in C (see (14.5)). In this example, we have that: V =Bd + (F 0 D ) =fB; ACDg. The

next step is to substitute in all transactions supporting the sensitive itemsets, their

current T nm values of the sensitive items with u nm binary variables. This constitutes

Association Rule Hiding for Data Mining

Search WWH ::

Custom Search

Home