Hybrid Algorithm - Association Rule Hiding for Data Mining - page 107


Figure 16.7 shows how the size of the extension depends on the sensitive itemsets. In this figure the hiding scenarios are separated into groups of four. Within each group, a scenario that hides more itemsets of the same length includes all the sensitive itemsets that were selected for the previous scenario. For instance, five of the 10-itemsets in the 10×10 hiding scenario are exactly the itemsets of the 5×10 hiding scenario. In the chess dataset, the sensitive 7-itemsets are selected to have lower supports than their counterparts, the sensitive 10-itemsets of the ×10 hiding scenarios.
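The nesting of scenarios within a group can be sketched as follows. This is a minimal illustration that assumes, hypothetically, that each group draws its sensitive itemsets as prefixes of one fixed, ordered list of candidate itemsets of the same length; how the candidates are chosen (e.g., by support) is not modelled. The function name `nested_scenarios` and the ASCII labels `NxL` (N itemsets of length L, the form used on the axis of Fig. 16.9) are illustrative, not taken from the book.

```python
from typing import Dict, List, Sequence, Tuple

Itemset = Tuple[str, ...]


def nested_scenarios(candidates: Sequence[Itemset],
                     counts: Sequence[int]) -> Dict[str, List[Itemset]]:
    """Hypothetical helper: build one group of hiding scenarios labelled 'NxL'.

    Every scenario takes the first N itemsets of the same ordered candidate
    list, so a larger scenario always contains the itemsets of a smaller one.
    """
    if not candidates:
        raise ValueError("need at least one candidate itemset")
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("all itemsets in a group must have the same length")
    scenarios: Dict[str, List[Itemset]] = {}
    for n in sorted(counts):
        if n > len(candidates):
            raise ValueError(f"only {len(candidates)} candidate {length}-itemsets")
        scenarios[f"{n}x{length}"] = list(candidates[:n])  # prefix => nesting
    return scenarios


if __name__ == "__main__":
    group = nested_scenarios([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")], [1, 2, 3])
    for label, itemsets in group.items():
        print(label, itemsets)
```

Taking prefixes of a single ordering is simply the easiest way to guarantee the containment property described above; any selection that reuses the previous scenario's itemsets would satisfy it equally well.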

On the other hand, in the mushroom dataset, the sensitive 10-itemsets lie near the border, whereas the sensitive 5-itemsets are highly supported in the dataset.

Figure 16.8 follows the layout of Figure 16.7 and shows how the number of constraints in the CSP relates to the number and the size of the sensitive itemsets. As shown, hiding more itemsets of the same size produces more inequalities for the CSP, since in the typical case the size of the negative border increases. A similar relation exists between the minimum support threshold of the dataset and the sensitive itemsets to be hidden: the lower the minimum support threshold, the more itemsets become frequent. Suppose that one wishes to hide exactly the same itemsets. If the minimum support threshold is reduced, then in the typical case more transactions are necessary in D_X to ensure that the revised positive border will be preserved in D, so more inequalities have to be included in the CSP to accomplish this goal. Conversely, increasing the minimum support threshold typically leads to smaller problems and thus to better performance of the hiding algorithm.
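The effect of the threshold on the size of D_X can be made concrete with a back-of-the-envelope bound. The sketch below assumes a relative minimum support threshold mfreq, that the N original transactions are left untouched, and that no transaction of D_X supports any sensitive itemset. Under these assumptions a sensitive itemset with support s in the original database is hidden once s / (N + |D_X|) < mfreq, that is, once |D_X| > s / mfreq - N. The function name `min_extension_size` is hypothetical, and since a hybrid scheme may also modify the original transactions, the number it returns only indicates the trend described above.

```python
import math
from fractions import Fraction
from typing import Iterable


def min_extension_size(sensitive_supports: Iterable[int],
                       n_original: int,
                       mfreq: Fraction) -> int:
    """Hypothetical helper: smallest |D_X| that pushes every sensitive itemset
    below the relative threshold mfreq.

    Assumes the n_original transactions are left untouched and that no
    transaction of D_X supports a sensitive itemset.
    """
    if not 0 < mfreq <= 1:
        raise ValueError("mfreq must be a relative threshold in (0, 1]")
    # The most supported sensitive itemset needs the largest extension.
    max_sup = max(sensitive_supports)
    # Hidden once max_sup / (n_original + q) < mfreq, i.e. q > max_sup / mfreq - n_original.
    # Fraction arithmetic keeps the boundary case exact; floor(x) + 1 is the
    # smallest integer strictly greater than x.
    q = math.floor(max_sup / mfreq - n_original) + 1
    return max(q, 0)  # 0 means the itemsets are already infrequent


if __name__ == "__main__":
    for threshold in (Fraction(1, 5), Fraction(1, 10)):
        print(threshold, min_extension_size([180, 250], 1000, threshold))
```

For example, with N = 1000 and a most-supported sensitive itemset of support 250, lowering mfreq from 1/5 to 1/10 raises the bound from 251 to 1501 extension transactions. `Fraction` is used instead of floating point so that a threshold landing exactly on the boundary is not mis-rounded.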

[Figure 16.9 (Mushroom Dataset): distance of BBA, MaxMin2 and Inline (y-axis, 0 to 1200) for the hiding scenarios 3×5 to 20×10 (x-axis: Hiding Scenarios).]

Fig. 16.9: Distance of the three hiding schemes.
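The distance plotted in Fig. 16.9 is the number of item modifications between the original and the sanitized database. A minimal sketch of that count follows, assuming that both databases list the same transactions in the same order and that each item added to or removed from a transaction counts as one modification. The name `item_distance` is hypothetical, and transactions appended by an extension, which have no original counterpart, are deliberately not handled here.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def item_distance(original: Sequence[Transaction],
                  sanitized: Sequence[Transaction]) -> int:
    """Hypothetical helper: count item modifications between two databases
    whose transactions are aligned by position."""
    if len(original) != len(sanitized):
        raise ValueError("both databases must list the same transactions in the same order")
    # The symmetric difference holds the items removed from or added to a
    # transaction; each such item is one modification.
    return sum(len(o ^ s) for o, s in zip(original, sanitized))


if __name__ == "__main__":
    d_orig = [frozenset("abc"), frozenset("ab"), frozenset("bcd")]
    d_san = [frozenset("ac"), frozenset("ab"), frozenset("cd")]
    print(item_distance(d_orig, d_san))  # 2: item b removed from two transactions
```

Counting the symmetric difference covers both item removals and item additions, so the same measure can be applied to schemes that only remove items and to schemes that may also add them.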

Figure 16.9 presents the distance (i.e., the number of item modifications) between the original and the sanitized database that each algorithm requires to achieve knowledge hiding. Since the two border-based approaches and the inline algorithm operate in a similar fashion (i.e., by selecting transactions of the original database
