Figure 16.7 presents the dependence between the size of the extension and the
sensitive itemsets. In this figure the hiding scenarios are separated into
groups of four. Within each group, a scenario that hides more itemsets of the
same length includes all the sensitive itemsets selected for the previous
scenario. For instance, the five 10-itemsets of the 5×10 hiding scenario are
also among the ten 10-itemsets hidden in the 10×10 scenario. In the chess
dataset, the ×7 itemsets are selected to have lower supports than their
counterparts participating in the ×10 hiding scenarios. On the other hand, in
the mushroom dataset, the group of ×10 itemsets reflects itemsets that lie near
the border, whereas the ×5 itemsets are highly supported in the dataset.
Figure 16.8 follows the layout of Figure 16.7 and presents the relation
between the number of constraints in the CSP and the number and size of the
sensitive itemsets. As shown, hiding more itemsets of the same size produces
more inequalities for the CSP since, in the typical case, the negative border
grows. A similar relation exists between the minimum support threshold of the
dataset and the sensitive itemsets to be hidden. The lower the minimum support
threshold, the more itemsets become frequent. If one wishes to hide exactly the
same itemsets under a reduced minimum support threshold, then in the typical
case more transactions are necessary in D_X to ensure that the revised positive
border is preserved in D. Thus, more inequalities have to be included in the
CSP to accomplish this goal. Conversely, increasing the minimum support
threshold typically leads to smaller problems and thus to better performance of
the hiding algorithm.
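The effect of the minimum support threshold on the number of frequent itemsets
can be illustrated with a small brute-force sketch. The toy database, the
function name frequent_itemsets, and the use of absolute support counts are
assumptions made for this illustration; they are not taken from the experiments
reported here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for all itemsets supported by at
    least min_count transactions (brute-force, level-wise enumeration)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            # Anti-monotonicity of support: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return result


# Hypothetical toy database of five transactions.
toy_db = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

# Number of frequent itemsets for decreasing absolute support thresholds.
counts = {m: len(frequent_itemsets(toy_db, m)) for m in (4, 3, 2, 1)}
print(counts)  # {4: 3, 3: 6, 2: 7, 1: 15}
```

As the threshold drops from 4 to 1 supporting transactions, the number of
frequent itemsets in this toy example grows from 3 to 15, which is the trend
that typically enlarges the borders and, with them, the CSP.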
[Figure: Mushroom dataset. Series: BBA, MaxMin2, Inline. Horizontal axis: hiding scenarios, from 3×5 through 20×10. Vertical axis range: 0 to 1200.]
Fig. 16.9: Distance of the three hiding schemes.
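The distance plotted in Figure 16.9 counts item modifications between the
original and the sanitized database. A minimal sketch of such a count follows,
assuming both databases hold the same number of transactions in the same order
and that each transaction is a set of items; the function name distance and the
sample data are hypothetical.

```python
def distance(original, sanitized):
    """Count item modifications (items added or removed) between two
    databases whose transactions are aligned by position."""
    if len(original) != len(sanitized):
        raise ValueError("databases must contain the same number of transactions")
    # The symmetric difference of two aligned transactions holds exactly
    # the items that were inserted into or deleted from that transaction.
    return sum(len(t ^ s) for t, s in zip(original, sanitized))


# Hypothetical example: one item removed from the first transaction,
# one item added to the third.
original = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
sanitized = [{"a", "c"}, {"a", "b"}, {"b", "c", "d"}]

d = distance(original, sanitized)
print(d)  # 2
```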
Figure 16.9 presents the distance (i.e., number of item modifications) between
the original and the sanitized database that is required by each algorithm to facilitate
knowledge hiding. Since the two border-based approaches and the inline algorithm
operate in a similar fashion (i.e., by selecting transactions of the original database
 