and excluding some items), it makes sense to compare them in terms of the
produced distances. The comparison shows that the inline approach manages to
minimize the number of item modifications, a result that can be attributed to the
optimization criterion of the generated CSPs. In contrast, the hybrid hiding
algorithm does not alter the original dataset; instead, it uses a database
extension to (i) leave the sensitive itemsets unsupported, so that they are hidden
in D, and (ii) adequately support the itemsets of the revised positive border, so
that they remain frequent in D. For this reason, the item modifications (0s → 1s)
that the hybrid hiding algorithm introduces in D_X should not be attributed to the
hiding task itself but rather to the algorithm's ability to preserve the revised
positive border and thus eliminate the side-effects. This important difference
between the hybrid algorithm and the other three approaches makes it hard to
compare them in terms of item modifications. However, because the inline and the
hybrid approaches model their CSPs in the same way, the property of minimum
distortion of the original database is bound to hold for the hybrid hiding
algorithm as well.
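The effect of a database extension on itemset support can be illustrated with a minimal Python sketch. It assumes that support is measured as the fraction of transactions containing an itemset and that hiding means falling below a minimum support threshold; the toy transactions, the threshold value, and the function names `support` and `extension_distance` are hypothetical and are not taken from the experiments discussed here.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], database: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not database:
        return 0.0
    hits = sum(1 for transaction in database if itemset <= transaction)
    return hits / len(database)


def extension_distance(extension: List[Transaction]) -> int:
    """Number of 1s placed in the extension (item modifications 0 -> 1)."""
    return sum(len(transaction) for transaction in extension)


# Hypothetical toy data: the original dataset is never modified.
original = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
# Hypothetical extension: none of its transactions supports {a, b},
# and one of them supports {a, c}.
extension = [frozenset("c"), frozenset("ac")]
extended = original + extension

min_support = 0.4            # assumed threshold, for illustration only
sensitive = frozenset("ab")  # itemset that should become infrequent
kept = frozenset("ac")       # itemset that should remain frequent

print("sensitive, original:", support(sensitive, original))
print("sensitive, extended:", support(sensitive, extended))
print("kept, extended:", support(kept, extended))
print("1s introduced by the extension:", extension_distance(extension))
```

In this toy setting the sensitive itemset is frequent in the original dataset but drops below the assumed threshold once the extension is appended, while the other itemset stays frequent. Every 1 counted by `extension_distance` lies in the added transactions, not in the original ones.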
[Figure: BMS-WebView-1 dataset. Series: Hybrid, Partitioning. Horizontal axis: hiding scenarios (1x2, 2x2, 1x3, 2x3, 1x4, 2x4). Vertical axis ticks: 0 to 700.]
Fig. 16.10: Performance of the partitioning approach.
Figure 16.10 presents the performance gain achieved by the partitioning
approach. As the figure shows, splitting the CSP into two parts significantly
improves the performance of the hybrid hiding algorithm.
An interesting insight from the experiments is that the hybrid approach,
when compared to the inline algorithm [23] and the border-based approaches of
[50,66], better preserves the quality of the border and produces superior solutions.
Indeed, the hybrid approach introduces the fewest side-effects among the
four tested algorithms. On the other hand, the hybrid approach scales worse
than its competitors, due to the large number of u_qm variables and the
associated constraints of the produced CSP. Moreover, depending on the properties
of the dataset used, there are cases where a substantial number of transactions has
to be added to the original dataset to facilitate knowledge hiding. This situation
 
 