Hybrid Algorithm - Association Rule Hiding for Data Mining - page 108


and excluding some items), it makes sense to compare them in terms of the produced distances. From the comparison it is evident that the inline approach manages to minimize the number of item modifications, a result that can be attributed to the optimization criterion of the generated CSPs. In contrast, the hybrid hiding algorithm does not alter the original dataset but instead uses a database extension to (i) leave the sensitive itemsets unsupported so that they are hidden in D, and (ii) adequately support the itemsets of the revised positive border so that they remain frequent in D. For this reason, the item modifications (0s → 1s) that the hybrid hiding algorithm introduces in D_X should not be attributed to the hiding task of the algorithm but rather to its ability to preserve the revised positive border and thus eliminate the side-effects. This important difference between the hybrid algorithm and the other three approaches makes it harder to compare them in terms of item modifications. However, because the inline and the hybrid approaches model the CSPs in the same way, the property of minimum distortion of the original database is bound to hold for the hybrid hiding algorithm.
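The role of the database extension can be illustrated with a toy example. The sketch below is not the hybrid algorithm itself (it solves no CSP); it only assumes that D is the original dataset followed by the extension D_X and that an itemset is frequent when its relative support reaches a threshold. The names `D_O`, `D_X`, `MFREQ` and `frequency`, and all transactions, are hypothetical choices made for illustration.

```python
# Toy illustration only: all data and names below are assumptions,
# not values taken from the experiments.

def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

MFREQ = 0.5  # assumed minimum frequency threshold

# Original dataset (left untouched).
D_O = [frozenset(t) for t in (("a", "b"), ("a", "b"), ("a", "c"), ("b", "c"))]

# Hand-picked extension: no transaction supports the sensitive itemset
# {a, b}, but each one supports items of the revised positive border.
D_X = [frozenset(t) for t in (("a", "c"), ("b", "c"))]

D = D_O + D_X  # the dataset that would be released

sensitive = {"a", "b"}
revised_border = [{"a"}, {"b"}, {"c"}]

# The sensitive itemset is frequent in D_O but falls below MFREQ in D.
hidden = frequency(sensitive, D_O) >= MFREQ and frequency(sensitive, D) < MFREQ

# The border itemsets stay frequent in D.
border_kept = all(frequency(s, D) >= MFREQ for s in revised_border)

# Every 1 placed in D_X counts as one item modification (0 -> 1).
modifications = sum(len(t) for t in D_X)

print(hidden, border_kept, modifications)
```

In this toy case every modification in D_X serves to keep a border itemset frequent, which is why counting such modifications as hiding cost would be misleading.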

[Plot: BMS-WebView-1 dataset. Series: Hybrid, Partitioning. x-axis: Hiding Scenarios (1x2, 2x2, 1x3, 2x3, 1x4, 2x4). y-axis: ticks from 0 to 700.]

Fig. 16.10: Performance of the partitioning approach.

Figure 16.10 presents the performance gain achieved by the partitioning approach. As one can notice, splitting the CSP into two parts significantly improves the performance of the hybrid hiding algorithm.
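The text here does not say how the CSP is split. One generic way to partition a constraint problem, shown below purely as an assumed illustration, is to group constraints that share no variables, so that each group can be solved on its own. The function name `split_constraints` and the variable labels are hypothetical.

```python
def split_constraints(constraints):
    """Group constraint indices into independent parts.

    `constraints` is a list of non-empty tuples of variable names.
    Two constraints land in the same part when they are linked,
    directly or transitively, by a shared variable.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Link all variables that appear in the same constraint.
    for variables in constraints:
        first = variables[0]
        find(first)
        for other in variables[1:]:
            union(first, other)

    # Collect constraints by the representative of their variables.
    parts = {}
    for index, variables in enumerate(constraints):
        parts.setdefault(find(variables[0]), []).append(index)
    return sorted(parts.values())


# Hypothetical constraints over binary variables of the extension.
example = [("u11", "u12"), ("u12", "u13"), ("u21", "u22"), ("u22",)]
parts = split_constraints(example)
print(parts)
```

Solving two smaller problems is usually cheaper than solving one large one, which is consistent with the gain reported for the partitioning approach, although the actual splitting rule used in the experiments may differ.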

An interesting insight from the experiments is that the hybrid approach, when compared to the inline algorithm [23] and the border-based approaches of [50,66], better preserves the quality of the border and produces superior solutions. Indeed, the hybrid approach introduces the fewest side-effects among the four tested algorithms. On the other hand, the hybrid approach scales worse than its competitors, due to the large number of u_qm variables and the associated constraints of the produced CSP. Moreover, depending on the properties of the dataset used, there are cases where a substantial number of transactions has to be added to the original dataset to facilitate knowledge hiding. This situation
