Hybrid Algorithm - Association Rule Hiding for Data Mining - page 103

Database Reference

In-Depth Information

16.5. Overall, the produced extension contains three null transactions. Using (16.8),

k = max(32; 0) = 1, which means that only one null transaction must be kept.

Thus, after removing the two null transactions of D 1 , Algorithm 16.1 is employed

to validate the third transaction. As one can notice, database D X is exact but not

ideal, since there exists no 1-itemset (i.e., item) in B + (F 0 D ).

16.5 Experimental Evaluation

In this section we provide the results of the experiments conducted in [26] that test

the hybrid algorithm on real datasets under different parameters such as minimum

support threshold and number/size of sensitive itemsets to hide. All these datasets

are available through FIMI [30] and their properties are summarized in Table 16.6.

Table 16.6: The characteristics of the four datasets.

Dataset

N

M

Avg

tlen

BMS-WebView-1

59,602

497

2.5

BMS-WebView-2

77,512

3,340

5.0

Mushroom

8,124

119

23.0

Chess

3,196

76

37.0

The thresholds of minimum support were appropriately set to ensure an adequate

amount of frequent itemsets. Several experiments were conducted for hiding up to

20 sensitive 10-itemsets. In all conducted experiments, the CSP was formulated

based on Figure 16.3 and a safety margin of 10 transactions was used. The sensitive

itemsets for sanitization were randomly selected among the frequent ones. The hy-

brid algorithm was implemented in Perl and C and the experiments were conducted

on a PC running 64-bit Linux on an Intel Pentium D, 3.2 Ghz processor equipped

with 4GB of RAM. All integer programs were solved using ILOG CPLEX 9.0 [36].

Let notation ab denote the hiding of a itemsets of length b. Figure 16.5 pro-

vides a comparison of the hiding solutions identified by the hybrid algorithm against

three state-of-the-art approaches: the Border Based Approach (BBA) of Sun &

Yu [66, 67], the Max-Min 2 algorithm of Moustakides & Verykios [50] and the

inline algorithm of Gkoulalas-Divanis & Verykios [23], in terms of side-effects in-

troduced by the hiding process. As one can notice the hybrid algorithm consistently

outperforms the three other schemes, with the inline approach being the second best.

In most tested cases, the heuristic algorithms failed to identify a solution bearing

minimum side-effects, while the inline approach demonstrated in several occasions

that an exact solution could not be attained without extending the dataset. Figure

16.6 p resents a comparison of the four algorithms at terms of runtime cost. As ex-

Next Page

Association Rule Hiding for Data Mining

Search WWH ::

Custom Search

Home