Database Reference
In-Depth Information
16.5. Overall, the produced extension contains three null transactions. Using (16.8),
k = max(32; 0) = 1, which means that only one null transaction must be kept.
Thus, after removing the two null transactions of D 1 , Algorithm 16.1 is employed
to validate the third transaction. As one can notice, database D X is exact but not
ideal, since there exists no 1-itemset (i.e., item) in B + (F 0 D ).
16.5 Experimental Evaluation
In this section we provide the results of the experiments conducted in [26] that test
the hybrid algorithm on real datasets under different parameters such as minimum
support threshold and number/size of sensitive itemsets to hide. All these datasets
are available through FIMI [30] and their properties are summarized in Table 16.6.
Table 16.6: The characteristics of the four datasets.
Dataset
N
M
Avg
tlen
BMS-WebView-1
59,602
497
2.5
BMS-WebView-2
77,512
3,340
5.0
Mushroom
8,124
119
23.0
Chess
3,196
76
37.0
The thresholds of minimum support were appropriately set to ensure an adequate
amount of frequent itemsets. Several experiments were conducted for hiding up to
20 sensitive 10-itemsets. In all conducted experiments, the CSP was formulated
based on Figure 16.3 and a safety margin of 10 transactions was used. The sensitive
itemsets for sanitization were randomly selected among the frequent ones. The hy-
brid algorithm was implemented in Perl and C and the experiments were conducted
on a PC running 64-bit Linux on an Intel Pentium D, 3.2 Ghz processor equipped
with 4GB of RAM. All integer programs were solved using ILOG CPLEX 9.0 [36].
Let notation ab denote the hiding of a itemsets of length b. Figure 16.5 pro-
vides a comparison of the hiding solutions identified by the hybrid algorithm against
three state-of-the-art approaches: the Border Based Approach (BBA) of Sun &
Yu [66, 67], the Max-Min 2 algorithm of Moustakides & Verykios [50] and the
inline algorithm of Gkoulalas-Divanis & Verykios [23], in terms of side-effects in-
troduced by the hiding process. As one can notice the hybrid algorithm consistently
outperforms the three other schemes, with the inline approach being the second best.
In most tested cases, the heuristic algorithms failed to identify a solution bearing
minimum side-effects, while the inline approach demonstrated in several occasions
that an exact solution could not be attained without extending the dataset. Figure
16.6 p resents a comparison of the four algorithms at terms of runtime cost. As ex-
 
 
Search WWH ::




Custom Search