action to the affected itemsets in the graph. When hiding a set of sensitive rules, the
algorithm first sorts the corresponding large itemsets based on their support and then
proceeds to hide them in a one-by-one fashion, using the methodology presented
above. One of the most significant contributions of this work is the proof that finding
an optimal sanitization of a dataset is NP-hard. On the negative side, the proposed
approach pays no attention to how much support a large itemset loses, as long as the
itemset remains frequent in the sanitized outcome.
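All of the hiding strategies reviewed in this section manipulate the standard support and confidence measures, so we recall their usual definitions here; the notation below is generic and is not taken from the cited works. For a transaction database D:

```latex
% Support of an itemset X and confidence of a rule X => Y over a database D.
\[
  \mathrm{supp}(X) = \frac{\bigl|\{\, T \in D : X \subseteq T \,\}\bigr|}{|D|},
  \qquad
  \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} .
\]
% An itemset X is large (frequent) when supp(X) >= min_sup; a rule X => Y is
% reported when, in addition, conf(X => Y) >= min_conf.
```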
Dasseni et al. [15] generalize the problem in the sense that they consider the
hiding of both sensitive frequent itemsets and sensitive rules. The authors propose
three single rule hiding approaches that are based on the reduction of either the
support or the confidence of the sensitive rules, but not both. In all three approaches,
the goal is to hide the sensitive rules while minimally affecting the support of the
non-sensitive itemsets. The first two strategies reduce the confidence of the sensitive
rule either (i) by increasing the support of the rule antecedent, through transactions
that partially support it, until the rule confidence decreases below the minimum
confidence threshold, or (ii) by decreasing the frequency of the rule consequent
through transactions that support the rule, until the rule confidence is below the
minimum threshold. The third strategy decreases the frequency of a sensitive rule
by decreasing the support of either the rule antecedent or the rule consequent, until
either the confidence or the support lies below the corresponding minimum threshold.
A basic drawback of the proposed schemes is the strong assumption that the items
appearing in a sensitive rule appear in no other sensitive rule. Under this assumption,
hiding the rules one at a time or all at once makes no difference. Moreover, since this
work focuses solely on hiding all the sensitive knowledge in the dataset, it does not
avoid undesired side-effects, such as lost and false rules.
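To make the confidence-reduction idea concrete, the following sketch implements the second strategy in a minimal form: it removes the rule consequent from transactions that fully support the rule until the confidence drops below the threshold. This is an illustrative reconstruction rather than the authors' code; the representation of transactions as mutable item sets and all function names are assumptions made here.

```python
def support(db, itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(db, antecedent, consequent):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    denom = support(db, antecedent)
    return support(db, antecedent | consequent) / denom if denom else 0.0

def hide_rule_by_consequent(db, antecedent, consequent, min_conf):
    """Strategy (ii): remove consequent items from transactions that fully
    support the rule until its confidence falls below min_conf."""
    for t in db:
        if confidence(db, antecedent, consequent) < min_conf:
            break                                   # rule is already hidden
        if (antecedent | consequent) <= t:          # t fully supports the rule
            t.difference_update(consequent)         # drop the consequent items from t
    return db

# Toy example (hypothetical data): hide the rule {a} => {b} with min_conf = 0.6.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
hide_rule_by_consequent(db, {"a"}, {"b"}, min_conf=0.6)
```

Note that every such removal also lowers the support of any non-sensitive itemset containing a consequent item, which is precisely the lost-rules side-effect mentioned above.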
Verykios et al. [53] extend the work of Dasseni et al. [15] by improving the algorithms
and evaluating their performance on input datasets of different sizes and on different
sets of sensitive rules. Moreover, the authors propose two heuristic
algorithms that incorporate the third strategy presented earlier. The first of these
algorithms protects the sensitive knowledge by removing the item with the maximum
support from the minimum-length transaction. The generating itemsets of the sensitive
rules are hidden one at a time, in decreasing order of size and support. Like the first
algorithm, the second algorithm first sorts the generating itemsets with respect to
their size and support, and then hides them in a round-robin fashion as follows. First,
for each generating itemset, a random ordering of its items and of its supporting
transactions is obtained. Then, the algorithm
proceeds to remove the items from the corresponding transactions in a round-robin
fashion, until the support of the sensitive itemset drops below the minimum support
threshold. The intuition behind hiding in a round-robin fashion is fairness, and the
proposed algorithm, although rather naïve, serves as a baseline for a series of
experiments.
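A minimal sketch of this round-robin heuristic is given below, under stated assumptions: transactions are mutable item sets, the sorting key combines itemset size and support count, and the function names are illustrative rather than taken from [53].

```python
import random

def support_count(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def hide_itemset_round_robin(db, itemset, min_sup_count):
    """Remove items of a generating itemset from its supporting transactions,
    cycling over the items, until the itemset's support drops below min_sup_count."""
    items = list(itemset)
    random.shuffle(items)                               # random ordering of the items
    supporting = [t for t in db if itemset <= t]
    random.shuffle(supporting)                          # random ordering of supporting transactions
    i = 0
    while supporting and support_count(db, itemset) >= min_sup_count:
        idx = i % len(supporting)
        t = supporting[idx]
        t.discard(items[i % len(items)])                # delete one item from this transaction
        del supporting[idx]                             # t no longer supports the full itemset
        i += 1
    return db

def hide_sensitive_itemsets(db, generating_itemsets, min_sup_count):
    """Hide the generating itemsets one by one, in decreasing order of size and support."""
    ordered = sorted(generating_itemsets,
                     key=lambda s: (len(s), support_count(db, s)),
                     reverse=True)
    for s in ordered:
        hide_itemset_round_robin(db, s, min_sup_count)
    return db
```

Because every removal disqualifies exactly one supporting transaction, hiding an itemset requires at most supp(itemset) − min_sup_count + 1 removals in this sketch.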
Oliveira and Zaïane [39] were the first to introduce multiple rule hiding approaches.
The proposed algorithms are efficient and require two scans of the database,
regardless of the number of sensitive itemsets to hide. During the first scan, an index
file is created to speed up the process of finding the sensitive transactions and to allow