action to the affected itemsets in the graph. When hiding a set of sensitive rules, the
algorithm first sorts the corresponding large itemsets based on their support and then
proceeds to hide them in a one-by-one fashion, using the methodology presented
above. One of the most significant contributions of this work is the proof that finding
an optimal sanitization of a dataset is NP-hard. On the negative side, the proposed
approach pays no attention to how much support a large itemset loses, as long as the
itemset remains frequent in the sanitized outcome.
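All of the hiding strategies reviewed in this section manipulate the standard support and confidence measures, so we recall their usual definitions here; the notation below is generic and is not taken from the cited works. For a transaction database D:

```latex
% Support of an itemset X and confidence of a rule X => Y over a database D.
\[
  \mathrm{supp}(X) = \frac{\bigl|\{\, T \in D : X \subseteq T \,\}\bigr|}{|D|},
  \qquad
  \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} .
\]
% An itemset X is large (frequent) when supp(X) >= min_sup; a rule X => Y is
% reported when, in addition, conf(X => Y) >= min_conf.
```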
Dasseni et al. [15] generalize the problem in the sense that they consider the
hiding of both sensitive frequent itemsets and sensitive rules. The authors propose
three single rule hiding approaches that are based on the reduction of either the
support or the confidence of the sensitive rules, but not both. In all three approaches,
the goal is to hide the sensitive rules while minimally affecting the support of the
non-sensitive itemsets. The first two strategies reduce the confidence of the sensitive
rule either (i) by increasing the support of the rule antecedent, through transactions
that partially support it, until the rule confidence decreases below the minimum
confidence threshold, or (ii) by decreasing the frequency of the rule consequent
through transactions that support the rule, until the rule confidence is below the
minimum threshold. The third strategy decreases the frequency of a sensitive rule
by decreasing the support of either the rule antecedent or the rule consequent, until
either the confidence or the support lies below the corresponding minimum threshold.
A basic drawback of the proposed schemes is the strong assumption that the items
appearing in a sensitive rule appear in no other sensitive rule. Under this assumption,
hiding the rules one at a time or all at once makes no difference. Moreover, since this
work focuses solely on hiding all the sensitive knowledge in the dataset, it does not
avoid undesired side-effects, such as lost and false rules.
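To make the confidence-reduction idea concrete, the following sketch implements the second strategy in a minimal form: it removes the rule consequent from transactions that fully support the rule until the confidence drops below the threshold. This is an illustrative reconstruction rather than the authors' code; the representation of transactions as mutable item sets and all function names are assumptions made here.

```python
def support(db, itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(db, antecedent, consequent):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    denom = support(db, antecedent)
    return support(db, antecedent | consequent) / denom if denom else 0.0

def hide_rule_by_consequent(db, antecedent, consequent, min_conf):
    """Strategy (ii): remove consequent items from transactions that fully
    support the rule until its confidence falls below min_conf."""
    for t in db:
        if confidence(db, antecedent, consequent) < min_conf:
            break                                   # rule is already hidden
        if (antecedent | consequent) <= t:          # t fully supports the rule
            t.difference_update(consequent)         # drop the consequent items from t
    return db

# Toy example (hypothetical data): hide the rule {a} => {b} with min_conf = 0.6.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
hide_rule_by_consequent(db, {"a"}, {"b"}, min_conf=0.6)
```

Note that every such removal also lowers the support of any non-sensitive itemset containing a consequent item, which is precisely the lost-rules side-effect mentioned above.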
Verykios et al. [53] extend the work of Dasseni et al. [15] by improving the algorithms
and evaluating their performance on input datasets of different sizes and on different
sets of sensitive rules. Moreover, the authors propose two heuristic
algorithms that incorporate the third strategy presented earlier. The first of these
algorithms protects the sensitive knowledge by removing the item with the maximum
support from the minimum-length transaction. The generating itemsets of the sensitive
rules are hidden one at a time, in decreasing order of size and support. Like the first
algorithm, the second algorithm first sorts the generating itemsets with respect to
their size and support, and then hides them in a round-robin fashion as follows. First,
for each generating itemset, a random ordering of its items and of its supporting
transactions is obtained. Then, the algorithm
proceeds to remove the items from the corresponding transactions in a round-robin
fashion, until the support of the sensitive itemset drops below the minimum support
threshold. The intuition behind hiding in a round-robin fashion is fairness, and the
proposed algorithm, although rather naïve, serves as a baseline for a series of
experiments.
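A minimal sketch of this round-robin heuristic is given below, under stated assumptions: transactions are mutable item sets, the sorting key combines itemset size and support count, and the function names are illustrative rather than taken from [53].

```python
import random

def support_count(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def hide_itemset_round_robin(db, itemset, min_sup_count):
    """Remove items of a generating itemset from its supporting transactions,
    cycling over the items, until the itemset's support drops below min_sup_count."""
    items = list(itemset)
    random.shuffle(items)                               # random ordering of the items
    supporting = [t for t in db if itemset <= t]
    random.shuffle(supporting)                          # random ordering of supporting transactions
    i = 0
    while supporting and support_count(db, itemset) >= min_sup_count:
        idx = i % len(supporting)
        t = supporting[idx]
        t.discard(items[i % len(items)])                # delete one item from this transaction
        del supporting[idx]                             # t no longer supports the full itemset
        i += 1
    return db

def hide_sensitive_itemsets(db, generating_itemsets, min_sup_count):
    """Hide the generating itemsets one by one, in decreasing order of size and support."""
    ordered = sorted(generating_itemsets,
                     key=lambda s: (len(s), support_count(db, s)),
                     reverse=True)
    for s in ordered:
        hide_itemset_round_robin(db, s, min_sup_count)
    return db
```

Because every removal disqualifies exactly one supporting transaction, hiding an itemset requires at most supp(itemset) − min_sup_count + 1 removals in this sketch.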
Oliveira and Zaïane [39] were the first to introduce multiple rule hiding approaches.
The proposed algorithms are efficient and require two scans of the database,
regardless of the number of sensitive itemsets to hide. During the first scan, an index
file is created to speed up the process of finding the sensitive transactions and to allow