authors propose three single-rule heuristic hiding algorithms that are based on reducing either the support or the confidence of the sensitive rules, but not both.
In all three approaches, the goal is to hide the sensitive rules while minimally af-
fecting the support of the nonsensitive itemsets. The first two algorithms reduce the
confidence of the sensitive association rule either (i) by increasing the support of
the rule antecedent, through transactions that partially support it, until the rule con-
fidence drops below the minimum confidence threshold, or (ii) by decreasing the
frequency of the rule consequent through transactions that support the rule, until the
rule confidence is below the minimum threshold. The third algorithm, on the other
hand, decreases the support of a sensitive association rule by decreasing the support of either the rule antecedent or the rule consequent, until either the confidence or the support of the rule lies below the corresponding minimum threshold. A basic drawback of the proposed methodologies is the strong assumption that no item appearing in a sensitive association rule appears in any other sensitive rule. Under this assumption, hiding the sensitive rules one at a time or all together makes no difference. Moreover, the proposed methodologies fail to avoid undesired side effects, such as lost rules and ghost rules.
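To make strategy (ii) concrete, the following sketch (in Python; the set-based transaction representation, the rule encoding, and all helper names are illustrative assumptions rather than the authors' implementation) removes the consequent's items from the transactions that fully support the rule until the confidence falls below the minimum confidence threshold:

def support(db, itemset):
    # Fraction of transactions (sets of items) that contain the itemset.
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(db, antecedent, consequent):
    sup_x = support(db, antecedent)
    return support(db, antecedent | consequent) / sup_x if sup_x else 0.0

def hide_rule_by_confidence(db, antecedent, consequent, min_conf):
    # Strategy (ii): deleting the consequent from a fully supporting
    # transaction lowers sup(X u Y) while leaving sup(X) unchanged,
    # so the confidence of X -> Y decreases monotonically.
    for t in db:
        if confidence(db, antecedent, consequent) < min_conf:
            break                              # the rule is now hidden
        if antecedent | consequent <= t:       # t fully supports the rule
            t -= consequent                    # in-place removal of Y's items

Since each deletion leaves the antecedent intact in the modified transaction, the support of the antecedent is preserved and only the numerator of the confidence shrinks.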
Verykios et al. [72] extend the previous work of Dasseni et al. [20] by improving the association rule hiding algorithms of [20] and evaluating their performance under different sizes of input datasets and different sets of sensitive rules. In addition, the authors propose two novel heuristic algorithms that incorporate the third strategy presented above. The first of these algorithms protects the sensitive knowledge by removing the item with the maximum support from the minimum-length transaction (i.e., the supporting transaction with the fewest items). The generating itemsets of the sensitive rules are hidden one by one, in decreasing order of size and support. Similarly to the first algorithm, the second
algorithm first sorts the generating itemsets with respect to their size and support,
and then hides them in a round-robin fashion as follows. First, for each generating
itemset, a random ordering of its items and of its supporting transactions is obtained.
Then, the algorithm proceeds to remove the items from the corresponding transac-
tions in a round-robin fashion, until the support of the sensitive itemset drops below
the minimum support threshold, at which point the itemset is hidden. The intuition behind hiding in a round-robin fashion is fairness, and the proposed algorithm, although rather naïve, serves as a baseline for conducting a series of experiments.
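A minimal sketch of this round-robin step (Python again; the transaction representation and support helper are the same illustrative assumptions as above) might look as follows:

import random

def support(db, itemset):
    return sum(1 for t in db if itemset <= t) / len(db)

def hide_itemset_round_robin(db, itemset, min_sup):
    # Delete one item per supporting transaction, rotating which item
    # of the sensitive itemset is deleted, until its support is low enough.
    items = list(itemset)
    random.shuffle(items)                          # random ordering of the items
    supporting = [t for t in db if itemset <= t]
    random.shuffle(supporting)                     # random ordering of the transactions
    for k, t in enumerate(supporting):
        if support(db, itemset) < min_sup:
            break                                  # the itemset is now hidden
        t.discard(items[k % len(items)])           # round-robin item removal

Rotating the deleted item across the shuffled supporting transactions spreads the distortion evenly over the itemset's items, which is the fairness intuition mentioned above.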
Oliveira & Zaïane [54] were the first to introduce multiple-rule hiding approaches. The proposed algorithms are efficient and require only two scans of the
database, regardless of the number of sensitive itemsets to hide. During the first
scan, an index file is created to speed up the process of finding the sensitive trans-
actions and to allow for an efficient retrieval of the data. In the second scan, the
algorithms sanitize the database by selectively removing the smallest number of individual items needed to hide the sensitive knowledge. An interesting novelty of this work is that the proposed methodology takes into account not only the impact of the sanitization on hiding the sensitive patterns, but also its impact on the hiding of nonsensitive knowledge. Three item restriction-based algorithms (known as MinFIA, MaxFIA, and IGA) are proposed that selectively remove items from the sensitive transactions.
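The index file built during the first scan can be pictured as an inverted list from items to transaction identifiers; the sketch below (Python, with hypothetical names, not the authors' actual data structure) shows how such an index lets the sanitization pass retrieve the transactions supporting each sensitive itemset without rescanning the database:

from collections import defaultdict

def build_index(db):
    # First scan: map every item to the ids of the transactions containing it.
    index = defaultdict(set)
    for tid, t in enumerate(db):
        for item in t:
            index[item].add(tid)
    return index

def supporting_tids(index, itemset):
    # Intersecting the per-item lists yields exactly the transactions
    # that support the itemset, i.e., the candidates for sanitization.
    tids = None
    for item in itemset:
        tids = index[item] if tids is None else tids & index[item]
    return tids if tids is not None else set()

Under this picture, the cost of locating the sensitive transactions in the second scan stays low regardless of how many sensitive itemsets must be hidden.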