Distortion Schemes - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

move items from transactions that support the sensitive rules. The first algorithm,

MinFIA, operates as follows. For every restrictive pattern it identifies the support-

ing transactions and the item having the smallest support in the pattern (called the

victim item). Then, by using a user-supplied disclosure threshold, it first sorts the

identified transactions in ascending order of degree of conflict and then selects the

number of transactions (among them) that need to be sanitized. Finally, from each

selected transaction the algorithm removes the victim item. The MaxFIA algorithm

proceeds exactly as the MinFIA with the only difference of selecting as the victim

item the one that has the maximum support in the sensitive association rule. Fi-

nally, IGA aims at clustering the restricted patterns into groups that share the same

itemsets. By identifying overlapping clusters, the algorithm hides the corresponding

sensitive patterns at once (based on the sensitive itemsets they share), thus reduces

the distortion that is induced to the database when hiding the sensitive knowledge.

A more efficient approach than that of [54] and the work of [20, 63, 64], was

introduced by Oliveira & Zaïane in [55]. The proposed algorithm, called SWA, is an

efficient, scalable, one-scan heuristic which aims at providing a balance between the

needs for privacy and knowledge discovery in association rule hiding. It achieves to

hide multiple rules in only one pass through the dataset, regardless of its size or the

number of sensitive rules that need to be protected. The algorithm proceeds in five

steps that are applied to every group of K transactions (thus formulating a window

of size K) read from the original database. Initially, the nonsensitive transactions

are separated from the sensitive ones and copied directly to the sanitized database.

For each sensitive rule, the item having the highest frequency is selected and its

supporting transactions are identified. Then, a disclosure threshold y, potentially

different for each sensitive rule, is used to capture the severity characterizing the

release of the rule. Based on this threshold, SWA computes the number of supporting

transactions that need to be sanitized for each rule and then sorts them in ascending

order of length. For each selected transaction, the item with the highest frequency

as identified before is removed and then the transaction is copied to the sanitized

database. SWA is experimentally shown to outperform the state-of-the-art heuristic

approaches in terms of concealing all the sensitive rules, while maintaining high

data utility of the released database.

Amiri [9] proposes three effective, multiple association rule hiding heuristics

that outperform SWA by offering higher data utility and lower distortion, at the

expense of increased computational speed. Although similar in philosophy to the

previous approaches, the three proposed methodologies do a better job in modeling

the overall objective of a rule hiding algorithm. The first approach, called Aggregate,

computes the union of the supporting transactions for all sensitive itemsets. Among

them, the transaction that supports the most sensitive and the least nonsensitive item-

sets is selected and expelled from the database. The same process is repeated until

all the sensitive itemsets are hidden. Similarly to this approach, the Disaggregate

approach aims at removing individual items from transactions, rather than remov-

ing the entire transaction. It achieves that by computing the union of all transactions

supporting sensitive itemsets and then, for each transaction and supporting item, by

calculating the number of sensitive and nonsensitive itemsets that will be affected if

Association Rule Hiding for Data Mining

Search WWH ::

Custom Search

Home