Privacy Issues in Association Rule Mining - Frequent Pattern Mining


are hidden. Similarly to this approach, the Disaggregate approach aims at removing

individual items from transactions, rather than removing the entire transaction. It

achieves that by computing the union of all transactions supporting sensitive item-

sets and then, for each transaction and supporting item, by calculating the number of

sensitive and non-sensitive itemsets that will be affected if this item is removed from

the transaction. Finally, it chooses to remove the item from the transaction that will

affect the most sensitive and the least non-sensitive itemsets. The third approach,

called Hybrid, is a combination of the previous two, since it uses Aggregate to

identify the sensitive transactions and Disaggregate to selectively delete items of these

transactions, until the sensitive knowledge is hidden.
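The item-level selection performed by Disaggregate can be sketched in Python as follows. This is a minimal illustration rather than the published algorithm: the function names (`support`, `disaggregate_step`, `hide`), the representation of transactions as sets, the lexicographic ranking (most sensitive itemsets affected first, then fewest non-sensitive ones), and the loop that repeats until every sensitive itemset falls below a `min_support` threshold are all assumptions made for this example.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def disaggregate_step(transactions, sensitive, non_sensitive):
    """Remove one item from one transaction, in place.

    Candidates are (transaction, item) pairs where the transaction supports
    at least one sensitive itemset containing the item. Assumed ranking:
    most sensitive itemsets affected, then fewest non-sensitive affected.
    Returns (index, item), or None when no candidate exists.
    """
    best = None
    for idx, t in enumerate(transactions):
        sens_here = [s for s in sensitive if s <= t]
        if not sens_here:
            continue  # this transaction supports no sensitive itemset
        nons_here = [n for n in non_sensitive if n <= t]
        for item in sorted(t):
            hit_sens = sum(1 for s in sens_here if item in s)
            if hit_sens == 0:
                continue  # removing this item hides nothing
            hit_nons = sum(1 for n in nons_here if item in n)
            key = (hit_sens, -hit_nons)
            if best is None or key > best[0]:
                best = (key, idx, item)
    if best is None:
        return None
    _, idx, item = best
    transactions[idx] = transactions[idx] - {item}
    return idx, item


def hide(transactions, sensitive, non_sensitive, min_support):
    """Repeat the step until every sensitive itemset is below min_support."""
    transactions = [set(t) for t in transactions]  # work on a copy
    sensitive = [frozenset(s) for s in sensitive]
    non_sensitive = [frozenset(n) for n in non_sensitive]
    while True:
        live = [s for s in sensitive
                if support(transactions, s) >= min_support]
        if not live:
            break
        if disaggregate_step(transactions, live, non_sensitive) is None:
            break
    return transactions


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
sanitized = hide(db, [{"a", "b"}], [{"a", "c"}, {"b", "d"}], min_support=2)
```

In this toy database the sensitive itemset {a, b} starts with support 3. The sketch removes one item at a time, preferring removals that leave the non-sensitive itemsets {a, c} and {b, d} untouched.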

Wu et al. [ 55 ] propose a sophisticated methodology that removes the assumption of

[ 15 ], regarding the disjoint relation among the items of the various sensitive rules. By

using set theory, the authors formalize a set of constraints related to the possible side-

effects of the hiding process and allow item modifications to enforce these constraints.

However, the correlations among the rules can make it impossible to hide the

sensitive knowledge without violating any of the constraints. For this reason, the user

can specify which constraints she considers more significant and relax the rest. A

drawback of this approach is the simultaneous relaxation (without the user's consent)

of the constraint regarding the hiding of all the sensitive itemsets. To accommodate

rule hiding, the new scheme defines a class of allowable modifications that are

represented as templates and are selected in a one-by-one fashion. A template contains

the item to be modified, the applied operation, the items to be preserved or removed

from the transaction, and coverage information regarding the number of rules that

are affected. Based on this, the algorithm can select and apply only the templates

that are considered beneficial, because they minimize the number of side-effects.
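One way to picture a template and the one-by-one selection is the Python sketch below. It is a hypothetical illustration, not the data structure of [ 55 ]: the `Template` class, its field names, the split of the coverage information into `sensitive_covered` and `nonsensitive_covered`, and the rule "fewest non-sensitive rules affected, ties broken by most sensitive rules covered" are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical record for one allowable modification."""
    item: str                  # the item to be modified
    operation: str             # applied operation, e.g. "delete" or "insert"
    preserve: frozenset        # items that must stay in the transaction
    remove: frozenset          # items that must be absent or removed
    sensitive_covered: int     # assumed: sensitive rules this template helps hide
    nonsensitive_covered: int  # assumed: non-sensitive rules it would disturb


def pick_template(templates):
    """Return the most beneficial template, or None if none helps.

    Assumed criterion: minimize disturbed non-sensitive rules (side-effects),
    then maximize the number of sensitive rules covered.
    """
    useful = [t for t in templates if t.sensitive_covered > 0]
    if not useful:
        return None
    return min(useful,
               key=lambda t: (t.nonsensitive_covered, -t.sensitive_covered))


candidates = [
    Template("a", "delete", frozenset({"b"}), frozenset(), 2, 3),
    Template("b", "delete", frozenset({"a"}), frozenset(), 1, 0),
    Template("c", "delete", frozenset(), frozenset({"d"}), 2, 0),
    Template("d", "insert", frozenset(), frozenset(), 0, 0),
]
chosen = pick_template(candidates)
```

Here the template on item `c` is chosen: it disturbs no non-sensitive rule and covers more sensitive rules than the template on item `b`. In a one-by-one scheme, the coverage counts would be recomputed after each applied template before the next one is picked.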

Pontikakis et al. [ 43 ] propose two distortion-based heuristics to selectively hide

the sensitive rules. On the positive side, the proposed schemes use effective data

structures for the representation of the rules and effectively prioritize the selection

of transactions for sanitization. However, in both algorithms the proposed hiding

process may introduce a number of side-effects, either by generating ghost rules

which were previously non-existent, or by eliminating existing non-sensitive rules.

The first algorithm, called Priority-based Distortion Algorithm (PDA), reduces the

confidence of a rule by changing 1's to 0's for items belonging to its consequent. The

second algorithm, called Weight-based Sorting Distortion Algorithm (WDA), con-

centrates on the optimization of the hiding process in an attempt to achieve the least

side-effects and the minimum complexity. This is achieved through the use of pri-

ority values assigned to transactions based on weights. Regarding performance, the

proposed schemes tend to produce hiding solutions of comparable or slightly higher

quality than the algorithms in [ 48 ], by generally introducing fewer side-effects.

However, both algorithms are computationally demanding, with PDA typically requiring

twice the time of the schemes in [ 48 ] to perform the hiding process.
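The basic distortion move, lowering the confidence of a rule by turning 1's into 0's in consequent items, can be sketched in Python as follows. This is not PDA or WDA as published: the transaction ordering used here (transactions with the fewest items first) is only a stand-in for the priority and weight schemes of [ 43 ], and the names `confidence` and `lower_confidence`, the choice of which consequent item to flip, and the `min_conf` threshold are assumptions made for this example.

```python
def confidence(rows, antecedent, consequent):
    """conf(A -> B) = supp(A and B) / supp(A) over a binary database."""
    with_a = [r for r in rows if all(r[i] == 1 for i in antecedent)]
    if not with_a:
        return 0.0
    both = sum(1 for r in with_a if all(r[i] == 1 for i in consequent))
    return both / len(with_a)


def lower_confidence(rows, antecedent, consequent, min_conf):
    """Flip a consequent item from 1 to 0 until conf(A -> B) < min_conf.

    Assumed priority: supporting transactions with the fewest 1's first.
    Returns a distorted copy; the input rows are left untouched.
    """
    rows = [dict(r) for r in rows]
    victim = sorted(consequent)[0]  # assumed: flip one fixed consequent item
    supporting = [r for r in rows
                  if all(r[i] == 1 for i in antecedent | consequent)]
    supporting.sort(key=lambda r: sum(r.values()))
    for r in supporting:
        if confidence(rows, antecedent, consequent) < min_conf:
            break
        r[victim] = 0  # the rule loses one supporting transaction
    return rows


database = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0},
]
distorted = lower_confidence(database, {"a"}, {"b"}, min_conf=0.5)
```

Because only consequent items are flipped, the support of the antecedent is unchanged and the confidence falls by one supporting transaction per flip. Each flip may also lower the support of non-sensitive rules involving the same item, which is the kind of side-effect the text describes.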

Support-based and Confidence-based Blocking Schemes. Saygin et al. [ 47 , 48 ]

were the first to propose the use of unknowns (represented as question marks in

the database), instead of transforming 1's to 0's and vice versa, for the hiding of
