Distortion Schemes - Association Rule Hiding for Data Mining

this item is removed from the transaction. Finally, it removes the item, from the corresponding transaction, whose deletion affects the largest number of sensitive itemsets and the smallest number of nonsensitive itemsets. The third approach, called Hybrid, combines the two previous algorithms: it uses the Aggregate approach to identify the sensitive transactions and the Disaggregate approach to selectively delete items from these transactions until all the sensitive knowledge is concealed.
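
The selection step described above can be illustrated with a minimal sketch. It assumes transactions are sets of items and that the sensitive and nonsensitive itemsets are given explicitly; the names `affected` and `pick_removal`, and the tie-breaking rule (first candidate found wins), are hypothetical and not taken from the original algorithms.

```python
def affected(transaction, item, itemsets):
    """Count itemsets supported by the transaction that contain the item."""
    return sum(1 for s in itemsets if item in s and s <= transaction)


def pick_removal(transactions, sensitive, nonsensitive):
    """Return the (transaction index, item) pair whose removal touches the
    most sensitive and the fewest nonsensitive itemsets, or None."""
    best_key, best = None, None
    for tid, t in enumerate(transactions):
        for item in sorted(t):  # sorted only to make the result deterministic
            hit = affected(t, item, sensitive)
            if hit == 0:
                continue  # removing this item hides nothing sensitive
            miss = affected(t, item, nonsensitive)
            key = (-hit, miss)  # more sensitive hits first, then fewer losses
            if best_key is None or key < best_key:
                best_key, best = key, (tid, item)
    return best
```

A Hybrid-style loop would call such a function repeatedly on the sensitive transactions, deleting the returned item each time, until no sensitive itemset remains frequent.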

Wu, et al. [79] propose a sophisticated methodology that drops the assumption of [20] that the items of the various sensitive rules are disjoint. Using set theory, the authors formalize a set of constraints related to the possible side-effects of the hiding process and allow item modifications to enforce these constraints. However, the existing correlation among the rules can make it impossible to hide the sensitive knowledge without violating some constraint. For this reason, the user is permitted to specify which of the constraints he/she considers more significant and to relax the rest. A drawback of the approach is the simultaneous relaxation (without the user's consent) of the constraint regarding the hiding of all the sensitive itemsets. To accommodate rule hiding, the new scheme defines a class of allowable modifications that are represented as templates and are selected one at a time. A template contains the item to be modified, the applied operation, the items to be preserved or removed from the transaction, and coverage information regarding the number of rules that are affected. Based on this information, the algorithm can select and apply only the templates that are considered beneficial, since they cause the fewest side-effects to the sanitized database.
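
To make the template idea concrete, the following is a rough sketch of how such a record might be represented and filtered. The field names, the split of the coverage information into two counters, and the `max_side_effects` parameter are all assumptions made for illustration; they are not the representation used in [79].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    item: str                # item to be modified
    operation: str           # applied operation, e.g. "remove" (assumed label)
    preserve: frozenset      # items to be preserved in the transaction
    remove: frozenset        # items to be removed from the transaction
    sensitive_covered: int   # assumed: sensitive rules the change helps hide
    side_effects: int        # assumed: nonsensitive rules the change disturbs


def select_templates(templates, max_side_effects=0):
    """Keep templates that hide something within the side-effect budget,
    ordered so that the least damaging ones are applied first."""
    beneficial = [
        t for t in templates
        if t.sensitive_covered > 0 and t.side_effects <= max_side_effects
    ]
    return sorted(beneficial, key=lambda t: (t.side_effects, -t.sensitive_covered))
```

Applying the returned templates in order, and recomputing the coverage counters after each one, would approximate the one-at-a-time selection described above.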

Pontikakis, et al. [59] propose two distortion-based heuristics to selectively hide the sensitive association rules. The proposed schemes use efficient data structures for the representation of the association rules and effectively prioritize the selection of transactions for sanitization. However, in both algorithms the hiding process may introduce a number of side-effects, either by generating rules that were previously unknown or by eliminating existing nonsensitive rules. The first algorithm, called Priority-based Distortion Algorithm (PDA), reduces the confidence of a sensitive association rule by flipping 1's to 0's for items belonging to the rule's consequent. The second algorithm, called Weight-based Sorting Distortion Algorithm (WDA), concentrates on optimizing the hiding process in an attempt to achieve the fewest side-effects and the minimum complexity. This is achieved through the use of priority values assigned to transactions based on weights. Both PDA and WDA are experimentally shown to produce hiding solutions of comparable (or slightly better) quality than the ones produced by the algorithms of [64], generally introducing few side-effects. However, both algorithms are computationally demanding, with PDA typically requiring twice the time of the hiding methodologies in [64] to hide the sensitive knowledge.
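
The effect of flipping consequent items on rule confidence can be seen in a small sketch. It models each binary transaction as the set of items whose value is 1, so that flipping a 1 to a 0 means discarding the item. The function names and the `min_conf` parameter are hypothetical, and transactions are visited in database order, which is a simplifying assumption: the prioritization of transactions used by PDA and WDA is not reproduced here.

```python
def confidence(db, antecedent, consequent):
    """conf(A -> C) = support(A u C) / support(A), over a list of item sets."""
    ante = sum(1 for t in db if antecedent <= t)
    both = sum(1 for t in db if (antecedent | consequent) <= t)
    return both / ante if ante else 0.0


def hide_rule(db, antecedent, consequent, min_conf):
    """Drop one consequent item from supporting transactions until the
    rule's confidence falls below min_conf. Returns a sanitized copy."""
    db = [set(t) for t in db]   # work on a copy; the input is left intact
    victim = min(consequent)    # assumed choice: smallest consequent item
    for t in db:
        if confidence(db, antecedent, consequent) < min_conf:
            break               # rule is already hidden
        if (antecedent | consequent) <= t:
            t.discard(victim)   # flip the item's 1 to a 0
    return db
```

Because only consequent items are removed, the support of the antecedent stays fixed while the support of the full rule shrinks, so each flip lowers the confidence.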

Wang & Jafari [76, 77] propose two data modification algorithms that aim at hiding predictive association rules, i.e., rules containing the sensitive items on their left-hand side (rule antecedent). Both algorithms rely on distorting a portion of the database transactions to lower the confidence of the sensitive association rules. The first strategy, called ISL, decreases the confidence of a sensitive rule by increasing the support of the itemset on its left-hand side. The second approach,
