called DSR, reduces the confidence of the rule by decreasing the support of the
itemset in its right-hand side (the rule's consequent). Both algorithms exhibit an
item-ordering effect: depending on the order in which the sensitive items are hidden,
the produced sanitized databases can differ. Moreover, the DSR algorithm is
usually more effective when the sensitive items have high support. Compared to
the work of Saygin et al. [63, 64], the algorithms presented in [76, 77] require
fewer database scans and employ an efficient pruning strategy. However,
by construction, they can hide only those rules that contain the sensitive items
in their left-hand side, whereas the algorithms of Saygin et
al. [63, 64] can hide any type of sensitive association rule.
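The DSR idea above can be sketched in a few lines. Since a rule's confidence is conf(A → B) = supp(A ∪ B) / supp(A), deleting consequent items from transactions that support the whole rule lowers supp(A ∪ B) and hence the confidence. The following is a minimal illustrative sketch, not the published algorithm; the function names, the example transactions, and the 0.6 threshold are all assumptions made for illustration.

```python
# Illustrative DSR-style distortion step (assumption: this mirrors the
# technique only in spirit; names and data are hypothetical).

def support(db, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(db, antecedent, consequent):
    """conf(A -> B) = supp(A u B) / supp(A)."""
    s_a = support(db, antecedent)
    return support(db, antecedent | consequent) / s_a if s_a else 0.0

def dsr_hide(db, antecedent, consequent, min_conf):
    """Delete consequent items from transactions supporting the full rule
    until the rule's confidence drops below `min_conf`."""
    db = [set(t) for t in db]
    for t in db:
        if confidence(db, antecedent, consequent) < min_conf:
            break
        if (antecedent | consequent) <= t:
            t -= consequent  # suppress the right-hand-side items
    return db

# Hypothetical example: hide {bread} -> {butter} below 0.6 confidence
transactions = [{"bread", "butter"}, {"bread", "butter"},
                {"bread"}, {"milk", "butter"}]
hidden = dsr_hide(transactions, {"bread"}, {"butter"}, 0.6)
```

Note that each deletion of a consequent item also lowers the support of every nonsensitive itemset containing it, which is precisely the side effect the ordering of hiding operations influences.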
Lee et al. [42] introduce a data distortion approach that first constructs a
sanitization matrix from the original data and then multiplies the original
database (represented as a transactions-by-items matrix) with this sanitization
matrix to obtain the sanitized database. The matrix multiplication
follows a new definition designed to enforce the suppression of selected
items from transactions of the original database, thereby reducing the support of
the sensitive itemsets. Along these lines, the authors develop three sanitization
algorithms: Hidden-First (HF), Non-Hidden-First (NHF) and HPCME (Hiding sensitive
Patterns Completely with Minimum side Effect on nonsensitive patterns). The first
algorithm takes a drastic approach to eliminating the sensitive knowledge from the
original database and is shown to produce hiding solutions that suffer from the loss
of nonsensitive itemsets. The second algorithm focuses on preserving the
nonsensitive patterns and may therefore fail to hide all the sensitive knowledge in the
database. Lastly, the third algorithm combines the advantages of HF and NHF
in order to hide all sensitive itemsets with minimal impact on the nonsensitive ones.
To achieve this goal, the algorithm introduces a restoration-probability factor that it
uses to decide when preserving the nonsensitive patterns does not interfere with the
hiding of the sensitive ones, and thus takes the appropriate action.
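The matrix-multiplication idea can be illustrated with a small sketch. In the spirit of D' = D × S, a sanitization matrix S starts as the identity (which leaves the database unchanged) and is modified so that the presence of one item cancels another in the product; the result is clamped back to a binary matrix. This is a simplified assumption about the scheme, not the multiplication definition actually used in [42], and the item names and matrices below are hypothetical.

```python
import numpy as np

# Illustrative sketch of matrix-based sanitization (assumption: the exact
# multiplication rules in the original work differ; this only conveys the
# D' = D @ S idea with an identity-based sanitization matrix).

def sanitize(D, trigger, victim):
    """Suppress the `victim` item from every transaction that also contains
    `trigger`, via a modified matrix product D' = D @ S."""
    m = D.shape[1]
    S = np.eye(m, dtype=int)
    S[trigger, victim] = -1       # presence of `trigger` cancels `victim`
    Dp = D @ S
    return (Dp >= 1).astype(int)  # clamp the product back to a binary database

# Transactions-by-items matrix; columns = {bread, butter, milk}
D = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
# Hide butter (column 1) wherever bread (column 0) occurs
Dp = sanitize(D, trigger=0, victim=1)
```

An HF-style strategy would apply such suppressions aggressively for every sensitive itemset, while an NHF-style strategy would skip suppressions that damage nonsensitive patterns; HPCME sits between the two extremes.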