Hybrid Algorithm - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

ing it to the exact algorithm of [23] (Chapter 14) and the border based approaches

of [66, 67] (Chapter 10) and [50, 51] (Chapter 11).

16.1 Knowledge Hiding Formulation

This section sets out the hiding methodology that is employed by the hybrid algo-

rithm, along with a running example which helps towards its proper understanding.

16.1.1 Hybrid Hiding Methodology

To properly introduce the hiding methodology one needs to consider the existence

of three databases, all depicted in binary format. They are defined as follows:

Database D O is the original database which, when mined at a certain support

threshold msup, leads to the disclosure of some sensitive knowledge in the form

of sensitive frequent patterns. This sensitive knowledge needs to be protected.

Database D X is a minimal extension of D O that is created by the hiding algo-

rithm during the sanitization process to facilitate knowledge hiding.

Database D is the union of database D O and the applied extension D X and cor-

responds to the sanitized outcome that can be safely released.

Suppose that database D O consists of N transactions. By preforming frequent

itemset mining in D O , using a support threshold msup set by the owner of the data,

a set of frequent patterns F D O are discovered, among which, a subset S contains

patterns which are sensitive from the owner's perspective. The goal of the hybrid

hiding algorithm is to create a minimal extension to the original database D O in

a way that the final, sanitized database D protects the sensitive itemsets from dis-

closure. The database extension can by itself be considered as a new database D X ,

since it consists of a set of transactions in the same space of items I as the ones

of D O . Among alternative hiding solutions that may exist, the target of the hybrid

algorithm is to protect the sensitive knowledge, while minimally affecting the non-

sensitive itemsets appearing in F D O . This means that all the nonsensitive itemsets

in F D O should continue to appear as frequent among the mined patterns from D,

when performing frequent itemset mining using the same or a higher support thresh-

old. The hiding of a sensitive itemset is equivalent to a degradation of its statistical

significance, in terms of support, in the result database.

The hybrid algorithm [26] first applies border revision to identify the revised bor-

ders for D, then computes the minimal size for the extension D X and, by using the

itemsets of the revised borders, defines a CSP that is solved using BIP. In this way,

all possible assignments of itemsets to the transactions of D X are examined and

the optimal assignment is bound to be found. Although the properties of the pro-

duced CSP allow for an acceptable runtime of the hiding algorithm, there are cases

Search WWH ::

Custom Search

Home