Database Reference
In-Depth Information
ing it to the exact algorithm of [23] (Chapter 14) and the border based approaches
of [66, 67] (Chapter 10) and [50, 51] (Chapter 11).
16.1 Knowledge Hiding Formulation
This section sets out the hiding methodology that is employed by the hybrid algo-
rithm, along with a running example which helps towards its proper understanding.
16.1.1 Hybrid Hiding Methodology
To properly introduce the hiding methodology one needs to consider the existence
of three databases, all depicted in binary format. They are defined as follows:
Database D O is the original database which, when mined at a certain support
threshold msup, leads to the disclosure of some sensitive knowledge in the form
of sensitive frequent patterns. This sensitive knowledge needs to be protected.
Database D X is a minimal extension of D O that is created by the hiding algo-
rithm during the sanitization process to facilitate knowledge hiding.
Database D is the union of database D O and the applied extension D X and cor-
responds to the sanitized outcome that can be safely released.
Suppose that database D O consists of N transactions. By preforming frequent
itemset mining in D O , using a support threshold msup set by the owner of the data,
a set of frequent patterns F D O are discovered, among which, a subset S contains
patterns which are sensitive from the owner's perspective. The goal of the hybrid
hiding algorithm is to create a minimal extension to the original database D O in
a way that the final, sanitized database D protects the sensitive itemsets from dis-
closure. The database extension can by itself be considered as a new database D X ,
since it consists of a set of transactions in the same space of items I as the ones
of D O . Among alternative hiding solutions that may exist, the target of the hybrid
algorithm is to protect the sensitive knowledge, while minimally affecting the non-
sensitive itemsets appearing in F D O . This means that all the nonsensitive itemsets
in F D O should continue to appear as frequent among the mined patterns from D,
when performing frequent itemset mining using the same or a higher support thresh-
old. The hiding of a sensitive itemset is equivalent to a degradation of its statistical
significance, in terms of support, in the result database.
The hybrid algorithm [26] first applies border revision to identify the revised bor-
ders for D, then computes the minimal size for the extension D X and, by using the
itemsets of the revised borders, defines a CSP that is solved using BIP. In this way,
all possible assignments of itemsets to the transactions of D X are examined and
the optimal assignment is bound to be found. Although the properties of the pro-
duced CSP allow for an acceptable runtime of the hiding algorithm, there are cases
Search WWH ::




Custom Search