Database Reference
In-Depth Information
16.1.2 A Running Example
To shed light into the workings of the hybrid algorithm, in what follows we borrow
the running example of [26]. Suppose that we are provided with database D O of
Table 16.1 . Applying frequent itemset mining in D O using mfreq = 0:3, leads to the
set of large itemsets F D O appearing in the upper part of Table 16.2. Among these
itemsets, S =fe; ae; bcg denotes the sensitive knowledge that has to be protected.
The hybrid hiding algorithm aims at the creation of a database extension D X to D O
(see Table 16.1 ) that allows the hiding of the sensitive knowledge, while keeping the
nonsensitive patterns frequent in the sanitized outcome.
Table 16.1 summarizes the target of the hybrid hiding algorithm. The union of
the two datasets, D O and D X , corresponds to the sanitized outcome D that can
be safely released. Thus, the primary goal of the hiding algorithm is to construct
the privacy aware extension D X of D O such that (i) it contains the least amount
of transactions that are necessary to ensure the proper hiding of all the sensitive
knowledge in D O , and (ii) it introduces no side-effects to the hiding process. As one
can observe in the lower part of Table 16.2, all the sensitive itemsets of D O along
with their supersets are infrequent in D (shown under the dashed line), while all
the nonsensitive itemsets of D O remain frequent. Since D O is extended, in order to
ensure that the nonsensitive patterns will remain frequent in D, the hiding algorithm
needs to appropriately increase their support in the sanitized database.
16.2 Main Issues Pertaining to the Hiding Methodology
The hybrid hiding methodology produces a sanitized database D that corresponds
to a mixture of the original transactions in D O and a set of synthetic transactions,
artificially created to prohibit the leakage of the sensitive knowledge. For security
reasons, all the transactions in D are assumed to be randomly ordered so that it is
difficult for an adversary to distinguish between the real ones and those that were
added by the hiding algorithm to secure the sensitive knowledge. There are several
issues of major importance, involving the hiding methodology, that need to be ex-
amined. To continue, let P denote the size of database D, N is the size of database
D O , and Q is the size of the extension D X .
16.2.1 Size of Database Extension
Since database D O is extended by D X to construct database D, an initial and very
important step in the hiding process is the computation of the size of D X (i.e., the
necessary amount of transactions that need to be added to those of D O to facilitate
the hiding of the sensitive knowledge). A lower bound on this value can be estab-
lished based on the sensitive itemset in S which has the highest support (breaking
Search WWH ::




Custom Search