Database Reference
In-Depth Information
jeopardizing the hiding of the sensitive patterns. The result database D X contains
the 3 valid transactions and pertains to an exact hiding solution.
As one can notice in the presented examples, the solver adjusts the u qm variables
in a way that all the inequalities pertaining to an exact solution are satisfied, while
the number of zero entries in D X is maximized. Consider for instance the database
D X of Table 16.1. To hide itemset feg this itemset must not be supported by any
transaction inD X . Thus, all the respective variables become zero. Now, consider the
1-itemset ffg of the revised negative border. Since no other itemset in C supports
it, all its entries in D X can be safely turned into zero. Finally, itemset fabg has to
be supported by at least one transaction in D X to remain frequent in D. In any case,
since the optimization criterion of the CSP requires the adjustment of the minimum
number of u qm variables to '1', the solver ensures that the sanitized database D lies
as close as possible to D O , while providing an exact solution.
16.4 A Partitioning Approach to Improve Scalability
To achieve an exact hiding solution the hybrid algorithm has to regulate the values
of all items in every transaction of D X . This number of variables may, under certain
circumstances, be very large leading to a large increase in the runtime of the BIP
algorithm. Fortunately, the scalability of the hiding process can be improved without
relaxing the requirement for an exact hiding solution. To achieve this, first the hiding
process is properly decomposed into two parts of approximately equal size. Then,
each part is solved separately and, last, the individual solutions are coupled together.
16.4.1 Partitioning Methodology
Let I =I P [I P , be a decomposition of the universe of all items M of D O into two
disjoint subsets, such that I P contains all items that appear in the sensitive itemsets
along with possibly other items from I, and I P contains the rest of the items in I.
First, the revised borders for database D are identified, as in Section 16.2.3, and set
C is formulated (as in (16.7)). Each itemset in C, is then uniquely assigned to one of
the sets I P ;I P , based on the items it supports. For set I P all the assigned itemsets
from C must not support any items of I P . However, itemsets assigned to I P may
support items of I P . The ideal case scenario is a partitioning of the items in I P ;I P
such that (i) the two partitions are of approximately equal size, and (ii) as few as
possible itemsets assigned in I P support items from I P .
After formulating the two partitions and assigning the itemsets from C, relation
(16.1) and the safety margin are applied to decide on the size of D X . The goal is to
properly construct D X based on the principles presented earlier. Starting from set
I P , consider each transaction in D to be translated in this subspace; I P regulates
the assignment of the u qm variables for all transactions T q 2D X and all items m 2
Search WWH ::




Custom Search