Database Reference
In-Depth Information
in a typical case there exist many overlapping itemsets in D O and therefore a large
number of overlapping inequalities are produced. Identifying such inequalities helps
drastically reduce the size of the problem up to a point that it becomes solvable.
As we presented earlier, an one-to-one correspondence exists between itemsets
and produced inequalities that participate to the CSP. Given that C is the total set
of affected itemsets, such that C = fC 1 ;C 2 ; : : : ;C 2 M 1 g, we denote by L C the set
of solutions of the corresponding inequalities from C. The set of solutions for the
system of inequalities will be the intersection among the solution produced for each
individual inequality. Thus, we can write L C = T jCj
i=1 L C i .
Based on the above notation, the set of solutions that corresponds to all itemsets
in D O is denoted as L Ã(I)nf?g . Given an inequality (produced, for example, by an
itemset C 2 ) with a solution set being a proper subset of the solution set of another
inequality, say C 1 , we can deduce that the latter inequality can be removed from any
system containing the first inequality, without affecting the global solution of this
system. In this case we have that L C 2 L C 1 and we state that C 1 is a (generalized)
cover of C 2 . By exploiting cover relations, the authors of [23] achieve to reduce the
problem of satisfying all the inequalities produced by all the itemsets in D O , to the
examination of a much smaller set
C =fI 2Bd + (F 0 D ) : I\I S min 6= ?g[S min
(14.5)
where notation I S min depicts the items of the itemsets in S min . Thus, I\I S min refers
to those itemsets of the revised positive border that do not contain any items from
those appearing in the minimum sensitive itemsets (i.e., in S min ). Set C provides
us the optimal hiding solution L C , if one exists. The reduction of the set of all
inequalities to those involving the itemsets in C, is proven as part of [23].
14.2.2 Reduction to CSP and the BIP solution
After the formalization of the hiding process, the whole problem can been regarded
as a CSP, whose solution will provide the sanitized database D. CSPs can be solved
by using various techniques, such as linear and non-linear programming [44]. In the
current context all variables are binary in nature as they refer to items participating
to specific transactions; this fact provides an important advantage as we will present
later on. To solve the CSP, the inline algorithm first transforms it to an optimization
problem and then applies a technique which is known as BIP [32]. The employed
formulation enables the solution of the sanitization problem in D O and is capable of
identifying the optimal solution (if one exists). In the case of problems where exact
solutions are infeasible, a relaxation of the inline algorithm (by using a heuristic tar-
geted for inequalities selection and removal) is adopted that allows the identification
of a good approximate solution.
Figure 14.2 presents the CSP formulation as an optimization problem. As one
can notice, the degree of the constraints participating in the problem formulation is
Search WWH ::




Custom Search