In what follows, we present a framework, proposed in [25], that can be applied as part of the sanitization process of exact hiding algorithms to reduce their computational cost. The framework operates in three phases: (i) the structural decomposition phase, (ii) the decomposition of large individual components phase, and (iii) the parallel solving phase for the produced CSPs. We discuss each phase of the framework in detail below.
17.1.1 Structural Decomposition of the CSP
The number of constraints in a CSP can be very large, depending on the properties of the database, the minimum support threshold that was used for mining the frequent itemsets, and the number and length of the sensitive itemsets that need to be hidden. Moreover, various initial constraints may incorporate products of the binary u_nm variables and thus need to be replaced by numerous linear inequalities (using the CDR approach), which makes the whole BIP problem harder to solve. There is, however, a useful property of these CSPs that can be exploited to improve their solution time: decomposability.
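The details of the CDR replacement are specific to [25], but the underlying idea is the standard linearization of a product of binary variables: a product z = u_1 * u_2 * ... * u_k is replaced by the inequalities z <= u_i (for each i) and z >= u_1 + ... + u_k - (k - 1). The sketch below, a brute-force check rather than any algorithm from [25], verifies over all 0/1 assignments that this linear system admits exactly the value of the product.

```python
from itertools import product

def linearize_product_holds(k):
    """Check that the standard linearization of z = u_1 * ... * u_k
    admits exactly the intended 0/1 value of z for every assignment."""
    for assignment in product([0, 1], repeat=k):
        target = 1
        for u in assignment:
            target *= u  # the nonlinear product being replaced
        # Linear system: z <= u_i for all i, and z >= sum(u_i) - (k - 1).
        feasible_z = [z for z in (0, 1)
                      if all(z <= u for u in assignment)
                      and z >= sum(assignment) - (k - 1)]
        if feasible_z != [target]:
            return False
    return True

print(linearize_product_holds(3))  # → True: the linearization is exact
```

Because each product of k variables becomes k + 1 linear inequalities, long sensitive itemsets inflate the constraint count quickly, which is exactly why decomposition pays off.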
Following the divide-and-conquer paradigm, a decomposition approach fragments a large problem into numerous smaller ones, solves these new subproblems independently, and then aggregates the partial solutions to attain the same overall solution as solving the entire problem at once. The property of the CSPs that permits such a strategy lies in the optimization criterion that is used. Indeed, one can easily notice that the criterion of maximizing (or, equivalently, minimizing) the summation of the binary u_nm variables is satisfied when as many u_nm variables as possible are set to one (equivalently, to zero). This can be established independently, provided that the constraints participating in the CSP allow for an appropriate decomposition. The approach followed in [25] for the initial decomposition of the CSP is similar to the decomposition structure identification algorithm presented in [47], although applied at the level of constraints rather than transactions. As demonstrated in Figure 17.1, the output of structural decomposition, when applied to the original CSP, is a set of smaller CSPs that can be solved independently. An example will help demonstrate how this process works.
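One way to picture structural decomposition (an interpretation for illustration, not the exact algorithm of [25] or [47]) is as finding connected components in the graph that links constraints sharing u_nm variables: constraints with a common variable must stay in the same sub-CSP, while disjoint groups can be solved independently. A minimal union-find sketch, assuming each constraint is represented simply as the set of variables it mentions:

```python
from collections import defaultdict

def structural_decomposition(constraints):
    """Group constraint indices into independent sub-CSPs: two
    constraints end up together iff they (transitively) share a variable."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every constraint node ('c', i) to its variable nodes ('v', v).
    for i, vars_ in enumerate(constraints):
        for v in vars_:
            union(('c', i), ('v', v))

    groups = defaultdict(list)
    for i in range(len(constraints)):
        groups[find(('c', i))].append(i)
    return list(groups.values())

# Constraints 0 and 1 share u2, so they form one sub-CSP;
# constraint 2 touches only u4, u5 and is independent.
print(structural_decomposition([{'u1', 'u2'}, {'u2', 'u3'}, {'u4', 'u5'}]))
# → [[0, 1], [2]]
```

Each returned group, together with the variables it touches, is a smaller CSP whose optimal solution can be computed without looking at the other groups.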
Consider database D_O presented in Table 17.1. By performing frequent itemset mining in D_O using frequency threshold mfreq = 0.3, we compute the following set of frequent itemsets: F_{D_O} = {A, B, C, D, AB, CD}. Suppose that one wishes to hide the sensitive itemsets in S = {B, CD}, using, for instance, the inline approach.^1 Then, it holds that:
1 Note that it does not matter which methodology is used to produce the CSP, apart from the obvious fact that some methodologies may produce CSPs that are more decomposable than those constructed by other approaches. However, the structure of the CSP also depends on the problem instance, and thus it is difficult to know in advance which algorithm will produce a more decomposable CSP.
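Since Table 17.1 is not reproduced here, the brute-force miner below uses a made-up five-transaction stand-in for D_O, chosen so that mining at mfreq = 0.3 yields the same frequent set F_{D_O} = {A, B, C, D, AB, CD} as in the example:

```python
from itertools import combinations

# Hypothetical stand-in for D_O (not the actual Table 17.1):
# five transactions over items A, B, C, D.
D_O = [{'A', 'B'}, {'A', 'B'}, {'C', 'D'}, {'C', 'D'}, {'A', 'C'}]
mfreq = 0.3

items = sorted(set().union(*D_O))
frequent = []
for k in range(1, len(items) + 1):
    for itemset in combinations(items, k):
        # Frequency = fraction of transactions containing the itemset.
        support = sum(set(itemset) <= t for t in D_O) / len(D_O)
        if support >= mfreq:
            frequent.append(''.join(itemset))

print(frequent)  # → ['A', 'B', 'C', 'D', 'AB', 'CD']
```

Itemsets such as AC (frequency 0.2) fall below the threshold, which is why only AB and CD survive among the pairs.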