In what follows, we present a framework, proposed in [25], that can be applied as part of the sanitization process of exact hiding algorithms to reduce their computational cost. The framework operates in three phases: (i) the structural decomposition phase, (ii) the decomposition of large individual components phase, and (iii) the parallel solving phase for the produced CSPs. We discuss each phase of the framework in detail below.
17.1.1 Structural Decomposition of the CSP
The number of constraints in a CSP can be very large, depending on the properties of the database, the minimum support threshold that was used for mining the frequent itemsets, and the number and length of the sensitive itemsets that need to be hidden. Moreover, various initial constraints may incorporate products of the binary u_nm variables and thus need to be replaced by numerous linear inequalities (using the CDR approach), which makes the whole BIP problem harder to solve. There is, however, a useful property of these CSPs that can be exploited to improve their solution time: decomposability.
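The details of the CDR replacement are specific to [25], but the underlying idea is the standard linearization of a product of binary variables: a product z = u_1 * u_2 * ... * u_k is replaced by the inequalities z <= u_i (for each i) and z >= u_1 + ... + u_k - (k - 1). The sketch below, a brute-force check rather than any algorithm from [25], verifies over all 0/1 assignments that this linear system admits exactly the value of the product.

```python
from itertools import product

def linearize_product_holds(k):
    """Check that the standard linearization of z = u_1 * ... * u_k
    admits exactly the intended 0/1 value of z for every assignment."""
    for assignment in product([0, 1], repeat=k):
        target = 1
        for u in assignment:
            target *= u  # the nonlinear product being replaced
        # Linear system: z <= u_i for all i, and z >= sum(u_i) - (k - 1).
        feasible_z = [z for z in (0, 1)
                      if all(z <= u for u in assignment)
                      and z >= sum(assignment) - (k - 1)]
        if feasible_z != [target]:
            return False
    return True

print(linearize_product_holds(3))  # → True: the linearization is exact
```

Because each product of k variables becomes k + 1 linear inequalities, long sensitive itemsets inflate the constraint count quickly, which is exactly why decomposition pays off.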
Following the divide-and-conquer paradigm, a decomposition approach fragments a large problem into numerous smaller ones, solves these new subproblems independently, and then aggregates the partial solutions to attain the same overall solution as solving the entire problem at once. The property of the CSPs that permits such a strategy lies in the optimization criterion that is used. Indeed, one can easily notice that the criterion of maximizing (or, equivalently, minimizing) the summation of the binary u_nm variables is satisfied when as many u_nm variables as possible are set to one (equivalently, to zero). This can be established independently, provided that the constraints participating in the CSP allow for an appropriate decomposition. The approach followed in [25] for the initial decomposition of the CSP is similar to the decomposition structure identification algorithm presented in [47], although applied at the level of constraints rather than transactions. As demonstrated in Figure 17.1, the output of structural decomposition, when applied to the original CSP, is a set of smaller CSPs that can be solved independently. An example will help demonstrate how this process works.
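One way to picture structural decomposition (an interpretation for illustration, not the exact algorithm of [25] or [47]) is as finding connected components in the graph that links constraints sharing u_nm variables: constraints with a common variable must stay in the same sub-CSP, while disjoint groups can be solved independently. A minimal union-find sketch, assuming each constraint is represented simply as the set of variables it mentions:

```python
from collections import defaultdict

def structural_decomposition(constraints):
    """Group constraint indices into independent sub-CSPs: two
    constraints end up together iff they (transitively) share a variable."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every constraint node ('c', i) to its variable nodes ('v', v).
    for i, vars_ in enumerate(constraints):
        for v in vars_:
            union(('c', i), ('v', v))

    groups = defaultdict(list)
    for i in range(len(constraints)):
        groups[find(('c', i))].append(i)
    return list(groups.values())

# Constraints 0 and 1 share u2, so they form one sub-CSP;
# constraint 2 touches only u4, u5 and is independent.
print(structural_decomposition([{'u1', 'u2'}, {'u2', 'u3'}, {'u4', 'u5'}]))
# → [[0, 1], [2]]
```

Each returned group, together with the variables it touches, is a smaller CSP whose optimal solution can be computed without looking at the other groups.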
Consider database D_O presented in Table 17.1. By performing frequent itemset mining in D_O using frequency threshold mfreq = 0.3, we compute the following set of frequent itemsets: F_{D_O} = {A, B, C, D, AB, CD}. Suppose that one wishes to hide the sensitive itemsets in S = {B, CD}, using, for instance, the inline approach.^1 Then, it holds that:
1 Note that it does not matter which methodology is used to produce the CSP, apart from the obvious fact that some methodologies may produce CSPs that are more decomposable than those constructed by other approaches. However, the structure of the CSP also depends on the problem instance, and thus it is difficult to know in advance which algorithm will produce a more decomposable CSP.
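Since Table 17.1 is not reproduced here, the brute-force miner below uses a made-up five-transaction stand-in for D_O, chosen so that mining at mfreq = 0.3 yields the same frequent set F_{D_O} = {A, B, C, D, AB, CD} as in the example:

```python
from itertools import combinations

# Hypothetical stand-in for D_O (not the actual Table 17.1):
# five transactions over items A, B, C, D.
D_O = [{'A', 'B'}, {'A', 'B'}, {'C', 'D'}, {'C', 'D'}, {'A', 'C'}]
mfreq = 0.3

items = sorted(set().union(*D_O))
frequent = []
for k in range(1, len(items) + 1):
    for itemset in combinations(items, k):
        # Frequency = fraction of transactions containing the itemset.
        support = sum(set(itemset) <= t for t in D_O) / len(D_O)
        if support >= mfreq:
            frequent.append(''.join(itemset))

print(frequent)  # → ['A', 'B', 'C', 'D', 'AB', 'CD']
```

Itemsets such as AC (frequency 0.2) fall below the threshold, which is why only AB and CD survive among the pairs.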