Database Reference
In-Depth Information
Although this is a reasonable heuristic, it may not always lead to the optimal selec-
tion, i.e. the one where the minimum number of constraints are selected for eviction
and their removal from the CSP causes the least distortion to the database. Thus, in
what follows, we present a mechanism that was proposed in [27] which allows the
identification of the best possible set of inequalities to formulate the constraints of
the feasible CSP.
Let a constraint set be a set of inequalities corresponding to the itemsets of
the positive border, as taken from the original (infeasible) CSP. Then, as discussed
in [27], there exists a “1-1” correspondence between constraint sets and itemsets,
where each item is mapped to a constraint (and vice versa). A frequent itemset is
equivalent to a feasible constraint set, i.e. a set of constraints that has a solution
satisfying all the constraints in the set. The downwards closure property of the fre-
quent itemsets also holds in the case of constraint sets, since all the subsets (taken
by removing one or more inequalities) of a feasible constraint set are also feasible
constraint sets and all the supersets of an infeasible constraint set are also infeasible
constraint sets. Furthermore, a maximal feasible constraint set (in relation to a maxi-
mal frequent itemset) is a feasible constraint set where all its supersets are infeasible
constraint sets. The constraints removal process, explained earlier, can be considered
as the identification of a maximal feasible constraint set among the inequalities of
the revised positive border. These constraints will be the ones that will participate to
the (feasible) CSP. Due to the correspondence that exists between itemsets and con-
straint sets, one could use techniques that are currently applied on frequent itemset
mining algorithms (such as pruning approaches) to efficiently identify the maximal
constraint sets and possibly further select one among them (assuming that there ex-
ists some metric that captures the quality of the different constraint sets). However,
the decision of whether a constraint set is feasible or not is a computationally de-
manding process and for this reason a much simpler heuristic is used instead by the
two-phase iterative algorithm.
15.3 An Example
In what follows we provide an example that was borrowed from [27]. Consider
database D O depicted in Table 15.1. When mining this database for frequent item-
sets, using mfreq = 0:2, we have thatF D O =fA; B;C; D; E; AC; AD; AE;CD;CE; DE;
ACD; ACE; ADE;CDE; ACDEg. Suppose that the sensitive knowledge corresponds
to the frequent itemset S =fCDg. Given S we compute S max =fCD; ACD; ACDEg
that corresponds to the sensitive itemsets and their frequent supersets. The revised
positive border will then contain the itemsets of Bd + (F 0 D ) =fB; ACE; ADEg. The
intermediate form of database D O (generated using the inline approach) is shown
in Table 15.2. From this database, using set V =fI 2Bd + (F 0 D ) : I \I S 6= ?g=
fACE; ADEg, we produce the following set of inequalities for C = V [S =fACE;
ADE;CDg that are incorporated to the CSP:
Search WWH ::




Custom Search