Two–Phase Iterative Algorithm - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

Although this is a reasonable heuristic, it may not always lead to the optimal selec-

tion, i.e. the one where the minimum number of constraints are selected for eviction

and their removal from the CSP causes the least distortion to the database. Thus, in

what follows, we present a mechanism that was proposed in [27] which allows the

identification of the best possible set of inequalities to formulate the constraints of

the feasible CSP.

Let a constraint set be a set of inequalities corresponding to the itemsets of

the positive border, as taken from the original (infeasible) CSP. Then, as discussed

in [27], there exists a “1-1” correspondence between constraint sets and itemsets,

where each item is mapped to a constraint (and vice versa). A frequent itemset is

equivalent to a feasible constraint set, i.e. a set of constraints that has a solution

satisfying all the constraints in the set. The downwards closure property of the fre-

quent itemsets also holds in the case of constraint sets, since all the subsets (taken

by removing one or more inequalities) of a feasible constraint set are also feasible

constraint sets and all the supersets of an infeasible constraint set are also infeasible

constraint sets. Furthermore, a maximal feasible constraint set (in relation to a maxi-

mal frequent itemset) is a feasible constraint set where all its supersets are infeasible

constraint sets. The constraints removal process, explained earlier, can be considered

as the identification of a maximal feasible constraint set among the inequalities of

the revised positive border. These constraints will be the ones that will participate to

the (feasible) CSP. Due to the correspondence that exists between itemsets and con-

straint sets, one could use techniques that are currently applied on frequent itemset

mining algorithms (such as pruning approaches) to efficiently identify the maximal

constraint sets and possibly further select one among them (assuming that there ex-

ists some metric that captures the quality of the different constraint sets). However,

the decision of whether a constraint set is feasible or not is a computationally de-

manding process and for this reason a much simpler heuristic is used instead by the

two-phase iterative algorithm.

15.3 An Example

In what follows we provide an example that was borrowed from [27]. Consider

database D O depicted in Table 15.1. When mining this database for frequent item-

sets, using mfreq = 0:2, we have thatF D O =fA; B;C; D; E; AC; AD; AE;CD;CE; DE;

ACD; ACE; ADE;CDE; ACDEg. Suppose that the sensitive knowledge corresponds

to the frequent itemset S =fCDg. Given S we compute S max =fCD; ACD; ACDEg

that corresponds to the sensitive itemsets and their frequent supersets. The revised

positive border will then contain the itemsets of Bd + (F 0 D ) =fB; ACE; ADEg. The

intermediate form of database D O (generated using the inline approach) is shown

in Table 15.2. From this database, using set V =fI 2Bd + (F 0 D ) : I \I S 6= ?g=

fACE; ADEg, we produce the following set of inequalities for C = V [S =fACE;

ADE;CDg that are incorporated to the CSP:

Search WWH ::

Custom Search

Home