Database Reference
In-Depth Information
16.2.4 Problem Size Reduction
To enforce the computed revised border and identify the exact hiding solution, a
mechanism is needed to regulate the status (frequent vs. infrequent) of all the item-
sets in D. Let C be the minimal set of border itemsets used to regulate the values of
the various u qm variables in D X . Moreover, suppose that I 2C is an itemset, whose
behavior we want to regulate in D. Then, itemset I will be frequent in D if and only
if sup(I;D O )+sup(I;D X ) mfreq(N +Q), or equivalently if:
Q
q=1
i m 2I u qm mfreq(N +Q)
sup(I;D O )+
(16.3)
and will be infrequent otherwise, when
Q
q=1
i m 2I u qm < mfreq(N +Q)
sup(I;D O )+
(16.4)
Inequality (16.3) corresponds to the minimum number of times that an itemset I
has to appear in the extension D X to remain frequent in D. On the other hand, in-
equality (16.4) provides the maximum number of times that an itemset I can appear
in D X in order to remain infrequent in database D.
To identify an exact solution to the hiding problem, every possible itemset in P,
according to its position in the lattice — with respect to the revised border — must
satisfy either (16.3) or (16.4). However, the complexity of solving the entire system
of the 2 M 1 inequalities is well known to be NP-hard [10, 47]. Therefore, one
should restrict the problem to capture only a small subset of these inequalities, thus
leading to a problem size that is computationally manageable. The problem formu-
lation proposed in [26] achieves this in a similar way as the inline algorithm [23],
i.e. by reducing the number of the participating inequalities that need to be satisfied.
Even more, by carefully selecting the itemsets in set C, the hiding algorithm ensures
that the exact same solution to the one of solving the entire system of inequalities,
is attained. This is accomplished by exploiting cover relations existing among the
itemsets in the lattice due to the monotonicity of support [7].
Definition 16.3. (Cover/Generalized cover) Given itemsets I; J; K 2P, itemset I
is defined as a cover of K if and only if I K and there exists no proper itemset
J such that I J K. Itemset I is a generalized cover of K if and only if I K.
Given two itemsets I; J 2P, J covers I if and only if J I, i.e. if itemset J is a
(generalized) cover of itemset I.
The basic premises for formulating set C are the following [26]. All frequent
itemsets in F 0 D should remain frequent in D, thus inequality (16.3) must hold. On
the contrary, all frequent itemsets in S min should satisfy inequality (16.4) in order
to be hidden in D. By definition, the hiding of the itemsets in S min will cause the
hiding of the itemsets in S max . Additionally, all infrequent itemsets in D O should
Search WWH ::




Custom Search