Database Reference
In-Depth Information
10.2 Hiding Process
The BBA algorithm concentrates on the itemsets of the revised positive border to
evaluate the impact of each candidate item deletion to the quality of the database
and, subsequently, select the item whose deletion introduces the least side-effects to
the hiding process. Given an itemset I that belongs to the set of minimal sensitive
itemsets S min , the authors of [66, 67] define the set C I that contains all pairs (T; i)
of transactions T and items i 2 I from the original database D O that can be used to
lower the support of itemset I. C I is called the set of hiding candidates for itemset
I and is formally represented as C I = f(T; i)jT 2 D I ^i 2 Ig, where D I is the set
of supporting transactions for itemset I. Using this notation, pair (T o ; i o ) represents
the deletion of candidate item i o from transaction T o of the original database D O , to
facilitate the lowering of the support of a sensitive itemset through item i o . When a
hiding candidate is deleted all other hiding candidates that involve the same sensi-
tive itemset and the same transaction are removed from set C I , since their potential
deletion will not affect the support of the sensitive itemset. Thus, by construction,
set C I comprises of hiding candidates, the deletion of each of which leads to a decre-
ment of one in the support of a sensitive itemset. As an effect, the BBA algorithm
has to select the minimum number of hiding candidates that are associated with each
itemset in S min and apply the corresponding deletions to facilitate the hiding of this
itemset. The selected hiding candidates will be the ones that affect the itemsets of
the positive border, as well as their relative support, the least. The aim of the pro-
posed algorithm is to apply the necessary item deletions so that the support of each
sensitive itemset in the sanitized database D is just below the minimum support
threshold and, thus, the itemset is properly hidden.
The hiding of a sensitive itemset may cause a decrement to the support of some
nonsensitive frequent itemsets that share common items with the sensitive one. De-
pending on the properties of the database and the minimum support threshold that
is used, it is also possible that some nonsensitive frequent itemsets become lost as
a side-effect of the hiding process. As is evident, the loss of one or more nonsensi-
tive frequent itemsets causes a retreat of the borderline that separates the frequent
from the infrequent itemsets in the (under sanitization) database. The retreat of the
borderline directly affects the itemsets that participate to the revised positive border.
Thus, when an itemset of the revised positive border becomes infrequent, the border
should be recomputed to account for this change as well as to allow for accurate
subsequent decisions to be made by the sanitization process. However, such com-
putations may be costly and, in order to avoid this extra cost, the BBA algorithm
operates under the assumption that the new border, after the loss of nonsensitive
frequent itemsets, will be a good approximation of the old one in most cases 1 .
1 An interesting observation that should be made at this point is that the need of recomputing
the revised positive border is dictated by the greedy nature of the BBA algorithm. Since item
deletions are applied in a one-by-one fashion, each item deletion may cause the loss of one or more
itemsets that belong to the revised positive border, which subsequently needs to be recalculated
to accommodate this change. Contrary to border based approaches, exact hiding algorithms (see
Chapters 14, 15 and 16) do not suffer from this shortcoming, since they apply all necessary item
Search WWH ::




Custom Search