Database Reference
In-Depth Information
10.1 Hiding Goals
In Chapter 2 (Section 2.2.1) we presented the main goals of association rule hiding
methodologies. In the context of frequent itemset hiding, these goals can be restated
as follows: (i) all the sensitive itemsets that are mined from the original database
should be hidden so that they cannot be mined from its sanitized counterpart under
the same (or a higher) threshold of support, (ii) all the nonsensitive frequent item-
sets of the original database should be preserved in the sanitized database so that
they can be mined under the same threshold of support, and (iii) no ghost itemsets
are introduced to the sanitized database, i.e. no itemset that was not among the non-
sensitive frequent ones mined from the original database, can be mined from the
sanitized database under the same or a higher threshold of support.
To achieve the first goal, the BBA algorithm operates by deleting specific items
belonging to the sensitive itemsets from a set of carefully selected transactions, until
the support of the sensitive itemsets drops below the minimum support threshold
and thus the itemsets are hidden. To satisfy the second goal, the algorithm attempts
to minimize the number of nonsensitive itemsets that are lost due to the enforced
sanitization process, by tracking the effect of each candidate item deletion to the
itemsets of the revised positive border. Last, it is trivial to observe that compliance of
the hiding algorithm to the third goal is achieved by construction, since the algorithm
operates by applying only item deletions and thus no infrequent itemsets of the
original database can become frequent in its sanitized counterpart.
Apart from the main goals presented above, the BBA hiding algorithm also aims
to maintain the relative support of the nonsensitive frequent itemsets in the sanitized
database. Clearly, based on the items that are selected for deletion from transac-
tions of the original database, as well as the transactions from which these items
are removed to facilitate the hiding of the sensitive knowledge, the nonsensitive
itemsets can be affected to a substantially different degree. Maintaining the relative
support of the nonsensitive frequent itemsets is an important feature of the BBA
algorithm, as it allows more accurate mining of the sanitized database at different
support thresholds. As an example of this nice property of BBA, assume two non-
sensitive frequent itemsets I and J of the original database D O for which it holds
that sup(I;D O ) > sup(J;D O ). Now let's assume that in the sanitized database D of
D O it holds that sup(I;D) < sup(J;D). Evidently, if we mine D with a minimum
support threshold msup = sup(J;D) then itemset I will be reported as infrequent.
However, this itemset should have been frequent in D based on D O . Generally, we
would like the hiding process to preserve to the highest possible extend the differ-
ence in the support of the nonsensitive frequent itemsets, i.e. for two nonsensitive
frequent itemsets I and J of D O it should hold that sup(I;D O ) sup(J;D O ) is as
close as possible to sup(I;D) sup(J;D).
Search WWH ::




Custom Search