Database Reference
In-Depth Information
outcome. To balance the trade-off between privacy and data loss the proposed algo-
rithm incorporates a safety margin that corresponds to the extend of sanitization that
is performed in the dataset. The higher the safety margin the better the protection of
the sensitive rules and the worse the data quality of the resulting dataset.
Border-based Approaches In this section, we review two border-based approaches
for the hiding of sensitive rules. The work of Sun and Yu [ 50 ] was the first to in-
troduce the process of border revision for the hiding of sensitive association rules.
In their work, the authors propose a heuristic approach that uses the notion of the
border (further analyzed in [ 31 ]) of the non-sensitive frequent itemsets to track the
impact of altering transactions in the database. The proposed scheme, first computes
the positive and the negative borders in the lattice of all itemsets and then focuses
on preserving the quality of the computed borders during the hiding process. The
quality of the borders directly affects the quality of the sanitized database that is
produced, which can be maintained by greedily selecting those modifications that
lead to minimal side-effects. In the proposed heuristic, a weight is assigned to each
element of the expected positive border (which is the original positive border after
it has been shaped up with the removal of the sensitive itemsets) in an attempt to
quantify its vulnerability of being affected by item deletion. These weights are dy-
namically computed (during the sanitization process) as a function of the current
support of the corresponding itemsets in the database. To reduce the support of a
sensitive itemset from the negative border, the algorithm calculates the impact of the
possible item deletions by computing the sum of the weights of the positive border
elements that will be affected. Then, it proceeds to delete the candidate item that will
have the minimal impact on the positive border.
Moustakides and Verykios [ 34 ] follow a similar approach to [ 50 ] by proposing
two heuristics that use the revised positive and negative borders, produced by the
removal of the sensitive itemsets and their supersets from the old frequent itemset
lattice. The proposed algorithms try to remove from the database all the sensitive
itemsets that belong to the revised negative border, while maintaining frequent all
the itemsets of the revised positive border. For every item of a sensitive itemset, the
algorithms list the set of positive border itemsets which depend on it. Then, from
among all minimum border itemsets, the one with the highest support is selected
as it is the one with the maximum distance from the border. This itemset, called
the max-min itemset, determines the item through which the hiding of the sensitive
itemset will incur. The proposed algorithms try to modify this item in such a way
that the support of the max-min itemset is minimally affected. When hiding multiple
itemsets, the algorithms perform the sanitization in a one-by-one fashion, starting
from the itemsets that have lower supports. Finally, the second algorithm improves
the first one and, through experimental evaluation, is shown to provide better hiding
solutions than [ 50 ], in the majority of the tested settings.
Exact Approaches In this section, we review some exact approaches that have been
proposed for the hiding of sensitive association rules. Exact approaches are typically
capable of providing superior solutions compared to the heuristic schemes, at a high
computational cost. They achieve this by formulating the sanitization process as a
Search WWH ::




Custom Search