Privacy Issues in Association Rule Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

outcome. To balance the trade-off between privacy and data loss the proposed algo-

rithm incorporates a safety margin that corresponds to the extend of sanitization that

is performed in the dataset. The higher the safety margin the better the protection of

the sensitive rules and the worse the data quality of the resulting dataset.

Border-based Approaches In this section, we review two border-based approaches

for the hiding of sensitive rules. The work of Sun and Yu [ 50 ] was the first to in-

troduce the process of border revision for the hiding of sensitive association rules.

In their work, the authors propose a heuristic approach that uses the notion of the

border (further analyzed in [ 31 ]) of the non-sensitive frequent itemsets to track the

impact of altering transactions in the database. The proposed scheme, first computes

the positive and the negative borders in the lattice of all itemsets and then focuses

on preserving the quality of the computed borders during the hiding process. The

quality of the borders directly affects the quality of the sanitized database that is

produced, which can be maintained by greedily selecting those modifications that

lead to minimal side-effects. In the proposed heuristic, a weight is assigned to each

element of the expected positive border (which is the original positive border after

it has been shaped up with the removal of the sensitive itemsets) in an attempt to

quantify its vulnerability of being affected by item deletion. These weights are dy-

namically computed (during the sanitization process) as a function of the current

support of the corresponding itemsets in the database. To reduce the support of a

sensitive itemset from the negative border, the algorithm calculates the impact of the

possible item deletions by computing the sum of the weights of the positive border

elements that will be affected. Then, it proceeds to delete the candidate item that will

have the minimal impact on the positive border.

Moustakides and Verykios [ 34 ] follow a similar approach to [ 50 ] by proposing

two heuristics that use the revised positive and negative borders, produced by the

removal of the sensitive itemsets and their supersets from the old frequent itemset

lattice. The proposed algorithms try to remove from the database all the sensitive

itemsets that belong to the revised negative border, while maintaining frequent all

the itemsets of the revised positive border. For every item of a sensitive itemset, the

algorithms list the set of positive border itemsets which depend on it. Then, from

among all minimum border itemsets, the one with the highest support is selected

as it is the one with the maximum distance from the border. This itemset, called

the max-min itemset, determines the item through which the hiding of the sensitive

itemset will incur. The proposed algorithms try to modify this item in such a way

that the support of the max-min itemset is minimally affected. When hiding multiple

itemsets, the algorithms perform the sanitization in a one-by-one fashion, starting

from the itemsets that have lower supports. Finally, the second algorithm improves

the first one and, through experimental evaluation, is shown to provide better hiding

solutions than [ 50 ], in the majority of the tested settings.

Exact Approaches In this section, we review some exact approaches that have been

proposed for the hiding of sensitive association rules. Exact approaches are typically

capable of providing superior solutions compared to the heuristic schemes, at a high

computational cost. They achieve this by formulating the sanitization process as a

Frequent Pattern Mining

Search WWH ::

Custom Search

Home