Privacy Issues in Association Rule Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

namely heuristic approaches and exact approaches. In this section, we review some

of the most popular approaches that belong to each class. Furthermore, we devote

a section to discuss border-based approaches, an important category of heuristic

approaches that has also influenced the design of exact hiding algorithms. Due to

their importance, in what follows we refer to border-based approaches as the third

class of ARH algorithms.

Heuristic approaches involve efficient, fast algorithms that selectively sanitize a

set of transactions from the database to hide the sensitive knowledge. Due to their

efficiency and scalability, the heuristic approaches have been the focus of attention

for the vast majority of researchers in the knowledge hiding field. However, there are

several circumstances in which they suffer from undesirable side-effects that lead

them to poor solutions.

Border-based approaches consider the task of sensitive rule hiding through modi-

fication of the original borders in the lattice of the frequent and the infrequent patterns

in the dataset. In these schemes, the sensitive knowledge is hidden by enforcing the

revised borders (which accommodate the hiding of the sensitive itemsets) in the san-

itized database. The algorithms in this class differ both in the borders that they track

and use for the hiding strategy, as well as in the methodology that they follow to

enforce the revised borders in the sanitized dataset.

Finally, exact approaches contain non-heuristic algorithms which conceive the

hiding process as a constraint satisfaction problem that they solve by using integer

or linear programming. The main difference of these approaches, compared to the

previous ones, is the fact that the sanitization process guarantees optimality in the

hiding solution, provided that an optimal solution exists. On the other hand, these

approaches are usually several orders of magnitude slower than the heuristic ones,

particularly due to the runtime of the integer/linear programming solver. For this,

they are applicable only in small and medium-size datasets.

Heuristic Approaches In this section, we review support-based and confidence-

based heuristic approaches, which are based on either distortion or blocking of the

original values. Between these two categories of approaches, the distortion-based

are the ones commonly adopted by the overwhelming majority of researchers.

Support-based and Confidence-based Distortion Schemes Atallah et al. [ 11 ]

were the first to propose an algorithm for the hiding of sensitive association rules

through the reduction in the support of their generating itemsets. The authors pro-

pose the construction of a lattice-like graph in the database. Through this graph,

the hiding of a large itemset, related to the existence of a sensitive rule, is achieved

by a greedy iterative traversal of its immediate subsets, selection of the subset that

has the maximum support among all candidates (therefore is less probable to be

hidden) and setting of this itemset as the new candidate to be hidden. By iteratively

following these steps, the algorithm identifies the 1-itemset ancestor of the initial

sensitive itemset, having the highest support. Then, by identifying the supporting

transactions for both the initial candidate and the currently identified 1-itemset, the

algorithm removes the 1-itemset from the supporting transaction which affects the

least number of 2-itemsets. In sequel, the algorithm propagates the results of this

Frequent Pattern Mining

Search WWH ::

Custom Search

Home