Privacy Issues in Association Rule Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

The third dimension, refers to whether a single sensitive rule or a set of sensitive

rules can be hidden during an iteration of the hiding algorithm. Based on this criterion

we differentiate hiding algorithms into single rule and multiple rule schemes.

The fourth dimension has to do with the nature of the hiding algorithm, which

can be either heuristic or exact . Heuristic algorithms take decisions that aim at

optimizing certain sub-goals in the hiding process, are computationally efficient,

but do not guarantee optimality. The formulation of the ARH problem (presented in

Sect. 3.1) implies that there are two specific sub-goals that need to be attained by

every ARH algorithm. The first sub-goal [ (a) ], which is basically the most important,

is to try to hide as many sensitive rules as possible. The second sub-goal [ (b), (c), (d) ]

is to manage to hide the sensitive rules by minimizing side-effects. Different hiding

algorithms give different priorities to the satisfaction of the sub-goals presented,

producing in this way a list of hiding primitives.

Exact techniques, on the other hand, rely on formulating the ARH problem so that

a solution satisfying all the sub-goals can be found. Of course, there is a high possi-

bility that an exact approach fails to give a solution, and for this reason, some of the

sub-goals need to be relaxed. However, this relaxation process is still part of the exact

approach, which makes it different from the heuristic approaches. Although exact ap-

proaches lead to better solutions than heuristic algorithms, they are computationally

demanding and can be applied only to small and medium-size datasets.

The fifth and final dimension determines whether a hiding algorithm preprocesses

the user-specified sensitive rules so that a minimal set of sensitive rules are given

as input to the hiding technique. The corresponding techniques make use of the

border of the frequent itemsets [ 31 ], which provides a compact representation of the

frequent itemsets mined from a database, to facilitate knowledge hiding. Specifically,

given the set of frequent itemsets F mined from the transactions of database U , the

negative border of F , denoted as B − ( F ), is defined as the set of all infrequent

itemsets mined from U in which all proper subsets appear in F . Similarly, the

positive border of F , denoted as B + ( F ), is defined as the set of all maximally

frequent itemsets appearing in F . The positive and the negative border formulate

collectively the border B ( F ) of frequent itemsets. Formally stated, if

is the set of

all items appearing in U , then B − ( F )

C x ⊆ I

: C x /

∈

∧∀

C y ⊂

C x : C y ∈

}

B + ( F )

B + ( F ).

Border-based hiding techniques compute the border of the frequent itemsets and

modify it appropriately by recomputing it, in such a way, that a minimal set of

sensitive rules joins the newly computed border. The algorithms are subsequently

driven by the negative and positive border for hiding the rules [ 21 , 31 ].

B − ( F )

C x ⊆ I

: C x ∈

∧∀

C y ⊃

C x : C y /

∈

}

, and B ( F )

∪

3.3

Heuristic and Exact ARH Algorithms

Among the different dimensions in the taxonomy of frequent itemset and association

rule hiding algorithms, the prevalent dimension regards the nature of the hiding

algorithms. As stated before, ARH algorithms can be divided into two broad classes,

Frequent Pattern Mining

Search WWH ::

Custom Search

Home