for an efficient retrieval of the data. In the second scan, the algorithms sanitize the
database by selectively removing the smallest number of individual items needed to
accomplish the hiding of the sensitive knowledge. An interesting novelty of this work is
the fact that the proposed methodology takes into account not only the impact of the
sanitization on hiding the sensitive patterns, but also the impact related to the hiding
of non-sensitive knowledge. Three item restriction-based algorithms (MinFIA,
MaxFIA, and IGA) are proposed that selectively remove items from sensitive
transactions. The first algorithm, MinFIA, proceeds as follows. For each restrictive pattern
it identifies the supporting transactions and the item having the smallest support in
the pattern (called the victim item). Then, using a user-supplied disclosure threshold,
it first sorts the identified transactions in ascending order of their degree of conflict
and then selects, among them, the number of transactions that need to be sanitized.
Finally, from each selected transaction the algorithm removes the victim item. The
MaxFIA algorithm proceeds exactly like MinFIA, the only difference being that it
selects as the victim item the one with the maximum support in the sensitive
rule. Finally, IGA aims at clustering the restricted patterns into groups that share the
same itemsets. By identifying overlapping clusters, the algorithm proceeds to hide
the corresponding sensitive patterns at once (based on the sensitive itemsets they
share) and consequently reduces the impact on the released dataset.
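The victim-item selection and transaction sanitization shared by MinFIA and MaxFIA can be sketched in Python. This is an illustrative sketch, not the authors' implementation: the function name is invented, and the degree of conflict is approximated here by the number of sensitive patterns a transaction supports (an assumption).

```python
from collections import Counter

def fia_sanitize(transactions, sensitive_patterns, threshold, use_max=False):
    """Illustrative MinFIA/MaxFIA-style sanitization (simplified sketch).

    transactions       : list of item sets (modified in place)
    sensitive_patterns : list of item sets to hide
    threshold          : disclosure threshold in [0, 1] -- the fraction of
                         supporting transactions allowed to remain intact
    use_max            : False picks MinFIA's victim (smallest support),
                         True picks MaxFIA's victim (largest support)
    """
    # Item frequencies across the whole database
    freq = Counter(item for t in transactions for item in t)
    for pattern in sensitive_patterns:
        # Transactions supporting this restrictive pattern
        supporters = [t for t in transactions if pattern <= t]
        if not supporters:
            continue
        # Victim item: min- or max-support item of the pattern
        pick = max if use_max else min
        victim = pick(pattern, key=lambda item: freq[item])
        # Number of supporting transactions that must be sanitized
        n_sanitize = len(supporters) - int(threshold * len(supporters))
        # Ascending "degree of conflict", approximated here by how many
        # sensitive patterns each transaction supports (assumption)
        supporters.sort(key=lambda t: sum(p <= t for p in sensitive_patterns))
        for t in supporters[:n_sanitize]:
            t.discard(victim)  # remove the victim item from the transaction
    return transactions
```

With a disclosure threshold of 0, every supporting transaction is sanitized, so the pattern is fully hidden.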
A more efficient approach than the one in [39] and the works of [15, 47, 48] was
proposed by Oliveira and Zaïane [40]. The proposed algorithm, called SWA, is an
efficient, scalable, one-scan heuristic which aims at providing a balance between the
needs for privacy and knowledge discovery in ARH. It manages to hide multiple rules
in only one pass through the dataset, regardless of its size or the number of sensitive
rules that need to be protected. The algorithm proceeds in five steps that are applied
to every group of K transactions (thus forming a window of size K) read from
the original database. Firstly, the non-sensitive transactions are separated from the
sensitive ones and copied directly to the sanitized database. For each sensitive rule,
the item having the highest frequency is selected and the supporting transactions
are identified. Then, a disclosure threshold, potentially different for each sensitive
rule, is used to capture the severity characterizing the release of the rule. Based on
this threshold, SWA computes the number of supporting transactions that need to
be sanitized for each rule and then sorts them in ascending order of size. For each
selected transaction, the corresponding item is removed and then the transaction is
copied to the sanitized dataset. The authors present a set of computational tests to
demonstrate that SWA outperforms state-of-the-art approaches in terms of concealing
all the sensitive rules, while maintaining high data utility of the released dataset.
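The five windowed steps above can be sketched in Python. This is a hedged sketch under simplifying assumptions: each sensitive rule is represented as the set of its items, the function and helper names are invented, and the exact bookkeeping of the original SWA is not reproduced.

```python
from collections import Counter

def swa_sanitize(transactions, sensitive_rules, thresholds, window_size=100):
    """Illustrative one-scan, window-based sanitization in the spirit of SWA.

    transactions    : iterable of item sets (read once, in order)
    sensitive_rules : list of item sets (antecedent plus consequent)
    thresholds      : per-rule disclosure thresholds in [0, 1]
    """
    sanitized, window = [], []
    for t in transactions:
        window.append(set(t))
        if len(window) == window_size:
            sanitized.extend(_sanitize_window(window, sensitive_rules, thresholds))
            window = []
    if window:  # flush the last, possibly partial, window
        sanitized.extend(_sanitize_window(window, sensitive_rules, thresholds))
    return sanitized

def _sanitize_window(window, sensitive_rules, thresholds):
    # Step 1: non-sensitive transactions pass through untouched
    clean = [t for t in window if not any(r <= t for r in sensitive_rules)]
    dirty = [t for t in window if any(r <= t for r in sensitive_rules)]
    freq = Counter(item for t in window for item in t)
    for rule, psi in zip(sensitive_rules, thresholds):
        supporters = [t for t in dirty if rule <= t]
        if not supporters:
            continue
        # Step 2: victim item = the rule's highest-frequency item
        victim = max(rule, key=lambda item: freq[item])
        # Steps 3-4: the per-rule threshold psi fixes how many supporters
        # to sanitize; supporters are processed in ascending order of size
        n = len(supporters) - int(psi * len(supporters))
        supporters.sort(key=len)
        # Step 5: drop the victim item, then keep the transaction
        for t in supporters[:n]:
            t.discard(victim)
    return clean + dirty
```

Because each transaction is touched only within its window, the database is read exactly once, which is what makes the approach scale to large datasets.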
Amiri [10] proposes three effective, multiple rule hiding heuristics that outper-
form SWA by offering higher data utility and lower distortion, at the expense of
higher computational cost. Although similar in philosophy to the previous approaches,
the proposed schemes do a better job in modelling the overall objective of a rule
hiding algorithm. The first approach, called Aggregate, computes the union of the
supporting transactions for all sensitive itemsets. Among them, the transaction that
supports the most sensitive and the fewest non-sensitive itemsets is selected and ex-
pelled from the database. The same process is repeated until all the sensitive itemsets