Privacy Issues in Association Rule Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

“market-baskets”. The distortion technique was simply to flip each 0 or 1 bit with

a parametrized probability p , or to retain as is with the complementary probability

1

− p , and the privacy metric used was average privacy. Through a theoretical and

empirical analysis, it was shown that the p parameter could be carefully tuned to

simultaneously achieve acceptable average privacy and good accuracy.

However, it was also found that mining the distorted database could be orders of

magnitude more time-consuming as compared to mining the original database. This

issue was addressed in a followup work [ 9 ], which showed that by generalizing the

distortion process to perform symbol-specific distortion (i.e., different flipping prob-

abilities for different values), appropriately chooosing these distortion parameters,

and applying a variety of set-theoretic optimizations in the reconstruction process,

runtime efficiencies that are well within an order of magnitude of undistorted mining

can be achieved.

Cut-and-Paste Operator The notion of a privacy breach was introduced in [ 17 ]as

follows: The presence of an itemset I in the randomized transaction causes a privacy

breach of level ρ if it is possible to infer, for some transaction in the true database,

that the probability of some item i occuring in it exceeds ρ .

With regard to this worst-case privacy metric, a set of randomizing privacy

operators were presented and analyzed in [ 17 ]. The starting point was Uniform

Randomization , where each existing item in the true transaction is, with probability

p , replaced with a new item not present in the original transaction. (Note that this

means that the number of items in the randomized transaction is always equal to the

number in the original transaction, and is therefore different from MASK where the

number of items in the randomized transaction is usually significantly more than its

source, since the flipping is done on both the 1's and the 0's in the transaction bit

vector.) It was then pointed out that a basic deficiency of the uniform randomization

approach is that while it might, with a suitable choice of p , be capable of providing

acceptable average privacy, its worst case privacy could be significantly weaker.

To address this issue, an alternative select-a-size (SaS) randomization oper-

ator was proposed, which is composed of the following steps, employed on a

per-transaction basis:

Step 1: For customer transaction t i of length m , a random integer j from [1, m ]is

first chosen with probability p m [ j ].

Step 2: Then, j items are uniformly and randomly selected from the true transaction

and inserted into the randomized transaction.

Step 3: Finally, a uniformly and randomly chosen fraction ρ m of the remaining items

in the database that are not present in the true transaction (i.e., C

−

items

in t i ), are inserted into the randomized transaction.

In short, the final randomized transaction is composed of a subset of true items from

the original transaction and additional false items from the complementary set of

items in the database.

A variant of the SaS operator, studied in detail in [ 17 ], is the cut-and-paste (C&P)

operator. Here, an additional parameter is a cutoff integer, K m , with the integer j

Frequent Pattern Mining

Search WWH ::

Custom Search

Home