Database Reference
In-Depth Information
“market-baskets”. The distortion technique was simply to flip each 0 or 1 bit with
a parametrized probability p , or to retain as is with the complementary probability
1
p , and the privacy metric used was average privacy. Through a theoretical and
empirical analysis, it was shown that the p parameter could be carefully tuned to
simultaneously achieve acceptable average privacy and good accuracy.
However, it was also found that mining the distorted database could be orders of
magnitude more time-consuming as compared to mining the original database. This
issue was addressed in a followup work [ 9 ], which showed that by generalizing the
distortion process to perform symbol-specific distortion (i.e., different flipping prob-
abilities for different values), appropriately chooosing these distortion parameters,
and applying a variety of set-theoretic optimizations in the reconstruction process,
runtime efficiencies that are well within an order of magnitude of undistorted mining
can be achieved.
Cut-and-Paste Operator The notion of a privacy breach was introduced in [ 17 ]as
follows: The presence of an itemset I in the randomized transaction causes a privacy
breach of level ρ if it is possible to infer, for some transaction in the true database,
that the probability of some item i occuring in it exceeds ρ .
With regard to this worst-case privacy metric, a set of randomizing privacy
operators were presented and analyzed in [ 17 ]. The starting point was Uniform
Randomization , where each existing item in the true transaction is, with probability
p , replaced with a new item not present in the original transaction. (Note that this
means that the number of items in the randomized transaction is always equal to the
number in the original transaction, and is therefore different from MASK where the
number of items in the randomized transaction is usually significantly more than its
source, since the flipping is done on both the 1's and the 0's in the transaction bit
vector.) It was then pointed out that a basic deficiency of the uniform randomization
approach is that while it might, with a suitable choice of p , be capable of providing
acceptable average privacy, its worst case privacy could be significantly weaker.
To address this issue, an alternative select-a-size (SaS) randomization oper-
ator was proposed, which is composed of the following steps, employed on a
per-transaction basis:
Step 1: For customer transaction t i of length m , a random integer j from [1, m ]is
first chosen with probability p m [ j ].
Step 2: Then, j items are uniformly and randomly selected from the true transaction
and inserted into the randomized transaction.
Step 3: Finally, a uniformly and randomly chosen fraction ρ m of the remaining items
in the database that are not present in the true transaction (i.e., C
items
in t i ), are inserted into the randomized transaction.
In short, the final randomized transaction is composed of a subset of true items from
the original transaction and additional false items from the complementary set of
items in the database.
A variant of the SaS operator, studied in detail in [ 17 ], is the cut-and-paste (C&P)
operator. Here, an additional parameter is a cutoff integer, K m , with the integer j
Search WWH ::




Custom Search