Quantifying the Privacy of Exact Hiding Algorithms - Association Rule Hiding for Data Mining


Layer 3: This layer collects the rest of the infrequent itemsets, starting from the one having the maximum support just below mSS and ending at the infrequent itemset with the minimum support (mSI), inclusive. This layer is assumed to contain a total of r itemsets.

Given the layered partitioning of the itemsets in D with respect to their support values, the quality of a hiding algorithm depends on the position of the various infrequent itemsets in Layers 1, 2 and 3. Specifically, let x denote the distance (from msup) below the borderline where an adversary tries to locate the sensitive knowledge (e.g., by mining database D using support threshold msup - x). Then, estimator E provides the mean probability of sensitive knowledge disclosure and is defined in [25] as follows:

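\[
E =
\begin{cases}
0, & x \in [0 \ldots y] \\[6pt]
\dfrac{S \cdot \dfrac{msup - x}{y + MSS - mSS + 1}}{y + s \cdot \dfrac{msup - x}{y + MSS - mSS + 1}}, & x \in (y \ldots (y + MSS - mSS + 1)] \\[12pt]
\dfrac{S}{y + s + r \cdot \dfrac{msup - x}{msup - mSI + 1}}, & x \in ((MSS - mSS + 1) \ldots (msup - mSI + 1)]
\end{cases}
\tag{18.1}
\]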
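To make the three cases of Eq. (18.1) concrete, the following is a minimal Python sketch that transcribes the estimator case by case. The function name `estimator_e` and the helper names `layer2_end` and `layer3_end` are hypothetical. S, y and s are treated as plain numeric inputs whose definitions come from the description of Layers 1 and 2, which precedes this excerpt. Because the third interval as printed overlaps the second, the sketch assumes the cases are tested in order, so the first matching case wins.

```python
def estimator_e(x, S, y, s, r, msup, MSS, mSS, mSI):
    """Evaluate the piecewise estimator E of Eq. (18.1) at distance x.

    Assumption: the cases are checked in the order they are listed,
    which resolves the overlap between the second and third intervals.
    """
    layer2_end = y + MSS - mSS + 1   # upper end of the second interval
    layer3_end = msup - mSI + 1      # upper end of the third interval

    if x < 0 or x > layer3_end:
        raise ValueError("x lies outside the range covered by Eq. (18.1)")

    # Case 1: x in [0 ... y]
    if x <= y:
        return 0.0

    # Case 2: x in (y ... (y + MSS - mSS + 1)]
    if x <= layer2_end:
        frac = (msup - x) / layer2_end
        return (S * frac) / (y + s * frac)

    # Case 3: x up to (msup - mSI + 1)
    frac = (msup - x) / layer3_end
    return S / (y + s + r * frac)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the source.
    params = dict(S=2, y=3, s=4, r=10, msup=20, MSS=15, mSS=12, mSI=5)
    for x in (2, 5, 10):
        print(x, estimator_e(x, **params))
```

The parameter values in the usage block are made up for illustration; the sketch does not guard against a zero denominator, which could occur for degenerate inputs such as y = 0 combined with x = msup.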

By computing E for the sanitized database D, the owner of D_O can gain an in-depth understanding of the degree of protection offered to the sensitive knowledge in D. Furthermore, he or she may decide how much lower (with respect to the support) the sensitive itemsets should be located in D, such that they are adequately covered up. As a result, a hiding methodology can be applied to the original database D_O to produce a sanitized version D that meets the newly imposed privacy requirements. Given the presented exact approaches to sensitive knowledge hiding, such a methodology can be implemented in two steps, as follows:

1. The database owner uses the probability estimator E to compute the value of x that guarantees maximum safety of the sensitive knowledge.

2. An exact knowledge hiding approach is selected and extra constraints are added to the formulated CSP to ensure that the support of the sensitive knowledge in the generated sanitized database lies strictly below msup - x.

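\[
\begin{aligned}
\text{maximize} \quad & \sum_{u_{nm} \in U} u_{nm} \\
\text{subject to} \quad &
\begin{cases}
\displaystyle \sum_{T_n \in D_X} \prod_{i_m \in X} u_{nm} < msup - x, & \forall X \in S_{min} \\[10pt]
\displaystyle \sum_{T_n \in D_R} \prod_{i_m \in R} u_{nm} \geq msup, & \forall R \in V
\end{cases}
\end{aligned}
\]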

Fig. 18.2: The modified CSP for the inline algorithm that guarantees increased safety

for the hiding of sensitive knowledge.
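The two constraint families of Fig. 18.2 can be checked on a candidate sanitized database with a few lines of Python. The sketch below is only a feasibility check, not a CSP solver. It assumes the database is a list of 0/1 rows indexed by item position, that msup is an absolute support count, and that summing the product of entries over all transactions gives the same support as summing over D_X or D_R only. All function names are hypothetical.

```python
from math import prod


def support(db, itemset):
    """Count transactions containing every item of `itemset`.

    Each row of `db` is a list of 0/1 values; the product over the
    itemset's positions is 1 only if the row supports the itemset.
    """
    return sum(prod(row[m] for m in itemset) for row in db)


def satisfies_modified_csp(db, sensitive, border, msup, x):
    """Check both constraint families of the modified CSP.

    sensitive: itemsets X that must end up with support < msup - x
    border:    itemsets R that must keep support >= msup
    """
    hidden = all(support(db, X) < msup - x for X in sensitive)
    preserved = all(support(db, R) >= msup for R in border)
    return hidden and preserved


def retained_ones(db):
    """Objective value: the number of 1-entries kept in the database."""
    return sum(sum(row) for row in db)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the source.
    db = [
        [1, 1, 0],
        [1, 1, 1],
        [1, 0, 1],
        [0, 1, 1],
    ]
    print(satisfies_modified_csp(db, [(0, 1)], [(2,)], msup=3, x=0))
    print(satisfies_modified_csp(db, [(0, 1)], [(2,)], msup=3, x=1))
    print(retained_ones(db))
```

In the toy data, itemset (0, 1) has support 2 and itemset (2,) has support 3, so the database is feasible for x = 0 but not for x = 1, where the sensitive itemset would need support below 2. This shows how a larger safety margin x tightens the hiding constraint while the border constraint stays unchanged.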
