Privacy Issues in Association Rule Mining - Frequent Pattern Mining


mining algorithms—indicated the need to consider different data mining approaches

under the prism of preserving information privacy.

The following scenario exemplifies the necessity of applying ARH algorithms to protect sensitive knowledge. Let us suppose that we, the purchasing directors of BigMart, a large supermarket chain, are negotiating with Dedtrees Paper Company. They offer their products at reduced prices, provided that we agree to give them access to our database of customer purchases. We accept the deal and Dedtrees starts mining our data. By using an ARM tool, they find that people who purchase skim milk also purchase Green Paper. Dedtrees now runs a coupon marketing campaign offering a 50-cent discount on skim milk with every purchase of a Dedtrees product. The campaign cuts heavily into the sales of Green Paper, which raises its prices in response to the lower sales. During our next negotiation with Dedtrees, we find out that, with reduced competition, they are unwilling to offer us a low price. Finally, we start losing business to our competitors, who were able to negotiate a better deal with Green Paper. In other words, the scenario indicates that BigMart should sanitize competitive information (and, of course, other important corporate secrets) before delivering its database to Dedtrees, so that Dedtrees does not monopolize the paper market.

We should emphasize here that the ARH problem can be considered a variation of the well-known database inference control problem [19] in statistical and multilevel databases. The primary goal in database inference control is to protect access to sensitive information that can be obtained through non-sensitive data and inference rules. In ARH, it is not the data but the sensitive rules that create a breach of privacy. Given a set of sensitive association rules, which are specified by the security administrator, the task of association rule hiding algorithms is to sanitize the data so that ARM algorithms applied to this data will be (a) incapable of discovering the sensitive rules under certain parameter settings, and (b) able to mine all the non-sensitive rules. A recently investigated problem, known as inverse frequent itemset mining [33], provides a special solution to the association rule hiding problem, even though it is not aimed at addressing privacy issues per se.
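The hiding task described above can be illustrated with a small sketch. It assumes that the "parameter settings" are the usual minimum support threshold, and that support is the fraction of transactions containing an itemset; neither is defined in the text so far. The function names (`support`, `confidence`, `hide_rule`), the toy transactions, and the naive item-deletion strategy are all hypothetical and chosen only for illustration; this is not one of the published ARH algorithms.

```python
from typing import List, Set

Transaction = Set[str]


def support(db: List[Transaction], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db: List[Transaction], lhs: Set[str], rhs: Set[str]) -> float:
    """Support of the generating itemset divided by the support of the LHS."""
    s_lhs = support(db, lhs)
    return support(db, lhs | rhs) / s_lhs if s_lhs else 0.0


def hide_rule(db: List[Transaction], lhs: Set[str], rhs: Set[str],
              min_support: float) -> List[Transaction]:
    """Naive sanitization (illustrative assumption, not a published method):
    delete one RHS item from supporting transactions until the support of
    the generating itemset drops below `min_support`."""
    sanitized = [set(t) for t in db]   # work on a copy, keep the input intact
    generating = lhs | rhs
    victim = sorted(rhs)[0]            # deterministic choice of item to delete
    for t in sanitized:
        if support(sanitized, generating) < min_support:
            break
        if generating <= t:
            t.discard(victim)
    return sanitized


# Toy database loosely modeled on the BigMart scenario (made-up data).
db = [
    {"skim milk", "Green Paper"},
    {"skim milk", "Green Paper", "bread"},
    {"skim milk", "Green Paper"},
    {"skim milk", "bread"},
    {"bread"},
]
lhs, rhs = {"skim milk"}, {"Green Paper"}

sanitized = hide_rule(db, lhs, rhs, min_support=0.5)
print("support before:", support(db, lhs | rhs))
print("support after: ", support(sanitized, lhs | rhs))
print("confidence after:", confidence(sanitized, lhs, rhs))
```

In this toy run the sensitive rule's generating itemset falls below the threshold, while an unrelated itemset such as {skim milk, bread} keeps its original support. Goal (b) is generally harder to meet on real data, since deleting items can also lower the support of non-sensitive itemsets.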

3.1 Terminology and Preliminaries

As stated earlier, ARM is the process of discovering sets of items (itemsets) that frequently co-occur in a database, with the goal of producing association rules that hold for the data [5, 8]. The itemset $C_x \cup C_y$ that led to the generation of an association rule $C_x \Rightarrow C_y$ is known as the generating itemset and consists of two parts: the Left Hand Side (LHS), which is the part on the left of the arrow of the rule (here $C_x$), and the Right Hand Side (RHS), which is the part on the right of the arrow of the rule (here $C_y$). An itemset with $k$ items is called a $k$-itemset. In ARH algorithms we consider that database U is given in the form of transactions, where each record (also known as a transaction) is associated with a set of items from a domain $I$. These items, for example, could refer to purchased products; thus a
