mining algorithms—indicated the need to consider different data mining approaches
under the prism of preserving information privacy.
The following scenario illustrates why ARH algorithms are needed to protect
sensitive knowledge. Suppose that we, the purchasing directors of BigMart, a large
supermarket chain, are negotiating with Dedtrees Paper Company. They offer their
products at reduced prices, provided that we agree to give them access to our
database of customer purchases. We accept the deal and Dedtrees starts mining our
data. Using an ARM tool, they find that people who purchase skim milk also
purchase Green Paper. Dedtrees now runs a coupon marketing campaign offering a
50-cent discount on skim milk with every purchase of a Dedtrees product. The
campaign cuts heavily into the sales of Green Paper, which raises its prices in
response to the lower sales. During our next negotiation with Dedtrees, we find
that, with reduced competition, they are unwilling to offer us a low price. Finally,
we start losing business to our competitors, who were able to negotiate a better
deal with Green Paper. In other words, the scenario indicates that BigMart should
sanitize competitive information (and, of course, other important corporate
secrets) before delivering its database to Dedtrees, so that Dedtrees does not
monopolize the paper market.
We should emphasize here that the ARH problem can be considered a variation
of the well-known database inference control problem [19] in statistical and
multilevel databases. The primary goal in database inference control is to protect
access to sensitive information that can be obtained through non-sensitive data and
inference rules. In ARH, it is not the data but the sensitive rules that create a breach of
privacy. Given a set of sensitive association rules, which are specified by the security
administrator, the task of association rule hiding algorithms is to sanitize the data
so that ARM algorithms applied to the sanitized data will be (a) incapable of discovering
the sensitive rules under certain parameter settings, and (b) able to mine all the
non-sensitive rules. A recently investigated problem, known as inverse frequent itemset
mining [33], provides a special solution to the association rule hiding problem, even
though it is not aimed at addressing privacy issues per se.
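The two requirements (a) and (b) above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the text speaks of "certain parameter settings" without naming them, so the sketch assumes the usual ARM thresholds of minimum support and minimum confidence. The toy transactions, the threshold values, and every function name are hypothetical and are not taken from any particular ARH algorithm.

```python
from typing import FrozenSet, List, Set, Tuple

Transaction = Set[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)


def support(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db: List[Transaction], rule: Rule) -> float:
    """Support of LHS and RHS together, divided by the support of the LHS."""
    lhs, rhs = rule
    lhs_support = support(db, lhs)
    if lhs_support == 0:
        return 0.0
    return support(db, lhs | rhs) / lhs_support


def is_mined(db: List[Transaction], rule: Rule,
             min_support: float, min_confidence: float) -> bool:
    """True if the rule meets both (assumed) mining thresholds."""
    lhs, rhs = rule
    return (support(db, lhs | rhs) >= min_support
            and confidence(db, rule) >= min_confidence)


def sanitization_ok(sanitized: List[Transaction],
                    sensitive: List[Rule], non_sensitive: List[Rule],
                    min_support: float, min_confidence: float) -> bool:
    """Check (a) no sensitive rule is mined and (b) every non-sensitive rule is."""
    hidden = all(not is_mined(sanitized, r, min_support, min_confidence)
                 for r in sensitive)
    kept = all(is_mined(sanitized, r, min_support, min_confidence)
               for r in non_sensitive)
    return hidden and kept


# Hypothetical toy data loosely modeled on the BigMart scenario.
original = [
    {"skim milk", "Green Paper", "bread"},
    {"skim milk", "Green Paper"},
    {"skim milk", "Green Paper", "bread"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"skim milk", "bread", "butter"},
]
sensitive_rule: Rule = (frozenset({"skim milk"}), frozenset({"Green Paper"}))
kept_rule: Rule = (frozenset({"butter"}), frozenset({"bread"}))

# One possible sanitization: drop "Green Paper" from the second transaction.
sanitized = [set(t) for t in original]
sanitized[1].discard("Green Paper")

MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.7  # assumed thresholds
print(is_mined(original, sensitive_rule, MIN_SUPPORT, MIN_CONFIDENCE))
print(sanitization_ok(sanitized, [sensitive_rule], [kept_rule],
                      MIN_SUPPORT, MIN_CONFIDENCE))
```

In this toy case, removing a single item pushes the support of the sensitive rule below the assumed threshold while leaving the non-sensitive rule untouched. Real ARH algorithms have to choose which transactions and items to modify, which the sketch does not attempt.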
3.1 Terminology and Preliminaries
As stated earlier, ARM is the process of discovering sets of items (itemsets)
that frequently co-occur in a database, with the goal of producing association
rules that hold for the data [5, 8]. The itemset $C_x \cup C_y$ that led to the generation
of an association rule $C_x \Rightarrow C_y$ is known as the generating itemset and consists
of two parts, the Left Hand Side (LHS), which is the part on the left of the arrow of
the rule (here $C_x$), and the Right Hand Side (RHS), which is the part on the right
of the arrow of the rule (here $C_y$). An itemset with $k$ items is called a $k$-itemset. In
ARH algorithms we consider that database U is given in the form of transactions,
where each record (also known as a transaction) is associated with a set of items from
a domain I. These items, for example, could refer to purchased products; thus a