Database Reference
In-Depth Information
Age: Child, Adult, Senior
Sex: Male, Female
Education: Elementary, Graduate
For this schema, $M = 3$, $S_U^1 = \{\text{Child}, \text{Adult}, \text{Senior}\}$,
$S_U^2 = \{\text{Male}, \text{Female}\}$, $S_U^3 = \{\text{Elementary}, \text{Graduate}\}$,
$S_U = S_U^1 \times S_U^2 \times S_U^3$, and $|S_U| = 12$. The domain $S_U$ is indexed by
the index set $I_U = \{1, \ldots, 12\}$, and hence the set of records $U$ maps to the
following indices:

Child   Male    Elementary   maps to   1
Child   Male    Graduate     maps to   2
Child   Female  Graduate     maps to   4
Senior  Male    Elementary   maps to   9
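The index mapping in this example can be reproduced with a short sketch. It assumes the product domain is enumerated with Age varying slowest and Education varying fastest, starting from 1; this ordering is inferred from the example values rather than stated in the text, and the names `DOMAIN` and `INDEX` are illustrative only.

```python
from itertools import product

# Category lists for the three attributes of the example schema.
AGE = ["Child", "Adult", "Senior"]
SEX = ["Male", "Female"]
EDUCATION = ["Elementary", "Graduate"]

# Enumerate the product domain; itertools.product varies the last
# attribute fastest. The 1-based numbering is an assumption chosen
# to match the example.
DOMAIN = list(product(AGE, SEX, EDUCATION))
INDEX = {record: i for i, record in enumerate(DOMAIN, start=1)}

records = [
    ("Child", "Male", "Elementary"),
    ("Child", "Male", "Graduate"),
    ("Child", "Female", "Graduate"),
    ("Senior", "Male", "Elementary"),
]

# Map each record to its position in the index set.
indices = [INDEX[r] for r in records]
print(len(DOMAIN), indices)
```

Under this ordering the four records land on indices 1, 2, 4 and 9, and the domain has 12 entries.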
Mining Objective The goal of the data-miner is to compute association rules on
the above database. Denoting the set of attributes in database $U$ by $C$, an association
rule is a (statistical) implication of the form $C_x \Rightarrow C_y$, where
$C_x, C_y \subset C$ and $C_x \cap C_y = \phi$. A rule $C_x \Rightarrow C_y$ is said to
have a support (or frequency) factor $s$ iff at least $s\%$ of the transactions in $U$
satisfy $C_x \cup C_y$. A rule $C_x \Rightarrow C_y$ is satisfied in $U$ with a
confidence factor $c$ iff at least $c\%$ of the transactions in $U$ that satisfy $C_x$
also satisfy $C_y$. Both support and confidence are fractions in the interval [0,1]. The
support is a measure of statistical significance, whereas confidence is a measure of
the strength of the rule.
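As a rough illustration of these two measures, the sketch below computes them as fractions over the four example records. Encoding each record as a set of "Attribute=Value" strings is an assumption made for the example, and the helper names `support` and `confidence` are hypothetical.

```python
# Each record is a set of items; the "Attribute=Value" item encoding
# is an assumption made for this illustration.
records = [
    frozenset({"Age=Child", "Sex=Male", "Education=Elementary"}),
    frozenset({"Age=Child", "Sex=Male", "Education=Graduate"}),
    frozenset({"Age=Child", "Sex=Female", "Education=Graduate"}),
    frozenset({"Age=Senior", "Sex=Male", "Education=Elementary"}),
]

def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= r for r in records) / len(records)

def confidence(records, cx, cy):
    """Among records satisfying cx, the fraction that also satisfy cy."""
    sx = support(records, cx)
    if sx == 0:
        return 0.0  # the rule is never triggered
    return support(records, set(cx) | set(cy)) / sx

# Rule: {Age=Child} => {Sex=Male}
rule_support = support(records, {"Age=Child", "Sex=Male"})
rule_confidence = confidence(records, {"Age=Child"}, {"Sex=Male"})
print(rule_support, rule_confidence)
```

Two of the four records are male children, so the support is 0.5; of the three children, two are male, so the confidence is 2/3.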
A rule is said to be “interesting” if its support and confidence are greater than
user-defined thresholds $sup_{min}$ and $con_{min}$, respectively, and the objective of the
mining process is to find all such interesting rules. It has been shown in [8] that
achieving this goal is effectively equivalent to generating all subsets of $C$ that have
support greater than $sup_{min}$; these subsets are called frequent itemsets. Therefore,
the mining objective is, in essence, to efficiently discover all frequent itemsets that
are present in the database.
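One common way to enumerate frequent itemsets is a level-wise search in the style of the Apriori algorithm. The text does not prescribe a particular algorithm, so the following is only a minimal sketch under that assumption; the function name `frequent_itemsets` and the "Attribute=Value" item encoding are illustrative.

```python
from itertools import combinations

def frequent_itemsets(records, sup_min):
    """Level-wise (Apriori-style) search for itemsets with support > sup_min."""
    n = len(records)

    def sup(itemset):
        return sum(itemset <= r for r in records) / n

    def keep_frequent(candidates):
        scored = {c: sup(c) for c in candidates}
        return {c: s for c, s in scored.items() if s > sup_min}

    # Level 1: frequent single items.
    items = {item for r in records for item in r}
    level = keep_frequent(frozenset([i]) for i in items)
    result = {}
    k = 1
    while level:
        result.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
    return result

records = [
    frozenset({"Age=Child", "Sex=Male", "Education=Elementary"}),
    frozenset({"Age=Child", "Sex=Male", "Education=Graduate"}),
    frozenset({"Age=Child", "Sex=Female", "Education=Graduate"}),
    frozenset({"Age=Senior", "Sex=Male", "Education=Elementary"}),
]
result = frequent_itemsets(records, sup_min=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step relies on the fact that any subset of a frequent itemset is itself frequent, which lets the search discard most candidates without counting them.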
Privacy Mechanisms We now move on to considering the various mechanisms
through which privacy of the user data could be provided. One approach to address
this problem is for the service providers to assure the users that the databases obtained
from their information would be anonymized (through the variety of techniques
proposed in the statistical database literature [2, 49]) before being supplied to the data
miners. For example, the swapping of attribute-values between different customer
records, as proposed in [16], can be used to conceal the true value of the corresponding
attribute for each customer. Such a privacy environment, in which customers
depend on the service provider to guarantee privacy provisioning, is referred to in
the literature as a “B2B (business-to-business)” environment.
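To make the swapping idea concrete, here is a minimal sketch that randomly permutes the values of one attribute across records. It is not the specific procedure of [16], only an illustration of the general idea; the function name `swap_attribute` and the dictionary record layout are assumptions.

```python
import random

def swap_attribute(records, attribute, rng):
    """Return copies of `records` with one attribute's values shuffled across them."""
    values = [r[attribute] for r in records]
    rng.shuffle(values)  # random permutation of the column
    return [{**r, attribute: v} for r, v in zip(records, values)]

records = [
    {"Age": "Child", "Sex": "Male", "Education": "Elementary"},
    {"Age": "Child", "Sex": "Male", "Education": "Graduate"},
    {"Age": "Child", "Sex": "Female", "Education": "Graduate"},
    {"Age": "Senior", "Sex": "Male", "Education": "Elementary"},
]

# A seeded generator keeps the illustration repeatable.
swapped = swap_attribute(records, "Age", random.Random(0))
for r in swapped:
    print(r)
```

The per-attribute value counts are unchanged, so single-attribute statistics survive, while the link between a given customer and the swapped value is obscured.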
However, in today's world, most users are (perhaps justifiably) cynical about such
assurances, and it is therefore imperative to demonstrably provide privacy at the
point of data collection itself, that is, at the user site. This is referred to as the “B2C
(business-to-customer)” privacy environment [57]. Note that in this environment,