Privacy Issues in Association Rule Mining - Frequent Pattern Mining


who are asked to provide personal information on Web forms to e-commerce service

providers. The compulsion for doing so may be the (perhaps well-founded) worry

that the requested information may be misused by the service provider to harass the

customer. As a case in point, consider a pharmaceutical company that asks clients to

disclose the diseases they have suffered from in order to investigate the correlations

in their occurrences—for example, “Adult females with malarial infections are also

prone to contract tuberculosis”. The company may be acquiring the data solely for

genuine data mining purposes that would eventually be reflected in better service to

the client. At the same time, however, the client might worry that if her medical records

are either inadvertently or deliberately disclosed, it may adversely affect her future

employment opportunities.

In this section, we study whether customers can be encouraged to provide correct

information by ensuring that the mining process cannot, with any reasonable degree

of certainty, violate their privacy, while still producing sufficiently accurate

mining results. The difficulty in achieving these goals is that privacy and accuracy are

typically contradictory in nature, with the consequence that improving one usually

incurs a cost in the other [3]. A related issue is the degree of trust that users need to

place in third-party intermediaries. A final issue, from a practical viability

perspective, is the time and resource overhead imposed on the data mining process

by supporting the privacy requirements.

Our study is carried out in the context of extracting association rules from large

historical databases [8], an extremely popular mining process that identifies

interesting correlations between database attributes, such as the one described in the

pharmaceutical example. By the end of Sect. 2, we will show that the state-of-the-art

in input privacy is such that it is indeed possible to simultaneously achieve all the

desirable objectives (i.e., privacy, accuracy, and efficiency) for association rule mining (ARM).
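As a rough illustration of what such a rule measures, the sketch below scores the rule "malaria implies tuberculosis" by support and confidence, the standard association-rule measures. The records are made-up toy data, and the helper names `support` and `confidence` are our own; this section of the text does not define them.

```python
# Toy records (hypothetical data, for illustration only): each record is the
# set of diseases reported by one client.
records = [
    {"malaria", "tuberculosis"},
    {"malaria"},
    {"malaria", "tuberculosis"},
    {"tuberculosis"},
    {"influenza"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimated probability of `consequent` given `antecedent`."""
    return support(antecedent | consequent, records) / support(antecedent, records)


rule_support = support({"malaria", "tuberculosis"}, records)
rule_confidence = confidence({"malaria"}, {"tuberculosis"}, records)
print(rule_support, rule_confidence)
```

A rule is usually reported only when both measures exceed user-chosen thresholds, which is why distorting individual records for privacy can change which rules are found.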

2.1 Problem Framework

In what follows, we describe the framework of the privacy mining problem in the

context of association rules.

Database Model. We assume that the original (true) database U consists of N

records, with each record having M categorical attributes. Note that boolean data

is a special case of this class, and further, that continuous-valued attributes can be

converted into categorical attributes by partitioning the domain of the attribute into

fixed-length intervals.
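The conversion of a continuous-valued attribute into a categorical one can be sketched as follows. The range start, interval width, number of intervals, and sample values are all illustrative assumptions, as is the choice to clamp out-of-range values into the end intervals.

```python
def to_interval(value, low, width, n_bins):
    """Return the 0-based index of the fixed-length interval containing `value`.

    Interval b covers [low + b*width, low + (b+1)*width). Values outside the
    overall range are clamped to the first or last interval (an assumption).
    """
    index = int((value - low) // width)
    return min(max(index, 0), n_bins - 1)


def interval_label(index, low, width):
    """Human-readable category label for interval `index`."""
    return f"[{low + index * width:g}, {low + (index + 1) * width:g})"


# Hypothetical continuous attribute values and partition parameters.
values = [23.5, 41.0, 67.2, 18.0]
low, width, n_bins = 0.0, 20.0, 5

bins = [to_interval(v, low, width, n_bins) for v in values]
labels = [interval_label(b, low, width) for b in bins]
print(bins)    # [1, 2, 3, 0]
print(labels)  # ['[20, 40)', '[40, 60)', '[60, 80)', '[0, 20)']
```

Each label then serves as one category value of the attribute.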

The domain of attribute $j$ is denoted by $S_U^j$, resulting in the domain $S_U$ of a

record in $U$ being given by $S_U = \prod_{j=1}^{M} S_U^j$. We map the domain $S_U$ to the index set

$I_U = \{1, \ldots, |S_U|\}$, thereby modeling the database as a set of $N$ values from $I_U$. If

we denote the $i$th record of $U$ as $U_i$, then $U = \{U_i\}_{i=1}^{N}$, $U_i \in I_U$.
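A minimal sketch of this encoding is given below. The attribute domains and records are hypothetical, and the text does not fix a particular mapping from the record domain S_U to the index set I_U; we assume lexicographic order over the Cartesian product, though any bijection would serve.

```python
from itertools import product

# Hypothetical per-attribute domains (M = 3); not taken from the source text.
domains = [
    ["low", "high"],      # domain of attribute 1
    ["x", "y", "z"],      # domain of attribute 2
    ["no", "yes"],        # domain of attribute 3
]

# The record domain S_U is the Cartesian product of the attribute domains.
record_domain = list(product(*domains))

# Assumed bijection from S_U to I_U = {1, ..., |S_U|}: position in
# lexicographic order, starting from 1.
index_of = {record: i for i, record in enumerate(record_domain, start=1)}


def encode(record):
    """Return the index in I_U of a record given as a tuple of category values."""
    return index_of[tuple(record)]


# A toy database U with N = 3 records, modeled as N values from I_U.
database = [("low", "y", "no"), ("high", "z", "yes"), ("low", "x", "no")]
encoded = [encode(record) for record in database]
print(len(record_domain), encoded)  # 12 [3, 12, 1]
```

Because the mapping is a bijection, each index can be decoded back to its record with `record_domain[index - 1]`.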

To make this concrete, consider a database U with 3 categorical attributes Age,

Sex and Education having the following category values:
