… privacy regulations requires the incorporation of advanced and sophisticated privacy preserving methodologies.
1.1 Privacy Preserving Data Mining
Privacy preserving data mining is a relatively new research area in the data mining
community, spanning approximately a decade of existence. It investigates the side-effects
of data mining methods that stem from the intrusion into the privacy of
individuals and organizations. Since the pioneering work of Agrawal & Srikant [8]
and Lindell & Pinkas [43] in 2000, several approaches have been proposed in the
research literature for preserving privacy in data mining. The majority of the
proposed approaches can be classified along two principal research directions: (i)
data hiding approaches and (ii) knowledge hiding approaches.
The first direction collects methodologies that investigate how the privacy of raw
data, or information, can be maintained before the data are mined. The
approaches in this category aim at removing confidential or private information
from the original data prior to its disclosure and operate by applying techniques
such as perturbation, sampling, generalization or suppression, and transformation
to generate a sanitized counterpart of the original dataset. Their ultimate goal is to
enable the data miner to obtain accurate data mining results without being provided
with the real data, or to adhere to specific regulations pertaining to microdata publication
(e.g., as is the case when publishing patient-specific data).
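To make the notion of a sanitized counterpart concrete, the following minimal Python sketch combines three of the techniques named above (perturbation, generalization, and suppression) on a single toy record. The record fields, the noise scale, and the age bands are hypothetical illustrations chosen for this sketch, not parameters of any particular method from the literature.

```python
# Illustrative sketch only: a toy sanitizer applying perturbation,
# generalization, and suppression to one hypothetical record.
import random

def perturb(value, scale=2.0):
    """Additive noise perturbation of a numeric attribute."""
    return value + random.uniform(-scale, scale)

def generalize_age(age, band=10):
    """Generalize an exact age into a coarser interval, e.g. 37 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"

def suppress(_value):
    """Suppress (remove) a directly identifying attribute."""
    return "*"

def sanitize(record):
    """Produce a sanitized counterpart of one patient-specific record."""
    return {
        "name": suppress(record["name"]),               # direct identifier
        "age": generalize_age(record["age"]),           # quasi-identifier
        "weight": round(perturb(record["weight"]), 1),  # numeric measurement
        "diagnosis": record["diagnosis"],               # kept for mining utility
    }

if __name__ == "__main__":
    original = {"name": "Alice", "age": 37, "weight": 62.5, "diagnosis": "flu"}
    print(sanitize(original))
```

In practice, which attributes are perturbed, generalized, or suppressed is dictated by the privacy model and the regulation being targeted.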
The second direction involves methodologies that aim to protect the sensitive
data mining results (i.e., the knowledge patterns extracted by applying data mining
tools to the original database) rather than the raw data itself. This direction of
approaches mainly deals with distortion and blocking techniques that prevent the
leakage of sensitive knowledge patterns in the disclosed data, as well as with
techniques for downgrading the effectiveness of classifiers in classification tasks,
such that the produced classifiers do not reveal any sensitive knowledge.
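As a concrete illustration of the distortion idea, the following minimal Python sketch hides a sensitive frequent itemset by deleting one of its items from just enough supporting transactions to push its support below the mining threshold. The toy transaction database, the threshold, and the naive choice of which item to delete are assumptions made purely for illustration; practical hiding algorithms additionally try to minimize side-effects on the non-sensitive patterns.

```python
# Illustrative sketch only: hiding a sensitive itemset by distortion.

def support(db, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def hide_itemset(db, sensitive, min_sup):
    """Distort the database until the sensitive itemset becomes infrequent."""
    db = [set(t) for t in db]        # work on a copy of the transactions
    victim = next(iter(sensitive))   # naive choice of the item to delete
    for t in db:
        if support(db, sensitive) < min_sup:
            break                    # itemset is already hidden
        if sensitive <= t:
            t.discard(victim)        # remove the victim item from this transaction
    return db

if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"c"}]
    sensitive = {"a", "b"}
    print("support before:", support(transactions, sensitive))
    sanitized = hide_itemset(transactions, sensitive, min_sup=0.4)
    print("support after: ", support(sanitized, sensitive))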
1.2 Association Rule Hiding
In this book, we focus on the knowledge hiding thread of privacy preserving data
mining and study a specific class of approaches which are collectively known
as frequent itemset and association rule hiding approaches. Other classes of approaches
under the knowledge hiding thread include classification rule hiding, clustering
model hiding, sequence hiding, and so on. An overview of these
methodologies can be found in [4, 28, 70, 73]. Some of these methodologies are also
surveyed in Chapter 4 of this book.