1.3 Research Challenges
The association rule hiding problem can be considered a variation of the well-established
database inference control problem [21] in statistical and multilevel
databases. The primary goal in database inference control is to block access to sensitive
information that can be obtained through nonsensitive data and inference rules.
In association rule hiding, we consider that it is not the data itself but rather the
sensitive association rules that breach privacy. Given a set of association
rules, which are mined from a specific data collection and are deemed
sensitive by an application specialist (e.g., the data owner), the task of association
rule hiding is to properly modify (or, as is usually said, sanitize²) the original data
so that any association rule mining algorithm that may be applied to the sanitized
version of the data (i) will be unable to uncover the sensitive rules under certain
parameter settings, and (ii) will be able to mine all the nonsensitive rules that appeared
in the original dataset (under the same or higher parameter settings) and no
other rules. The challenge that arises in the context of association rule hiding can
thus be stated as follows:
How can we modify (sanitize) the transactions of a database so that all
the nonsensitive association rules that are found when mining this database
can still be mined from its sanitized counterpart (under certain parameter settings),
while, at the same time, all the sensitive rules are guarded against disclosure
and no other (originally nonexistent) rules can be mined?
Association rule hiding algorithms are specially designed to solve
this challenging problem. They accomplish this by introducing a small distortion
to the transactions of the original database that blocks the production
of the sensitive association rules in its sanitized counterpart, while still allowing the
mining of the nonsensitive knowledge. What differentiates the quality of one association
rule hiding methodology from that of another is the actual distortion
caused to the original database as a result of the hiding process. Ideally, the hiding
process should leave the nonsensitive knowledge intact to the highest possible degree.
Another very interesting problem has been investigated recently which, even though
it is not targeted at addressing privacy issues per se, gives a special solution to the
association rule hiding problem. This problem is known as inverse frequent itemset mining [48].
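To make the distortion idea concrete, the following is a minimal illustrative sketch, not any specific published hiding algorithm: for a sensitive rule X → Y, one can delete an item of X from transactions that support X ∪ Y until the itemset's support falls below the mining threshold minsup, so that the rule can no longer be produced. All function names and the toy dataset here are hypothetical.

```python
# Naive distortion-based hiding sketch (hypothetical, for illustration only):
# lower the support of a sensitive rule X -> Y below minsup by deleting
# one item of X from transactions that support X U Y.

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def hide_rule(antecedent, consequent, transactions, minsup):
    """Return a sanitized copy in which support(X U Y) < minsup."""
    rule_items = antecedent | consequent
    sanitized = [set(t) for t in transactions]   # work on a copy
    victim = next(iter(antecedent))              # arbitrary item to delete
    for t in sanitized:
        if support(rule_items, sanitized) < minsup:
            break                                # rule is now hidden
        if rule_items <= t:
            t.discard(victim)                    # distort this transaction
    return sanitized

# Toy dataset: the rule {a} -> {b} has support 2/4 = 0.5.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
sanitized = hide_rule({"a"}, {"b"}, transactions, minsup=0.5)
print(support({"a", "b"}, sanitized))  # 0.25: now below minsup
```

Note that even this single deletion has side effects: the nonsensitive itemset {a, c} also loses support in the distorted transaction. Minimizing exactly this kind of collateral damage to nonsensitive knowledge is what separates a good hiding methodology from a poor one.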
² A dataset is said to be sanitized when it appropriately protects the sensitive knowledge from being
mined, under certain parameter settings. Similarly, a transaction of a dataset is sanitized when it
no longer supports any sensitive itemset or rule. Finally, an item is called sanitized when it is altered
in a given transaction to accommodate the hiding of the sensitive knowledge.