hiding solution in a database is to select one item from the generating itemset of
each sensitive rule and delete it from all transactions of the database.
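
As a minimal sketch of the trivial solution described above, the following Python fragment picks one item from each sensitive generating itemset and deletes it from every transaction. The function name `naive_hide`, the toy data, and the rule for choosing the victim item (the smallest item in sorted order) are assumptions made for illustration only; they do not come from the text.

```python
def naive_hide(transactions, sensitive_itemsets):
    """Delete one item of each sensitive itemset from every transaction."""
    # Assumption: the item to delete ("victim") is the smallest item of
    # each sensitive itemset; any other choice would hide it as well.
    victims = {min(itemset) for itemset in sensitive_itemsets}
    return [set(t) - victims for t in transactions]


# Hypothetical toy database with one sensitive generating itemset {a, b}.
original = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
sensitive = [{"a", "b"}]
sanitized = naive_hide(original, sensitive)

# No transaction supports the sensitive itemset any longer.
print(any({"a", "b"} <= t for t in sanitized))
```

The sketch also shows why this solution is rarely acceptable: item `a` disappears from the whole database, so every nonsensitive itemset that contains it is lost too.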
2.2.2 Problem Statement
Having presented the goals of association rule hiding methodologies, we now proceed
to present the problem statement. Association rule hiding has been widely
researched along two principal directions (henceforth referred to as variants). The first
variant involves approaches that aim at hiding specific association rules among those
mined from the original database. The second variant, on the other hand, collects
methodologies that aim at hiding specific frequent itemsets from those found when
applying frequent itemset mining to the original database. The two variants of the
problem are very similar in nature. Indeed, concealing the sensitive association rules
by hiding their generating itemsets is a common strategy that is adopted by the majority
of researchers. By ensuring that the itemsets that lead to the generation of a
sensitive rule become insignificant in the disclosed database, the data owner can be
certain that his or her sensitive knowledge is adequately protected from untrusted
third parties. In what follows, we lay out the formal statement for each variant of
the problem by introducing the problem statement both in the context of association
rule mining and that of frequent itemset mining.
Variant 1: Hiding sensitive itemsets
We assume that we are provided with a database $D_O$, consisting of N transactions,
and a threshold mfreq set by the owner of the data. After performing frequent itemset
mining in $D_O$ with mfreq, we obtain a set of frequent patterns, denoted as $F_{D_O}$,
among which a subset S contains patterns which are considered to be sensitive from
the owner's perspective.
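
The setting above can be made concrete with a small brute-force sketch in Python. It assumes that mfreq is a relative frequency threshold (a fraction of the N transactions); the names `support` and `frequent_itemsets` and the toy database are hypothetical, and the exhaustive enumeration is only meant for tiny examples, not as a practical mining algorithm.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, mfreq):
    """Return {itemset: support count} for all itemsets with frequency >= mfreq."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = support(candidate, transactions)
            if count / n >= mfreq:
                result[frozenset(candidate)] = count
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size
            # can exist either (support never grows when items are added).
            break
    return result


# Hypothetical original database D_O with N = 4 transactions.
D_O = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"})]
F = frequent_itemsets(D_O, mfreq=0.5)
for itemset, count in sorted(F.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The data owner would then mark some of the itemsets in `F` as the sensitive subset S, for example the itemset {a, b}.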
Given the set of sensitive itemsets S, the goal of frequent itemset hiding methodologies
is to construct a new, sanitized database D from $D_O$, which protects
the sensitive itemsets S from disclosure, while minimally affecting the nonsensitive
itemsets existing in $F_{D_O}$ (i.e., the itemsets in $F_{D_O} \setminus S$). The hiding of a
sensitive itemset corresponds to a lowering of its statistical significance, expressed in
terms of support, in the resulting database. To hide a sensitive itemset, the privacy
preserving algorithm has to modify the original database $D_O$ in such a way that
when the sanitized database D is mined at the same (or a higher) level of support,
the frequent itemsets that are discovered are all nonsensitive.
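
The hiding requirement can be checked mechanically. The sketch below is an illustration, not the book's algorithm: it assumes that mfreq is a relative frequency threshold, and the names `support_count`, `hiding_report`, and the toy data are hypothetical. It reports which sensitive itemsets are still frequent in the sanitized database and which nonsensitive frequent itemsets were lost as a side effect.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def hiding_report(original, sanitized, sensitive, nonsensitive, mfreq):
    """Return (exposed, lost) for a candidate sanitized database.

    exposed: sensitive itemsets that are still frequent after sanitization.
    lost:    nonsensitive itemsets that were frequent before but not after.
    """
    def is_frequent(itemset, db):
        return support_count(itemset, db) >= mfreq * len(db)

    exposed = [s for s in sensitive if is_frequent(s, sanitized)]
    lost = [x for x in nonsensitive
            if is_frequent(x, original) and not is_frequent(x, sanitized)]
    return exposed, lost


# Hypothetical data: item "a" was removed from the first transaction.
original = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
sanitized = [{"b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
sensitive = [{"a", "b"}]
nonsensitive = [{"a"}, {"b"}, {"c"}, {"a", "c"}, {"b", "c"}]

exposed, lost = hiding_report(original, sanitized, sensitive, nonsensitive, mfreq=0.5)
print("still exposed:", exposed)
print("lost as side effect:", lost)
```

In this toy case the sensitive itemset {a, b} is hidden, but the nonsensitive itemset {a, c} also drops below the threshold, which is exactly the kind of side effect a hiding methodology tries to minimize.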