record of U may capture the items that were purchased together by an individual
from a supermarket (e.g., u_1 = {bread, milk, sugar}). A similar representation that is
usually adopted by ARH algorithms is that of a boolean matrix, where each column
corresponds to an item from the domain of items I and each row is a transaction.
In this representation, a transaction of U has length |I| and has 1's in the items that
are associated with it (e.g., purchased items) and 0's in the rest of the items.
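As a minimal sketch of this representation, the snippet below turns a small set-based dataset into a boolean matrix. The item domain `ITEMS`, the sample transactions other than u_1, and the helper name `to_boolean_matrix` are assumptions made for illustration; they do not come from the text.

```python
# Hypothetical item domain I; the column order of the matrix follows this list.
ITEMS = ["bread", "butter", "milk", "sugar"]

# Hypothetical dataset U; only the first transaction is the example from the text.
transactions = [
    {"bread", "milk", "sugar"},  # u_1
    {"bread", "butter"},
    {"milk", "sugar"},
]


def to_boolean_matrix(transactions, items):
    """Return one row per transaction and one column per item in `items`.

    A cell is 1 if the transaction contains the item and 0 otherwise,
    so every row has length |I| = len(items).
    """
    return [[1 if item in t else 0 for item in items] for t in transactions]


matrix = to_boolean_matrix(transactions, ITEMS)
for row in matrix:
    print(row)
# [1, 0, 1, 1]
# [1, 1, 0, 0]
# [0, 0, 1, 1]
```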
Knowledge hiding, in the context of ARM, aims at sanitizing (transforming) the
original dataset in such a way that the following goals are accomplished to the largest
possible extent:
a) Sensitive rules are concealed. No rule that the data owner considers sensitive
can be revealed from the sanitized dataset when the dataset is mined at the
pre-specified thresholds of confidence and support (or at any value higher than
these thresholds).
b) Frequent non-sensitive rules are preserved. All the non-sensitive frequent rules
can be successfully mined from the sanitized database at pre-specified thresholds
of confidence and support.
c) Ghost rules are not generated. No rule that was not mined from the original
dataset as frequent can be discovered from the sanitized database, when mining
this database at pre-specified thresholds of confidence and support.
d) Dataset distortion is minimal. The sanitized dataset is “as similar as possible”
to the original dataset, i.e., the number of data items that are affected by the hiding
process is kept to a minimum.
The first goal requires sensitive rules to disappear. The second goal simply states
that there should be no lost rules in the sanitized dataset. The third goal says that no
false rules should be produced as a side-effect of the sanitization process. The fourth
goal requires that the hiding process incurs minimal distortion to the original dataset.
Generally speaking, in the typical hiding scenario, the sanitization process has
to be accomplished in a way that minimally affects the original dataset, preserves the
general patterns and trends, and successfully conceals all the sensitive knowledge.
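The four goals can be checked mechanically once the rules mined before and after sanitization are known. The sketch below is one possible way to do that and is not taken from the text: rules are represented as plain strings, the mining step itself is assumed to have happened elsewhere, and the function name `evaluate_sanitization` and all sample data are hypothetical.

```python
def evaluate_sanitization(original_rules, sanitized_rules, sensitive_rules,
                          original_matrix, sanitized_matrix):
    """Summarize how well a sanitized dataset meets goals (a) to (d).

    The rule arguments are sets of rules mined at the same thresholds;
    the matrix arguments are boolean matrices of identical shape.
    """
    non_sensitive = original_rules - sensitive_rules
    return {
        # (a) sensitive rules that can still be mined (should be empty)
        "exposed": sensitive_rules & sanitized_rules,
        # (b) non-sensitive rules that can no longer be mined (lost rules)
        "lost": non_sensitive - sanitized_rules,
        # (c) rules that appear only after sanitization (ghost rules)
        "ghost": sanitized_rules - original_rules,
        # (d) number of matrix cells changed by the hiding process
        "distortion": sum(
            o != s
            for o_row, s_row in zip(original_matrix, sanitized_matrix)
            for o, s in zip(o_row, s_row)
        ),
    }


# Hypothetical inputs, for illustration only.
report = evaluate_sanitization(
    original_rules={"bread -> milk", "milk -> sugar", "bread -> sugar"},
    sanitized_rules={"milk -> sugar", "sugar -> milk"},
    sensitive_rules={"bread -> milk"},
    original_matrix=[[1, 0, 1, 1], [1, 1, 0, 0]],
    sanitized_matrix=[[0, 0, 1, 1], [1, 1, 0, 0]],
)
print(report["exposed"])     # set()
print(report["lost"])        # {'bread -> sugar'}
print(report["ghost"])       # {'sugar -> milk'}
print(report["distortion"])  # 1
```

In this made-up case the sensitive rule is concealed, but at the cost of one lost rule, one ghost rule, and one modified cell, which shows why the goals can only be met "to the largest possible extent".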
3.2 Taxonomy of ARH Algorithms
In this section, we present a taxonomy of frequent itemset and association rule hiding
algorithms. To classify the various algorithms, we use a set of orthogonal dimensions.
As a first dimension, we consider whether the hiding algorithm uses the support or
the confidence of the rule to drive the hiding process. In this way we separate the
hiding algorithms into support-based and confidence-based.
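To make the first dimension concrete, the sketch below uses the standard definitions of support and confidence (assumed here, since this passage does not restate them): the support of a rule X -> Y is the fraction of transactions that contain X and Y together, and its confidence is that support divided by the support of X. The dataset, the thresholds, and the two hand-made modifications are hypothetical and only illustrate that a rule stops being mined as soon as either measure drops below its threshold.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by the support of its antecedent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0
    rule_support = support(set(antecedent) | set(consequent), transactions)
    return rule_support / antecedent_support


def is_mined(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is revealed only if it meets both thresholds."""
    rule_items = set(antecedent) | set(consequent)
    return (support(rule_items, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


# Hypothetical dataset and thresholds; the sensitive rule is bread -> milk.
original = [{"bread", "milk"}, {"bread", "milk", "sugar"},
            {"bread"}, {"milk", "sugar"}, {"sugar"}]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6

# Support-driven change: remove "milk" from the first transaction,
# which lowers the support of {bread, milk} below MIN_SUPPORT.
support_hidden = [{"bread"}] + original[1:]

# Confidence-driven change: add "bread" to the last transaction,
# which keeps the rule's support but lowers its confidence.
confidence_hidden = original[:4] + [{"sugar", "bread"}]

for name, data in [("original", original),
                   ("support-based", support_hidden),
                   ("confidence-based", confidence_hidden)]:
    print(name, is_mined({"bread"}, {"milk"}, data, MIN_SUPPORT, MIN_CONFIDENCE))
# original True
# support-based False
# confidence-based False
```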
The second dimension in the classification is related to the modification in the raw
data that is caused by the hiding algorithm. The two forms of modification are
distortion and blocking of the original values. Distortion is the process of
replacing 1's by 0's and 0's by 1's, while blocking refers to replacing original values
by question marks (unknowns) to confuse adversaries about the actual value.
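The difference between the two forms of modification can be sketched on a boolean matrix as follows. The function names `distort` and `block`, the `UNKNOWN` marker, and the choice of cells are assumptions for illustration; how an actual algorithm selects the cells to modify is not covered here.

```python
UNKNOWN = "?"  # marker for a blocked (unknown) value


def distort(matrix, cells):
    """Return a copy of `matrix` with each (row, column) cell flipped."""
    result = [row[:] for row in matrix]
    for r, c in cells:
        result[r][c] = 1 - result[r][c]  # 1 becomes 0, 0 becomes 1
    return result


def block(matrix, cells):
    """Return a copy of `matrix` with each (row, column) cell set to UNKNOWN."""
    result = [row[:] for row in matrix]
    for r, c in cells:
        result[r][c] = UNKNOWN
    return result


# Hypothetical dataset and cell selection.
original = [[1, 0, 1, 1],
            [1, 1, 0, 0]]
cells = [(0, 2), (1, 2)]

print(distort(original, cells))  # [[1, 0, 0, 1], [1, 1, 1, 0]]
print(block(original, cells))    # [[1, 0, '?', 1], [1, 1, '?', 0]]
```

With distortion the sanitized dataset contains values that are false with respect to the original, whereas with blocking every remaining 0 or 1 is still true and only the question marks hide information.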