where $R_P(U')$ corresponds to the sensitive rules discovered in the sanitized
dataset $U'$, $R_P(U)$ to the sensitive rules appearing in the original dataset $U$, and
$|X|$ is the size of set $X$. Ideally, the hiding failure should be 0%.
(b) Misses Cost (MC). This measure quantifies the percentage of the non-restrictive
patterns that are hidden as a side-effect of the sanitization process. It is computed
as
$$\mathrm{MC} = \frac{|\tilde{R}_P(U)| - |\tilde{R}_P(U')|}{|\tilde{R}_P(U)|}$$
where $\tilde{R}_P(U)$ is the set of all non-sensitive rules in the original database $U$ and
$\tilde{R}_P(U')$ is the set of all non-sensitive rules in the sanitized database $U'$. As one
can notice, there exists a compromise between the misses cost and the hiding
failure, since the more sensitive association rules one needs to hide, the more
legitimate association rules one is expected to miss.
(c) Artifactual Patterns (AP). This measure quantifies the percentage of the
discovered patterns that are artifacts. AP is computed as follows:
$$\mathrm{AP} = \frac{|P'| - |P \cap P'|}{|P'|}$$
where $P$ is the set of association rules discovered in the original database $U$ and
$P'$ is the set of association rules discovered in the sanitized database $U'$.
(d) Dissimilarity (Diss). The measure of dissimilarity quantifies the difference between
the original and the sanitized datasets by comparing their histograms,
where the horizontal axis contains the items in the dataset and the vertical axis
corresponds to their frequencies. It is calculated as follows:
$$\mathrm{Diss}(U, U') = \frac{1}{\sum_{i=1}^{n} f_U(i)} \times \sum_{i=1}^{n} \left[ f_U(i) - f_{U'}(i) \right]$$
where $f_X(i)$ represents the frequency of the $i$-th item in the dataset $X$, and $n$ is
the number of distinct items in the original dataset $U$.
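The misses cost, artifactual patterns and dissimilarity measures can be illustrated with a minimal Python sketch. It assumes that rules are represented as plain strings held in sets, that each transaction is a set of items, and that the "frequency" of an item is the number of transactions containing it. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def misses_cost(nonsensitive_original, nonsensitive_sanitized):
    """MC: fraction of non-sensitive rules lost through sanitization."""
    if not nonsensitive_original:
        raise ValueError("no non-sensitive rules in the original dataset")
    lost = len(nonsensitive_original) - len(nonsensitive_sanitized)
    return lost / len(nonsensitive_original)


def artifactual_patterns(rules_original, rules_sanitized):
    """AP: fraction of rules mined from the sanitized data that are artifacts."""
    if not rules_sanitized:
        return 0.0  # convention chosen for this sketch, not from the text
    shared = rules_original & rules_sanitized
    return (len(rules_sanitized) - len(shared)) / len(rules_sanitized)


def item_frequencies(transactions):
    """Count, for each item, the transactions that contain it (assumed meaning of frequency)."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def dissimilarity(original, sanitized):
    """Diss: summed frequency differences, normalized by the original total."""
    f_orig = item_frequencies(original)
    f_san = item_frequencies(sanitized)
    total = sum(f_orig.values())
    if total == 0:
        raise ValueError("the original dataset contains no items")
    # A Counter returns 0 for items that vanished from the sanitized data.
    return sum(f_orig[item] - f_san[item] for item in f_orig) / total


# Toy data (hypothetical): one of four non-sensitive rules is lost.
mc = misses_cost({"a->b", "b->c", "c->d", "a->d"}, {"a->b", "b->c", "a->d"})

# Two of the four rules mined after sanitization did not exist before.
ap = artifactual_patterns({"a->b", "b->c", "c->d"},
                          {"a->b", "b->c", "b->e", "d->e"})

# Item "b" is deleted from two transactions during sanitization.
original = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
sanitized = [{"a", "c"}, {"a", "b"}, {"a", "c"}, {"c"}]
diss = dissimilarity(original, sanitized)

print(mc, ap, diss)
```

In this toy case the misses cost is 1/4, the artifactual patterns measure is 2/4, and the dissimilarity is 2/9, since two of the nine item occurrences in the original data were removed.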
The proposed pattern-sharing based metrics are the following:
(a) Side-Effect Factor (SEF). Similarly to the measure of misses cost, the side-
effect factor is used to quantify the amount of non-sensitive association rules
that are removed as an effect of the sanitization process. It is defined as follows:
$$\mathrm{SEF} = \frac{|P| - \left( |P'| + |R_P(U)| \right)}{|P| - |R_P|}$$
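As a rough illustration, the side-effect factor can be computed from three rule sets: the rules mined from the original data, the rules mined from the sanitized data, and the sensitive rules. The sketch below assumes rules are stored as strings in Python sets; the function name and the toy rule identifiers are hypothetical.

```python
def side_effect_factor(rules_original, rules_sanitized, sensitive_rules):
    """SEF: share of non-sensitive rules removed as a side effect of sanitization."""
    non_sensitive = len(rules_original) - len(sensitive_rules)
    if non_sensitive == 0:
        raise ValueError("the original rule set contains no non-sensitive rules")
    removed = len(rules_original) - (len(rules_sanitized) + len(sensitive_rules))
    return removed / non_sensitive


# Toy data (hypothetical): 10 original rules, 2 of them sensitive,
# and 6 rules that survive sanitization.
rules_original = {f"r{i}" for i in range(10)}
sensitive_rules = {"r0", "r1"}
rules_sanitized = {f"r{i}" for i in range(2, 8)}

sef = side_effect_factor(rules_original, rules_sanitized, sensitive_rules)
print(sef)
```

Here two of the eight non-sensitive rules disappear alongside the sensitive ones, so the factor is 2/8.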
(b) Recovery Factor (RF). This measure expresses the possibility that an adversary
recovers a sensitive rule based on the non-sensitive ones. The recovery factor
of a pattern takes into account the existence of its subsets. If all the subsets of a
sensitive rule can be recovered from the sanitized dataset, then the recovery of
 