Privacy Issues in Association Rule Mining - Frequent Pattern Mining

while special care is taken to ensure the validity of the transactions in the extended part.

A two-phase iterative process that improves the functionality of the inline approach was proposed by Gkoulalas-Divanis and Verykios in [23]. The process consists of two phases that are executed in an iterative fashion until either (i) an exact solution of the given problem instance is found, or (ii) a pre-specified number of phase iterations (called oscillations) have taken place. In the first phase, the hiding algorithm uses the inline approach in an effort to conceal the sensitive knowledge without side-effects. If it succeeds, then the process terminates. Otherwise, the algorithm proceeds to the second phase, which implements the dual counterpart of the inline algorithm. In this phase, the hiding algorithm selectively removes inequalities from the infeasible CSP, until the CSP becomes feasible, and then solves the CSP to attain the sanitized dataset. This dataset is bound to suffer from side-effects (due to the removal of constraints) and the purpose of the second phase is to recover the lost itemsets by increasing their support and making them frequent again.
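The control flow of the two-phase process can be sketched as a simple loop. This is only an illustrative skeleton, not the implementation from [23]: the callables `inline_phase` and `dual_phase` are hypothetical placeholders for the two phases, and counting one pass through both phases as a single oscillation is an assumption made here for simplicity.

```python
from typing import Callable, Optional, Tuple, TypeVar

D = TypeVar("D")  # whatever type is used to represent a dataset


def two_phase_hide(
    dataset: D,
    inline_phase: Callable[[D], Optional[D]],
    dual_phase: Callable[[D], D],
    max_oscillations: int,
) -> Tuple[D, bool]:
    """Alternate the two phases until an exact solution or the budget runs out.

    inline_phase (hypothetical): returns a sanitized dataset if the sensitive
    knowledge was hidden without side-effects, otherwise None.
    dual_phase (hypothetical): stands in for relaxing the infeasible CSP,
    solving it, and raising the support of the lost itemsets.
    Returns the final dataset and a flag telling whether it is exact.
    """
    current = dataset
    for _ in range(max_oscillations):
        exact = inline_phase(current)
        if exact is not None:
            return exact, True  # termination condition (i): exact solution
        current = dual_phase(current)
    return current, False  # termination condition (ii): oscillation limit


# Toy stand-ins: the "dataset" is an integer, and the inline phase only
# succeeds once the dual phase has adjusted it twice.
toy_inline = lambda x: x if x >= 2 else None
toy_dual = lambda x: x + 1

result_exact = two_phase_hide(0, toy_inline, toy_dual, max_oscillations=5)
result_capped = two_phase_hide(0, toy_inline, toy_dual, max_oscillations=1)
```

With the toy stand-ins, the first call stops as soon as the inline phase succeeds, while the second call exhausts its single oscillation and returns a non-exact dataset.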

3.4 Metrics and Performance Analysis

In this section, we present two categories of measures related to the performance of an association rule hiding algorithm. The first category consists of measures that can either be optimized by a hiding scheme in the course of its execution, or be adopted to allow for a fair comparison among different hiding schemes under a unified framework. The measures belonging to this category are called internal and were proposed by Oliveira et al. [41]. They are classified as either data sharing-based or pattern sharing-based. The data sharing-based measures quantify the extent of side-effects regarding sensitive association rules that failed to be hidden, legitimate rules that were accidentally missed, and artifactual association rules that were created by the sanitization process. On the other hand, the pattern sharing-based measures quantify the extent of side-effects regarding non-sensitive association rules that were lost or sensitive rules that were improperly hidden and can easily be recovered through the use of inference channels. The second category consists of metrics that measure external parameters, such as the behavior of the algorithm when applied to large datasets and its computational speed. The measures of this category are called external and were proposed by Bertino et al. [12].
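The three kinds of side-effects that the data sharing-based measures quantify can be illustrated with plain set operations. The sketch below assumes that the rules mined from the original and the sanitized dataset are already available as sets; the function name `classify_side_effects` and the string encoding of rules are hypothetical choices made for illustration only.

```python
from typing import Dict, Set


def classify_side_effects(
    original_rules: Set[str],
    sanitized_rules: Set[str],
    sensitive_rules: Set[str],
) -> Dict[str, Set[str]]:
    """Split the rules into the three side-effect groups described above."""
    return {
        # sensitive rules that can still be mined after sanitization
        "hiding_failures": sensitive_rules & sanitized_rules,
        # legitimate (non-sensitive) rules that were accidentally lost
        "lost_rules": (original_rules - sensitive_rules) - sanitized_rules,
        # rules that did not exist before the sanitization process
        "artifactual_rules": sanitized_rules - original_rules,
    }


# Toy example with rules written as strings (an illustrative encoding).
original = {"A=>B", "B=>C", "C=>D"}
sensitive = {"A=>B"}
sanitized = {"A=>B", "C=>D", "D=>E"}

effects = classify_side_effects(original, sanitized, sensitive)
```

In this toy example, `A=>B` is a hiding failure, `B=>C` is a lost legitimate rule, and `D=>E` is an artifactual rule.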

The proposed data sharing-based measures are the following:

(a) Hiding Failure (HF). This measure quantifies the percentage of the sensitive patterns that remain exposed in the sanitized dataset. It is defined as the number of restrictive association rules that appear in the sanitized database divided by the number that appeared in the original dataset. Formally,

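$$\mathrm{HF} = \frac{|R_P(\mathcal{D}')|}{|R_P(\mathcal{D})|}$$

where the numerator counts the restrictive rules found in the sanitized database $\mathcal{D}'$ and the denominator counts those found in the original dataset $\mathcal{D}$.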
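As a small worked example, hiding failure can be computed directly from rule sets as the ratio described above: restrictive rules still found after sanitization over restrictive rules found before it. This is a minimal sketch; the function name `hiding_failure`, the string encoding of rules, and the choice to raise an error when no restrictive rule appears in the original dataset are assumptions made for illustration.

```python
from typing import Set


def hiding_failure(
    sensitive_rules: Set[str],
    original_rules: Set[str],
    sanitized_rules: Set[str],
) -> float:
    """Fraction of restrictive rules that remain exposed after sanitization."""
    exposed_before = sensitive_rules & original_rules
    exposed_after = sensitive_rules & sanitized_rules
    if not exposed_before:
        # Assumption: the measure is undefined when nothing needed hiding.
        raise ValueError("no restrictive rules appear in the original dataset")
    return len(exposed_after) / len(exposed_before)


# Toy example: two restrictive rules, one of which survives sanitization.
sensitive = {"A=>B", "B=>C"}
rules_original = {"A=>B", "B=>C", "C=>D"}
rules_sanitized = {"B=>C", "C=>D"}

hf = hiding_failure(sensitive, rules_original, rules_sanitized)  # 1 / 2
```

A value of 0 means every restrictive rule was hidden, while a value of 1 means none was.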
