Other Knowledge Hiding Methodologies - Association Rule Hiding for Data Mining


... sequences of the original database to identify those in which the sensitive sequence is a subsequence.² For every such sequence of the original database, the algorithm examines in how many different ways the sensitive sequence occurs as a subsequence of it. Each “different way” (also called a matching) is identified by the positions of the elements of the sequence that participate in generating the sensitive sequence. Accordingly, for each element of the sequence coming from the original dataset, the algorithm maintains a counter recording the number of matchings in which it is involved. To sanitize the sequence, the algorithm iteratively identifies the element of the sequence that has the highest counter (i.e., it is involved in the most matchings) and replaces it with an unknown, until the sensitive sequence is no longer a subsequence of the sanitized one. As a result of this operation, the sensitive sequence is no longer supported by the sanitized sequence. To enforce the requested disclosure threshold, the algorithm applies this sanitization operation in the following manner. For each sensitive sequence, all the sequences of the original dataset are sorted in ascending order of the number of different matchings that they have with the sensitive sequence. The algorithm then sanitizes the sequences in this order until the required disclosure threshold is met in the privacy-aware version of the original dataset. The authors have developed extensions of this approach for handling temporal constraints, such as min gap, max gap and max window.
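The heuristic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `UNKNOWN` marker, the tie-breaking rule, the recomputation of counters after every replacement, and the reading of the disclosure threshold as an absolute number of supporting sequences are all assumptions made for illustration. Temporal constraints are not handled.

```python
from collections import Counter

UNKNOWN = "?"  # hypothetical marker for an "unknown"; assumed not to occur in the data


def matchings(seq, sensitive):
    """Return every tuple of positions in `seq` that spells out `sensitive`."""
    found = []

    def extend(start, k, positions):
        if k == len(sensitive):
            found.append(tuple(positions))
            return
        for i in range(start, len(seq)):
            if seq[i] == sensitive[k]:
                extend(i + 1, k + 1, positions + [i])

    extend(0, 0, [])
    return found


def sanitize_sequence(seq, sensitive):
    """Mask the most-involved element until `sensitive` no longer matches."""
    if len(sensitive) == 0:
        raise ValueError("the sensitive sequence must not be empty")
    seq = list(seq)
    while True:
        found = matchings(seq, sensitive)
        if not found:
            return seq
        # Per-position counter: number of matchings the element takes part in.
        # Assumption: counters are recomputed after every replacement.
        counters = Counter(pos for m in found for pos in m)
        # Assumption: ties are broken in favour of the leftmost position.
        target = max(counters, key=lambda p: (counters[p], -p))
        seq[target] = UNKNOWN


def sanitize_database(db, sensitive, threshold):
    """Sanitize supporting sequences, fewest matchings first, until at most
    `threshold` sequences still support `sensitive` (assumed interpretation)."""
    result = [list(s) for s in db]
    counts = {i: len(matchings(s, sensitive)) for i, s in enumerate(result)}
    # Only sequences with at least one matching need to be considered.
    supporting = sorted((i for i, c in counts.items() if c > 0),
                        key=lambda i: counts[i])
    remaining = len(supporting)
    for i in supporting:
        if remaining <= threshold:
            break
        result[i] = sanitize_sequence(result[i], sensitive)
        remaining -= 1
    return result


print(sanitize_sequence("abcab", "ab"))  # ['?', 'b', 'c', '?', 'b']
```

In the example call, the sequence `abcab` has three matchings with `ab`, at positions (0, 1), (0, 4) and (3, 4). Positions 0 and 4 each take part in two of them, so position 0 is masked first under the assumed tie-breaking rule. One matching then remains, and masking position 3 removes it. Enumerating all matchings explicitly keeps the sketch short but can be expensive for long sequences with many repeated elements.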

² A sequence S1 is a subsequence of another sequence S2 if it can be obtained by deleting some elements from S2.
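The subsequence definition in this footnote can be checked with a few lines of Python. This is only a sketch, and the function name `is_subsequence` is a hypothetical choice: it walks through S2 once and consumes elements until each element of S1 has been found in order.

```python
def is_subsequence(s1, s2):
    """Return True if s1 can be obtained from s2 by deleting elements of s2."""
    remaining = iter(s2)
    # `x in remaining` advances the iterator past the first occurrence of x,
    # so later elements of s1 can only match later elements of s2.
    return all(x in remaining for x in s1)
```

For instance, `is_subsequence("ace", "abcde")` holds, whereas `is_subsequence("aec", "abcde")` does not, because the relative order of the elements must be preserved.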
