Database Reference
In-Depth Information
quences of the original database to identify those in which the sensitive sequence
is a subsequence². For every such sequence of the original database, the algorithm
examines in how many different ways the sensitive sequence occurs as a subsequence
of it. Each "different way" (also called a matching) is identified by the
positions of the elements of the sequence that participate in generating the
sensitive sequence. Consequently, for each element of the sequence coming from the
original dataset, the algorithm maintains a counter recording the number of matchings
in which it is involved. To sanitize the sequence, the algorithm iteratively identifies
the element of the sequence that has the highest counter (i.e., it is involved in
the most matchings) and replaces it with an unknown, until the sensitive sequence is no
longer a subsequence of the sanitized one. As a result of this operation, the sensitive
sequence is no longer supported by the sanitized sequence. To enforce the
requested disclosure threshold, the algorithm applies this sanitization operation in
the following manner. For each sensitive sequence, all the sequences of the original
dataset are sorted in ascending order based on the number of different matchings that
they have with the sensitive sequence. Then, the algorithm sanitizes the sequences in
this order, until the required disclosure threshold is met in the privacy-aware version
of the original dataset. The authors have developed extensions of this approach for
the handling of temporal constraints, such as min gap, max gap and max window.
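
The per-sequence sanitization step described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the names matchings, sanitize and UNKNOWN are hypothetical, matchings are enumerated by brute force (exponential in the worst case, so suitable only for short sequences), ties between equal counters are broken by taking the leftmost element, and the sensitive sequence is assumed never to contain the unknown marker.

```python
from itertools import combinations

UNKNOWN = "?"  # hypothetical marker standing for a masked ("unknown") element


def matchings(seq, pattern):
    """Return every tuple of positions in seq that spells out pattern in order."""
    return [
        pos
        for pos in combinations(range(len(seq)), len(pattern))
        if all(seq[i] == p for i, p in zip(pos, pattern))
    ]


def sanitize(seq, pattern):
    """Mask elements of seq until pattern is no longer a subsequence of it."""
    if len(pattern) == 0:
        raise ValueError("the sensitive sequence must not be empty")
    seq = list(seq)
    while True:
        found = matchings(seq, pattern)
        if not found:
            return seq
        # counters[i] = number of matchings in which position i takes part
        counters = [0] * len(seq)
        for pos in found:
            for i in pos:
                counters[i] += 1
        # mask the element involved in the most matchings (leftmost on ties)
        victim = max(range(len(seq)), key=counters.__getitem__)
        seq[victim] = UNKNOWN


print(sanitize("abcab", "ab"))
```

For the sequence a b c a b and the sensitive sequence a b, the matchings are the position pairs (0, 1), (0, 4) and (3, 4), so the first element to be masked is the one at position 0.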
² A sequence S1 is a subsequence of another sequence S2 if it can be obtained by deleting zero
or more elements from S2 while keeping the remaining elements in their original order.
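
The database-level ordering can be sketched in the same spirit. The snippet below counts matchings with a standard dynamic program instead of enumerating them, then picks the supporting sequences with the fewest matchings as the ones to sanitize. The function names are hypothetical, and the disclosure threshold is assumed here to be the maximum number of sequences that may still support the sensitive sequence after sanitization; the source does not state how the threshold is expressed.

```python
def count_matchings(seq, pattern):
    """Number of distinct position tuples at which pattern occurs in seq."""
    # ways[j] = number of ways to match the first j pattern elements so far
    ways = [1] + [0] * len(pattern)
    for item in seq:
        # go right to left so one element of seq fills one slot per matching
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item:
                ways[j] += ways[j - 1]
    return ways[-1]


def choose_sequences_to_sanitize(database, pattern, threshold):
    """Indices of the sequences to sanitize, fewest matchings first.

    threshold is an assumed absolute count: at most this many sequences
    may keep supporting the pattern.
    """
    counted = ((count_matchings(s, pattern), i) for i, s in enumerate(database))
    supporters = sorted((c, i) for c, i in counted if c > 0)
    excess = max(0, len(supporters) - threshold)
    return [i for _, i in supporters[:excess]]


database = ["abcab", "ab", "ba", "aabb"]
print(choose_sequences_to_sanitize(database, "ab", threshold=1))
```

Sanitizing the sequences with the fewest matchings first tends to require fewer masked elements per sequence, which is presumably the motivation for the ascending order.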