for an efficient retrieval of the data. In the second scan, the algorithms sanitize the
database by selectively removing the smallest number of individual items needed to
accomplish the hiding of the sensitive knowledge. An interesting novelty of this work is
the fact that the proposed methodology takes into account not only the impact of the
sanitization on hiding the sensitive patterns, but also the impact related to the hiding
of non-sensitive knowledge. Three item restriction-based algorithms (MinFIA,
MaxFIA, and IGA) are proposed that selectively remove items from sensitive
transactions. The first algorithm, MinFIA, proceeds as follows. For each restrictive pattern
it identifies the supporting transactions and the item having the smallest support in
the pattern (called the victim item). Then, using a user-supplied disclosure threshold,
it first sorts the identified transactions in ascending order of their degree of conflict
and then selects, among them, the number of transactions that need to be sanitized.
Finally, from each selected transaction the algorithm removes the victim item. The
MaxFIA algorithm proceeds exactly like MinFIA, the only difference being that it
selects as the victim item the one with the maximum support in the sensitive
rule. Finally, IGA aims at clustering the restricted patterns into groups that share the
same itemsets. By identifying overlapping clusters, the algorithm proceeds to hide
the corresponding sensitive patterns at once (based on the sensitive itemsets they
share) and consequently reduces the impact on the released dataset.
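The victim-item selection and transaction sanitization shared by MinFIA and MaxFIA can be sketched in Python. This is an illustrative sketch, not the authors' implementation: the function name is invented, and the degree of conflict is approximated here by the number of sensitive patterns a transaction supports (an assumption).

```python
from collections import Counter

def fia_sanitize(transactions, sensitive_patterns, threshold, use_max=False):
    """Illustrative MinFIA/MaxFIA-style sanitization (simplified sketch).

    transactions       : list of item sets (modified in place)
    sensitive_patterns : list of item sets to hide
    threshold          : disclosure threshold in [0, 1] -- the fraction of
                         supporting transactions allowed to remain intact
    use_max            : False picks MinFIA's victim (smallest support),
                         True picks MaxFIA's victim (largest support)
    """
    # Item frequencies across the whole database
    freq = Counter(item for t in transactions for item in t)
    for pattern in sensitive_patterns:
        # Transactions supporting this restrictive pattern
        supporters = [t for t in transactions if pattern <= t]
        if not supporters:
            continue
        # Victim item: min- or max-support item of the pattern
        pick = max if use_max else min
        victim = pick(pattern, key=lambda item: freq[item])
        # Number of supporting transactions that must be sanitized
        n_sanitize = len(supporters) - int(threshold * len(supporters))
        # Ascending "degree of conflict", approximated here by how many
        # sensitive patterns each transaction supports (assumption)
        supporters.sort(key=lambda t: sum(p <= t for p in sensitive_patterns))
        for t in supporters[:n_sanitize]:
            t.discard(victim)  # remove the victim item from the transaction
    return transactions
```

With a disclosure threshold of 0, every supporting transaction is sanitized, so the pattern is fully hidden.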
A more efficient approach than the one in [39] and the works of [15, 47, 48] was
proposed by Oliveira and Zaïane [40]. The proposed algorithm, called SWA, is an
efficient, scalable, one-scan heuristic which aims at providing a balance between the
needs for privacy and knowledge discovery in ARH. It manages to hide multiple rules
in only one pass through the dataset, regardless of its size or the number of sensitive
rules that need to be protected. The algorithm proceeds in five steps that are applied
to every group of K transactions (thus forming a window of size K) read from
the original database. Firstly, the non-sensitive transactions are separated from the
sensitive ones and copied directly to the sanitized database. For each sensitive rule,
the item having the highest frequency is selected and the supporting transactions
are identified. Then, a disclosure threshold, potentially different for each sensitive
rule, is used to capture the severity characterizing the release of the rule. Based on
this threshold, SWA computes the number of supporting transactions that need to
be sanitized for each rule and then sorts them in ascending order of size. For each
selected transaction, the corresponding item is removed and then the transaction is
copied to the sanitized dataset. The authors present a set of computational tests to
demonstrate that SWA outperforms state-of-the-art approaches in terms of concealing
all the sensitive rules, while maintaining high data utility of the released dataset.
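The five windowed steps above can be sketched in Python. This is a hedged sketch under simplifying assumptions: each sensitive rule is represented as the set of its items, the function and helper names are invented, and the exact bookkeeping of the original SWA is not reproduced.

```python
from collections import Counter

def swa_sanitize(transactions, sensitive_rules, thresholds, window_size=100):
    """Illustrative one-scan, window-based sanitization in the spirit of SWA.

    transactions    : iterable of item sets (read once, in order)
    sensitive_rules : list of item sets (antecedent plus consequent)
    thresholds      : per-rule disclosure thresholds in [0, 1]
    """
    sanitized, window = [], []
    for t in transactions:
        window.append(set(t))
        if len(window) == window_size:
            sanitized.extend(_sanitize_window(window, sensitive_rules, thresholds))
            window = []
    if window:  # flush the last, possibly partial, window
        sanitized.extend(_sanitize_window(window, sensitive_rules, thresholds))
    return sanitized

def _sanitize_window(window, sensitive_rules, thresholds):
    # Step 1: non-sensitive transactions pass through untouched
    clean = [t for t in window if not any(r <= t for r in sensitive_rules)]
    dirty = [t for t in window if any(r <= t for r in sensitive_rules)]
    freq = Counter(item for t in window for item in t)
    for rule, psi in zip(sensitive_rules, thresholds):
        supporters = [t for t in dirty if rule <= t]
        if not supporters:
            continue
        # Step 2: victim item = the rule's highest-frequency item
        victim = max(rule, key=lambda item: freq[item])
        # Steps 3-4: the per-rule threshold psi fixes how many supporters
        # to sanitize; supporters are processed in ascending order of size
        n = len(supporters) - int(psi * len(supporters))
        supporters.sort(key=len)
        # Step 5: drop the victim item, then keep the transaction
        for t in supporters[:n]:
            t.discard(victim)
    return clean + dirty
```

Because each transaction is touched only within its window, the database is read exactly once, which is what makes the approach scale to large datasets.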
Amiri [10] proposes three effective, multiple rule hiding heuristics that outper-
form SWA by offering higher data utility and lower distortion, at the expense of
higher computational cost. Although similar in philosophy to the previous approaches,
the proposed schemes do a better job in modelling the overall objective of a rule
hiding algorithm. The first approach, called Aggregate, computes the union of the
supporting transactions for all sensitive itemsets. Among them, the transaction that
supports the most sensitive and the fewest non-sensitive itemsets is selected and ex-
pelled from the database. The same process is repeated until all the sensitive itemsets