Database Reference
In-Depth Information
14.1 Privacy Model
In Chapter 9, we highlighted the process of border revision and we presented a set
of algorithms which enable the computation of the revised borders that pertain to
an exact hiding solution. The inline methodology that is covered in this chapter
aims to enforce the revised borders when constructing the sanitized version D of
the original database D O . The sanitization approach that is followed is based on the
modification of selected T nm values from one to zero (1!0), thus excluding certain
items from selected transactions of D O . Since no transactions are added of removed
to/from the original database, the sanitized version D of database D O will have the
same number of transactions as the original one. Therefore, it holds thatjDj=jD O j.
Using F D to denote the set of frequent itemsets in the sanitized database D, the goal
that must be accomplished for the hiding of all the sensitive itemsets in D can be
expressed as follows: 8S 2S min : S 2F D , where S min is the set of minimal sensitive
itemsets, formally introduced in Chapter 9.
It is trivial to prove that the aforementioned goal can be easily achieved for any
databaseD O , provided that sufficient distortion is introduced to the database 1 . How-
ever, in reality one would like to hide all sensitive itemsets while minimally affecting
the original database. As stated in Chapter 9, the minimum harm in the hiding of the
sensitive knowledge can be quantified as F 0 D = F D O S max , meaning that in an
optimal hiding scenario, the sanitized database D will contain all frequent itemsets
of D O except from the sensitive ones 2 . By only removing certain items from se-
lected transactions in D O , the inline algorithm ensures that database D will contain
no ghost itemsets, i.e. nonsensitive itemsets that did not appear among the frequent
ones in the original database. However, the exclusion of items from transactions
may lead to nonsensitive frequent itemsets being accidentally lost in the sanitized
database D. Thus, the inline algorithm has to take the appropriate measures to min-
imize the loss of frequent itemsets in D as a side-effect of the sanitization process.
As we presented in Chapter 13, Menon, et al. [47] employ the measure of accu-
racy in the optimization criterion of the produced CSP in order to derive the hiding
solution. Maximizing accuracy is equivalent to minimizing the number of transac-
tions from D O that need to be modified to accommodate the hiding of the sensitive
knowledge. The inline algorithm, on the other hand, takes a more radical approach
to quantifying the harm induced to the original database by penalizing the number
of actual item modifications rather than the number of transactions that are affected
by the hiding process. Indeed, a hiding algorithm may alter less transactions but
to a much higher degree than another one that probably alters more transactions
but to a much lower degree. A basic drawback of the accuracy measure is that it
penalizes all potential modifications of a frequent itemset, say for instance, both
'abcde' !? and 'abcde' ! 'abcd', to the same extent. Based on this mishap, a
1 This property along with a naïve approach for solving the association rule hiding problem, were
briefly discussed in Chapter 2 (Section 2.2.1).
2 We should point out that notation F 0 D refers to the optimal set of frequent itemsets that should
appear in the sanitized database D to facilitate exact knowledge hiding, while F D denotes the
actual set of frequent itemsets that can be mined from D.
Search WWH ::




Custom Search