Database Reference
In-Depth Information
port the sensitive itemset, the support of the itemset in D drops below the minimum
support threshold and thus the itemset is hidden. This observation led the authors
of [47] to propose the formulation of the hiding process as a CSP that contains one
constraint for each (sensitive) itemset in S, effectively regulating which of the sup-
porting transactions of the itemset will have to be sanitized to reduce its support the
least below the minimum support threshold. To allow for an optimal solution of the
CSP, the objective criterion that is selected requires the least number of transactions,
among those supporting the sensitive knowledge in S, to be marked for sanitization.
Figure 13.1 presents the formulation of the CSP that is considered in [47]. In this
figure, we use notation D S to capture the transactions of D O that support sensitive
itemsets from S, as well as we adopt the parameters a i j and x i from [47], which are
defined as follows:
( 1;
if transaction T i 2D S supports the j-th itemset inS
a i j =
(13.1)
0;
otherwise
( 1;
if transaction T i 2D S has to be sanitized
x i =
(13.2)
0;
otherwise
As one can observe, the objective function å T n 2D S x i of the CSP requires the
least number of ones for parameter x i , which translates to the minimum number
of transactions in D S that have to be sanitized to conceal the sensitive knowl-
edge. Since this objective penalizes the hiding algorithm for every transaction
that is marked for sanitization, it achieves to maximize the accuracy of the re-
leased database. Moreover, the first constraint of the CSP requires that at least
(sup(I;D O )msup+1) transactions are sanitized for the hiding of each sensitive
itemset in S, which are (obviously) selected among the supporting ones. Last, the
second constraint of the CSP requires that the computed x i 's by the integer program-
ming solver are binary variables, essentially capturing which transactions from D S
will have to be sanitized to effectively conceal the sensitive knowledge.
Given the proposed CSP formulation, an interesting observation is that the data
owner can select to hide each sensitive itemset to a different degree in the sanitized
database. This can be easily accomplished by using a different minimum support
threshold, say msup I , for each sensitive itemset I 2S that replaces the common
msup threshold in the right hand side of the first constraint of the CSP.
13.1.2 CSP Decomposition
For very large databases the CSP formulation presented in Figure 13.1 may contain
too many constraints that have to be simultaneously satisfied and, thus, be tough
to solve. In such cases, it is important to examine if the CSP can be decomposed
Search WWH ::




Custom Search