Database Reference
In-Depth Information
novel global measure of quality, called distance, is proposed in [23], which aims to
capture the proximity between the original and the sanitized database. Distance is
formally defined as follows:
å
n2[1;N];m2[1;M]
å
n2[1;N];m2[1;M]
T nm
=jfT nm 6= T nm gj (14.1)
dist(D O ;D) =
T nm
where T nm refers to the original database D O and T nm to its sanitized version D, N
corresponds to the number of transactions in D O ;D, and M to the number of items.
Due to the fact that the inline algorithm supports only item deletions from selected
transactions, minimizing the outcome of the above formula is exactly the same as
maximizing the number of ones left in D. Thus, the above formula can be restated
as follows:
!
å
n2[1;N];m2[1;M]
T nm
minfdist(D O ;D)g= max
(14.2)
Formula (14.2) captures the optimization criterion that is used by the inline al-
gorithm for the formulation of the CSP, discussed in detail in the following section.
The distance metric allows the selection of the optimal hiding solution, among the
exact ones (in the case that many exact hiding solutions exist for the problem at
hand), or the approximation of the optimal solution to the best possible extent (in
the case that an exact hiding solution cannot be found). Since only the support of the
minimal sensitive itemsets, i.e. the itemsets in S min , has to be reduced for the hiding
of the sensitive knowledge, the T nm values in D O that will be zeroed out in D will
have to correspond to items appearing in transactions that support itemsets in S min .
Having defined the optimization criterion that has to be minimized, the next target
is to identify the set of itemsets in which this criterion will be applied. Solving the
entire problem involving alljÃ(I)j1 = 2 M 1 itemsets in the lattice ofD O is well
known to be NP-hard (c.f. [10, 47]). Therefore, the inline algorithm has to apply the
optimization criterion to only a subset of the transactions of the original database.
Prior to discussing the selection process that is adopted by the inline algorithm, we
present the two possible constraints that can hold for a frequent itemset I in D.
An itemset I from D O will continue to be frequent in D if and only if
N
n=1 i m 2I T nm msup
sup(I;D) msup )
(14.3)
and will be infrequent otherwise, i.e. when
N
n=1 i m 2I T nm < msup
sup(I;D) < msup )
(14.4)
Based on the above inequalities, one can observe that any exact solution will
necessarily satisfy them for the entire problem (i.e., for all the itemsets in the lattice
of D). Specifically, for all frequent itemsets appearing in F 0 D , these itemsets should
Search WWH ::




Custom Search