Database Reference
In-Depth Information
Definition 16.2. (Optimal solution) A solution to the hiding of the sensitive item-
sets is considered as optimal if it has the minimum distance among all the exist-
ing exact solutions and is obtained through the minimum expansion of D O . In that
sense, optimal is a solution that is both minimal (with respect to the distance and the
size of the extension) and exact.
16.2.3 Revision of the Borders
The concept of border revision, introduced by Sun and Yu [66] and covered in detail
in Chapter 9, provides the underlying mechanism for the specification of the values
of the u qm variables to 1s or 0s, in a way that minimizes the impact on D. In order
to maintain high quality on the result database, the hybrid hiding algorithm should
select such item modifications in D X that have minimal impact on the border of the
nonsensitive itemsets.
The first step of the hybrid hiding methodology rests on the identification of the
revised borders for D. The hiding algorithm relies on both the revised positive and
the revised negative borders, denoted as Bd + (F 0 D ) and Bd (F 0 D ), respectively. Af-
ter identifying the revised borders, the hiding process has to perform all the required
minimal adjustments of the transactions in D X to enforce the existence of the new
borderline in the result database D.
Continuing the example of Section 16.1.2, consider the lattice of Figure 9.1, cor-
responding to database D O of Table 16.1, when applying frequent itemset min-
ing at mfreq = 0:3. Near each itemset we depict its support in the original (Figure
9.1(i)) and the sanitized (Figure 9.1(ii)) database. Based on the original hyperplane,
the following borders can be identified for D O : Bd + (F D O ) =fabc; be; acd; aceg
and Bd (F D O ) =ff ; bd; de; abe; bceg. The original borders are presented in Fig-
ure 9.1(i), where the itemsets belonging to the positive border are double under-
lined and the ones belonging to the negative border are single underlined. As one
can notice, in the lattice of this figure all frequent itemsets lie at the left of the
respective border, while their infrequent counterparts are on the right. Given that
S = fe; ae; bcg (shown in squares in Figure 9.1(i)), we have that S min = fe; bcg
(shown in bold in Figure 9.1(i)) and S max = fe; ae; be; bc; ce; abc; aceg. Ideally,
the frequent itemsets in the sanitized database D will be exactly those in F 0 D =
fa; b; c; d; ab; ac; ad; cd; acdg. The revised borderline along with the corresponding
borders for D and scenarios C 1 and C 3 that must hold for an exact solution, are de-
picted in Figure 9.1(ii). Scenario C 2 corresponds to the itemsets lying on the right
of the original border. To enhance the clarity of the figure only a small portion of
the itemsets involved in C 2 are shown in Figure 9.1. The revised borders that per-
tain to an optimal solution are: Bd + (F 0 D ) =fab; acdg (double underlined in Figure
9.1(ii)) and Bd (F 0 D ) =fe; f ; bc; bdg (single underlined in Figure 9.1(ii)). What is
thus needed is a way to enforce the existence of the revised borders in D.
Search WWH ::




Custom Search