Database Reference
In-Depth Information
Chapter 16
Hybrid Algorithm
Gkoulalas-Divanis & Verykios in [26] introduce the first exact methodology to
strategically perform sensitive frequent itemset hiding based on a new notion of
hybrid database generation. This approach broadens the regular process of data san-
itization (as introduced in [10] and adopted by the overwhelming majority of re-
searchers [10, 47, 55]) by applying an extension to the original database instead of
either modifying existing transactions (directly or through the application of trans-
formations), or rebuilding the dataset from scratch to accommodate sensitive knowl-
edge hiding. The extended portion of the database contains a set of carefully crafted
transactions that achieve to lower the importance of the sensitive patterns to a degree
that they become uninteresting from the perspective of the data mining algorithm,
while minimally affecting the importance of the nonsensitive ones. The hiding pro-
cess is guided by the need to maximize the data utility of the sanitized database by
introducing the least possible amount of side-effects. The released database, which
consists of the initial part (original database) and the extended part (database ex-
tension), can guarantee the protection of the sensitive knowledge, when mined at
the same or higher support as the one used in the original database. Extending the
original database for sensitive itemset hiding is proven to provide optimal solutions
to an extended set of hiding problems, compared to previous approaches, as well as
to lead to hiding solutions of typically superior quality [26].
In this chapter we provide the essential theoretical background for the under-
standing of the hybrid algorithm and elucidate the workings of this methodology.
Following previous work [23, 27], the approach presented in this chapter is exact
in nature; provided that a hiding solution that causes no side-effects to the sani-
tized database exists, the hybrid algorithm is guaranteed to find it. The remainder of
this chapter is organized as follows. In Section 16.1 we provide the basic steps of
the methodology, while Section 16.2 sheds light on the main issues that pertain to
its successful application. Following that, Section 16.3 presents the various aspects
of the hybrid solution methodology and Section 16.4 introduces a partitioning ap-
proach which substantially improves the scalability of the hybrid algorithm. Finally,
Section 16.5 contains an experimental evaluation of the hybrid algorithm contrast-
Search WWH ::




Custom Search