Hybrid Algorithm - Association Rule Hiding for Data Mining

Database Reference

In-Depth Information

Chapter 16

Hybrid Algorithm

Gkoulalas-Divanis & Verykios in [26] introduce the first exact methodology to

strategically perform sensitive frequent itemset hiding based on a new notion of

hybrid database generation. This approach broadens the regular process of data san-

itization (as introduced in [10] and adopted by the overwhelming majority of re-

searchers [10, 47, 55]) by applying an extension to the original database instead of

either modifying existing transactions (directly or through the application of trans-

formations), or rebuilding the dataset from scratch to accommodate sensitive knowl-

edge hiding. The extended portion of the database contains a set of carefully crafted

transactions that achieve to lower the importance of the sensitive patterns to a degree

that they become uninteresting from the perspective of the data mining algorithm,

while minimally affecting the importance of the nonsensitive ones. The hiding pro-

cess is guided by the need to maximize the data utility of the sanitized database by

introducing the least possible amount of side-effects. The released database, which

consists of the initial part (original database) and the extended part (database ex-

tension), can guarantee the protection of the sensitive knowledge, when mined at

the same or higher support as the one used in the original database. Extending the

original database for sensitive itemset hiding is proven to provide optimal solutions

to an extended set of hiding problems, compared to previous approaches, as well as

to lead to hiding solutions of typically superior quality [26].

In this chapter we provide the essential theoretical background for the under-

standing of the hybrid algorithm and elucidate the workings of this methodology.

Following previous work [23, 27], the approach presented in this chapter is exact

in nature; provided that a hiding solution that causes no side-effects to the sani-

tized database exists, the hybrid algorithm is guaranteed to find it. The remainder of

this chapter is organized as follows. In Section 16.1 we provide the basic steps of

the methodology, while Section 16.2 sheds light on the main issues that pertain to

its successful application. Following that, Section 16.3 presents the various aspects

of the hybrid solution methodology and Section 16.4 introduces a partitioning ap-

proach which substantially improves the scalability of the hybrid algorithm. Finally,

Section 16.5 contains an experimental evaluation of the hybrid algorithm contrast-

Search WWH ::

Custom Search

Home