Other Knowledge Hiding Methodologies - Association Rule Hiding for Data Mining


tion of the original database that maintains the similarity among the various pairs of attributes. In most cases, these methodologies are independent of the clustering algorithm that is used. In the transformed space, the similarities between the distorted attribute pairs are preserved well enough to yield accurate computations, which in turn allow the transactions to be clustered correctly. Some interesting approaches in this category are those of Oliveira & Zaïane [56, 57].
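As a minimal illustration of why a similarity-preserving transformation leaves clustering results intact, the sketch below rotates one pair of attributes and checks that all pairwise Euclidean distances are unchanged. The rotation, the function names, and the sample records are assumptions chosen for illustration; they are not necessarily the transformation used in [56, 57].

```python
import math

def rotate_pair(records, i, j, theta):
    """Rotate attributes i and j of every record by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for rec in records:
        out = list(rec)
        out[i] = c * rec[i] - s * rec[j]
        out[j] = s * rec[i] + c * rec[j]
        rotated.append(out)
    return rotated

def pairwise_distances(records):
    """Euclidean distances between all unordered pairs of records."""
    return [math.dist(a, b)
            for k, a in enumerate(records)
            for b in records[k + 1:]]

# Hypothetical numeric records (one row per transaction).
original = [[1.0, 2.0, 3.0], [4.0, 0.5, 2.5], [0.0, 1.0, 7.0]]
distorted = rotate_pair(original, 0, 2, theta=1.1)

# Largest change in any pairwise distance caused by the distortion.
max_gap = max(abs(x - y) for x, y in zip(pairwise_distances(original),
                                          pairwise_distances(distorted)))
```

Because a rotation is an isometry, `max_gap` is zero up to floating-point error, so any distance-based clustering algorithm would group the distorted records exactly as it groups the original ones, even though the attribute values themselves differ.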

Protocol-based approaches, on the other hand, assume a distributed scenario in which many data owners want to share their data for clustering purposes without compromising privacy by revealing any sensitive knowledge. The algorithms in this category make an assumption about how the data is partitioned among the collaborating parties, and they are typically privacy preserving versions of commonly used clustering algorithms, such as K-means [68]. The proposed protocols control the information that is communicated among the collaborating parties and guarantee that no sensitive knowledge can be learned from the model. Approaches in this category include the work of Jha et al. [38] and of Jagannathan et al. [37], among others.
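The general shape of such a protocol can be sketched as follows, assuming horizontally partitioned data (each party holds complete records for different individuals). In this simplified sketch, each party reveals only per-cluster coordinate sums and counts rather than its records. This is an assumption made for illustration: it is not cryptographically secure, and it is not the protocol of [38] or [37], which would additionally protect these intermediate values. All function names are hypothetical.

```python
import math

def local_statistics(points, centroids):
    """Per-cluster coordinate sums and counts for one party's private points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts

def update_centroids(all_stats, centroids):
    """Combine the parties' statistics into new global centroids."""
    updated = []
    for c, old in enumerate(centroids):
        total = sum(counts[c] for _, counts in all_stats)
        if total == 0:
            updated.append(list(old))  # keep an empty cluster's centroid as-is
            continue
        updated.append([sum(sums[c][d] for sums, _ in all_stats) / total
                        for d in range(len(old))])
    return updated

def distributed_kmeans(parties, centroids, rounds=10):
    """Run K-means rounds in which only aggregated statistics are exchanged."""
    for _ in range(rounds):
        stats = [local_statistics(points, centroids) for points in parties]
        centroids = update_centroids(stats, centroids)
    return centroids

# Two hypothetical data owners; the raw points never leave local_statistics.
party_a = [[0.0, 0.0], [0.0, 2.0]]
party_b = [[10.0, 10.0], [12.0, 10.0], [2.0, 0.0]]
final_centroids = distributed_kmeans([party_a, party_b],
                                     [[0.0, 0.0], [10.0, 10.0]])
```

The resulting centroids are the same ones that ordinary K-means would compute on the pooled data from the same starting centroids, which is the property the privacy preserving variants aim to retain.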

A somewhat different approach, which targets density-based clustering, is presented in the work of Silva & Klusch [19]. The authors propose a kernel-based distributed clustering algorithm that uses an approximation of the density estimate in order to make reconstruction of the original database harder. Each site computes a local density estimate for the data it holds and transmits it to a trusted third party. The trusted party then builds a global density estimate and returns it to the collaborating peers. Using this estimate, the sites can execute density-based clustering locally.
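The three steps described above (local estimate, aggregation by the trusted party, local clustering) can be sketched in one dimension as follows. The Gaussian kernel, the shared sampling grid, the bandwidth, and the hill-climbing assignment are assumptions made for this illustration and are not taken from [19]; all names are hypothetical.

```python
import math

def local_density(points, grid, bandwidth):
    """Unnormalised Gaussian kernel density of one site's points, sampled on the grid."""
    return [sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in points)
            for g in grid]

def global_density(local_estimates):
    """Trusted third party: add up the sampled local estimates point by point."""
    return [sum(values) for values in zip(*local_estimates)]

def climb(index, density):
    """Follow increasing density along the grid until a local maximum is reached."""
    while True:
        neighbours = [i for i in (index - 1, index + 1) if 0 <= i < len(density)]
        best = max(neighbours, key=lambda i: density[i])
        if density[best] <= density[index]:
            return index
        index = best

def cluster_locally(points, grid, density):
    """Label each local point with the grid position of the density peak it climbs to."""
    labels = []
    for x in points:
        start = min(range(len(grid)), key=lambda i: abs(grid[i] - x))
        labels.append(grid[climb(start, density)])
    return labels

# Hypothetical one-dimensional data held by two sites.
site_a = [1.0, 1.2, 8.9]
site_b = [0.8, 9.0, 9.3]
grid = [i * 0.5 for i in range(21)]  # shared sampling grid covering 0.0 to 10.0

# Only the sampled estimates are transmitted, never the points themselves.
estimates = [local_density(site, grid, bandwidth=0.5) for site in (site_a, site_b)]
density = global_density(estimates)

labels_a = cluster_locally(site_a, grid, density)
labels_b = cluster_locally(site_b, grid, density)
```

Each site ends up with cluster labels that reflect the global density, although it has seen only its own points and the aggregated estimate. Sampling the estimate on a grid is one way to approximate it; a coarser grid reveals less detail about the local data at the cost of clustering accuracy.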

4.3 Sequence Hiding

The hiding of sensitive sequences is one of the most recent and challenging research directions in privacy preserving data mining, particularly because of the tight relation between sequential data and mobility data¹. The underlying problem follows the same principles as association rule hiding: a set of sensitive sequential patterns needs to be hidden from a database of sequences in a way that causes the fewest side effects to their nonsensitive counterparts.
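To make the problem statement concrete, the sketch below counts how many sequences of a database support a pattern, where a sequence supports a pattern if the pattern occurs in it as a (not necessarily contiguous) subsequence. The toy database, the patterns, the `?` placeholder, and the sanitized copy are hypothetical and serve only to show what hiding a pattern with few side effects means.

```python
def supports(sequence, pattern):
    """True if pattern occurs in sequence as a (not necessarily contiguous) subsequence."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so the items of the pattern must appear in order.
    return all(item in remaining for item in pattern)

def support(database, pattern):
    """Number of sequences in the database that support the pattern."""
    return sum(supports(seq, pattern) for seq in database)

# Hypothetical database of sequences.
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "b"],
    ["b", "a", "d", "c"],
    ["a", "d"],
]
sensitive = ["a", "c"]     # pattern that must be hidden
nonsensitive = ["a", "d"]  # pattern that should remain discoverable

# Hypothetical sanitized copy: two occurrences of "c" replaced by a placeholder.
sanitized = [
    ["a", "b", "?", "d"],
    ["a", "?", "b"],
    ["b", "a", "d", "c"],
    ["a", "d"],
]

before = (support(database, sensitive), support(database, nonsensitive))
after = (support(sanitized, sensitive), support(sanitized, nonsensitive))
```

In this example the support of the sensitive pattern drops from 3 to 1, while the support of the nonsensitive pattern stays at 3, so the sanitization causes no side effect on it.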

Abul et al. [3] propose a sequential pattern hiding methodology which assumes that every sensitive sequence is associated with a disclosure threshold that defines the maximum number of sequences in the sanitized database that are allowed to support it. The sequence sanitization operation uses unknowns to mask selected elements in the sequences of the original dataset. The proposed algorithm operates as follows. For each sensitive sequence, the algorithm searches all the se-

¹ Privacy preserving data mining of user mobility data is an active research topic that has been studied in EU-funded IST projects such as Geographic Privacy-aware Knowledge Discovery and Delivery (GeoPKDD, http://www.geopkdd.eu) and Mobility, Data Mining, and Privacy (MODAP, http://www.modap.org).
