tion of the original database that maintains the similarity among the various pairs
of attributes. In most cases, these methodologies are independent of the clustering
algorithm that is used. Because the similarity between the distorted attribute pairs
is preserved, computations in the transformed space still yield accurate results,
which allows the various transactions to be clustered correctly. Some interesting
approaches that fall in this category involve the work of Oliveira & Zaïane [56, 57].
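To make the idea of a similarity-preserving transformation concrete, the following minimal sketch rotates one pair of numeric attributes by a fixed angle. Rotation is used here only as a simple example of a distance-preserving distortion; it is not presented as the exact method of the cited works, and the function names, the sample records, and the angle are all assumptions chosen for illustration.

```python
import math

def rotate_pair(records, i, j, theta_deg):
    """Return a copy of records with attributes i and j rotated by theta_deg degrees."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    out = []
    for r in records:
        r = list(r)
        x, y = r[i], r[j]
        r[i] = c * x - s * y
        r[j] = s * x + c * y
        out.append(r)
    return out

def dist(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical records with three numeric attributes each.
original = [[1.0, 2.0, 3.0], [4.0, 0.0, 1.0], [2.0, 5.0, 2.0]]
distorted = rotate_pair(original, 0, 1, 37.0)

# The attribute values change, but every pairwise distance is preserved,
# so a distance-based clustering algorithm sees the same structure.
for a in range(len(original)):
    for b in range(a + 1, len(original)):
        print(round(dist(original[a], original[b]), 6),
              round(dist(distorted[a], distorted[b]), 6))
```

Because a rotation is an isometry, any clustering algorithm that depends only on Euclidean distances produces the same grouping on the distorted records as on the original ones.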
Protocol-based approaches, on the other hand, assume a distributed scenario
where many data owners want to share their data for clustering purposes without,
however, compromising the privacy of their data by revealing any sensitive
knowledge. The algorithms of this category make an assumption regarding the partitioning
of the data among the interested, collaborating parties and are typically the privacy
preserving versions of commonly used clustering algorithms, such as K-means [68].
The proposed protocols control the information that is communicated among the
different collaborating parties and guarantee that no sensitive knowledge can be
learned from the model. Approaches in this category include the work of Jha, et
al. [38] and the work of Jagannathan, et al. [37], among others.
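The communication pattern behind such protocols can be sketched for K-means under the assumption of horizontally partitioned data, where each party holds complete records for a subset of the transactions. In the sketch below each party reveals only per-cluster sums and counts, never individual records. This is an illustration only: the function names and data are hypothetical, and the aggregation is done in the clear, whereas the cited protocols would protect even these intermediate values with secure computation.

```python
def local_stats(points, centroids):
    """Computed by one party: per-cluster coordinate sums and counts for its own points."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        idx = min(range(k),
                  key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[idx] += 1
        for t in range(d):
            sums[idx][t] += p[t]
    return sums, counts

def update_centroids(all_stats, centroids):
    """Combine the parties' aggregates into new global centroids."""
    k, d = len(centroids), len(centroids[0])
    new = []
    for c in range(k):
        n = sum(counts[c] for _, counts in all_stats)
        if n == 0:
            new.append(list(centroids[c]))  # keep an empty cluster where it is
            continue
        new.append([sum(sums[c][t] for sums, _ in all_stats) / n for t in range(d)])
    return new

def distributed_kmeans(parties, centroids, rounds=10):
    """Run a fixed number of rounds; only aggregates leave each party."""
    for _ in range(rounds):
        stats = [local_stats(points, centroids) for points in parties]
        centroids = update_centroids(stats, centroids)
    return centroids

# Two hypothetical data owners.
party_a = [[0.0, 0.0], [0.0, 2.0]]
party_b = [[10.0, 10.0], [10.0, 12.0]]
result = distributed_kmeans([party_a, party_b], [[0.0, 0.0], [10.0, 10.0]])
print(result)
```

Even aggregates of this kind can leak information when a cluster contains very few points from one party, which is one reason actual protocols restrict what is communicated more carefully than this sketch does.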
A somewhat different kind of approach, which targets density-based clustering,
is presented in the work of Silva & Klusch [19]. The authors propose a kernel-based
distributed clustering algorithm that uses an approximation of density estimation in
an attempt to make the reconstruction of the original database harder. Each site
computes a local density estimate for the data it holds and transmits it to a trusted
third party. Subsequently, the trusted party builds a global density estimate and
returns it to the collaborating peers. By making use of this estimate, the sites can
locally execute density-based clustering.
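The three steps described above (local estimate, global aggregation, local clustering) can be sketched as follows for one-dimensional data. This is a simplified illustration and not the algorithm of the cited work: the Gaussian kernel, the bandwidth h, the sampling grid, the hill-climbing rule, and all names are assumptions made for the example.

```python
import math

def local_density(points, grid, h):
    """Site: unnormalised Gaussian kernel density estimate, sampled at grid positions."""
    return [sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in points) for g in grid]

def global_density(local_estimates):
    """Trusted party: add the sampled local estimates position by position."""
    return [sum(vals) for vals in zip(*local_estimates)]

def climb(x, grid, density):
    """Site: hill-climb from the grid position nearest to x to a local density maximum."""
    i = min(range(len(grid)), key=lambda j: abs(grid[j] - x))
    while True:
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(grid)]
        best = max(nbrs, key=lambda j: density[j])
        if density[best] <= density[i]:
            return grid[i]  # no neighbour is denser: this is a mode
        i = best

# Hypothetical shared grid and bandwidth agreed on by all sites.
grid = [0.5 * i for i in range(25)]  # 0.0, 0.5, ..., 12.0
h = 1.0

site_a = [1.0, 1.5, 9.0]
site_b = [2.0, 9.5, 10.0]

# Only sampled density values are transmitted, not the data points themselves.
estimate = global_density([local_density(site_a, grid, h),
                           local_density(site_b, grid, h)])

# Each site labels its own points by the density mode they climb to.
labels_a = [climb(x, grid, estimate) for x in site_a]
print(labels_a)
```

Points that climb to the same mode fall in the same cluster. Sending only sampled values of the estimate, rather than the points, is what makes recovering the original records harder in this sketch.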
4.3 Sequence Hiding
The hiding of sensitive sequences is one of the most recent and challenging research
directions in privacy preserving data mining, particularly due to the tight relation
that exists between sequential and mobility data (see footnote 1). The underlying
problem follows the same principles as association rule hiding, in the sense that a
set of sensitive sequential patterns needs to be hidden from a database of sequences
in a way that causes the fewest side-effects to their nonsensitive counterparts.
Abul, et al. [3] propose a sequential pattern hiding methodology which assumes
that every sensitive sequence is associated with a disclosure threshold that defines
the maximum number of sequences in the sanitized database that are allowed to support
it. The sequence sanitization operation is based on the use of unknowns to mask
selected elements in the sequences of the original dataset. The proposed algorithm
operates as follows. For each sensitive sequence, the algorithm searches all the se-
1 Privacy preserving data mining of user mobility data is an active research topic that
has been studied in the context of EU-funded IST projects such as Geographic Privacy-aware
Knowledge Discovery and Delivery (GeoPKDD, http://www.geopkdd.eu) and Mobility, Data
Mining, and Privacy (MODAP, http://www.modap.org).
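The notions of support, disclosure threshold, and masking with unknowns used in this section can be illustrated with a small sketch. It is a naive greedy heuristic written for illustration, not the algorithm of Abul, et al. [3]: it does not try to minimise side-effects on nonsensitive patterns, and the symbol "?" for unknowns, the function names, and the sample database are all assumptions. The pattern is assumed to be non-empty and to contain no "?" symbol.

```python
UNKNOWN = "?"  # assumed marker for a masked (unknown) element

def match_positions(seq, pattern):
    """Positions of the leftmost subsequence match of pattern in seq, or None."""
    pos, start = [], 0
    for sym in pattern:
        try:
            i = seq.index(sym, start)
        except ValueError:
            return None
        pos.append(i)
        start = i + 1
    return pos

def support(db, pattern):
    """Number of sequences in db that contain pattern as a subsequence."""
    return sum(1 for s in db if match_positions(s, pattern) is not None)

def sanitize(db, pattern, threshold):
    """Mask elements until at most `threshold` sequences support the pattern."""
    db = [list(s) for s in db]  # work on a copy
    for s in db:
        if support(db, pattern) <= threshold:
            break
        # Mask the first element of each remaining match in this sequence.
        while (pos := match_positions(s, pattern)) is not None:
            s[pos[0]] = UNKNOWN
    return db

# Hypothetical database of sequences and one sensitive pattern <a, b>.
db = [list("abcab"), list("acb"), list("bca"), list("aab")]
pattern = ["a", "b"]

clean = sanitize(db, pattern, threshold=1)
print(["".join(s) for s in clean], support(clean, pattern))
```

In this example three of the four sequences initially support the pattern, and the heuristic masks elements in the first two supporting sequences so that only one supporter remains.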