Other Knowledge Hiding Methodologies - Association Rule Hiding for Data Mining

operates by identifying the set of attributes that influence the existence of each sensitive rule the most and then removing them from those supporting transactions that affect the nonsensitive rules the least.

Chen & Liu [16] present a random rotation perturbation technique to preserve the multidimensional geometric characteristics of the original database with respect to task-specific information. As a result, the sensitive knowledge in the sanitized database is adequately protected against disclosure, while the utility of the data is preserved to a large extent.
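
The geometric property that rotation perturbation relies on can be illustrated with a minimal sketch. It is not the algorithm of [16]; it only assumes that each record is a numeric vector and that the perturbation multiplies it by a random orthogonal matrix (a rotation, possibly combined with a reflection). The function names `random_orthogonal`, `perturb` and `dist` are hypothetical and chosen for illustration.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a d x d orthogonal matrix by Gram-Schmidt on random Gaussian rows.

    Assumption: any orthogonal matrix is acceptable for this illustration.
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (nearly) linearly dependent draws
            rows.append([a / norm for a in v])
    return rows


def perturb(records, q):
    """Apply y = Q x to every record x."""
    return [[sum(qij * xj for qij, xj in zip(row, x)) for row in q]
            for x in records]


def dist(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
data = [[1.0, 2.0, 3.0], [4.0, 0.5, -1.0], [0.0, 0.0, 2.5]]
Q = random_orthogonal(3, rng)
rotated = perturb(data, Q)
```

Because an orthogonal transformation preserves Euclidean distances, `dist(data[i], data[j])` and `dist(rotated[i], rotated[j])` agree up to floating-point error, even though the individual attribute values differ. This is the sense in which distance-based mining tasks can still operate on the perturbed data.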

Reconstruction-based approaches, inspired by the work of [17, 61] and introduced by Natwichai, et al. [52], offer an alternative to suppression-based techniques. These approaches aim to reconstruct the original database by using only the supporting transactions of the nonsensitive rules. As discussed in [71], reconstruction-based approaches are advantageous when compared to heuristic data modification algorithms, since they hardly introduce any side-effects to the hiding process. They operate as follows. First, they perform rule-based classification on the original database to enable the data owner to identify the sensitive rules. Then, they construct a decision tree classifier that contains only nonsensitive rules, approved by the data owner. The produced database remains similar to the original one, except for the sensitive part, while the difference between the two databases is proven to decrease as the number of rules increases.
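
The core idea of keeping only the supporting transactions of the nonsensitive rules can be sketched as follows. This is a deliberately simplified illustration, not the method of [52]: it skips the decision tree construction entirely and assumes that a rule is a pair of attribute conditions and a class label, and that a record supports a rule when it matches both. The names `supports` and `reconstruct`, the `"class"` key and the toy records are all hypothetical.

```python
def supports(record, rule):
    """Return True if the record satisfies the rule's conditions and class label."""
    conditions, label = rule
    return (record["class"] == label
            and all(record.get(attr) == value
                    for attr, value in conditions.items()))


def reconstruct(records, nonsensitive_rules):
    """Keep only the records that support at least one nonsensitive rule."""
    return [r for r in records
            if any(supports(r, rule) for rule in nonsensitive_rules)]


# Toy data, invented for illustration only.
records = [
    {"age": "young", "income": "low", "class": "deny"},
    {"age": "young", "income": "high", "class": "approve"},
    {"age": "old", "income": "high", "class": "approve"},
    {"age": "old", "income": "low", "class": "deny"},
]
sensitive_rule = ({"age": "young", "income": "low"}, "deny")
nonsensitive_rules = [
    ({"income": "high"}, "approve"),
    ({"age": "old", "income": "low"}, "deny"),
]

sanitized = reconstruct(records, nonsensitive_rules)
```

In this toy case `sanitized` retains the three records covered by the approved rules and drops the one record that supports only the sensitive rule, so the released data differs from the original only in the sensitive part.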

Natwichai, et al. [53] propose a methodology that further improves the quality of the reconstructed database. This is accomplished by extracting additional characteristic information from the original database with regard to the classification and by improving the decision tree building process. Furthermore, with the aid of information gain, the usability of the released database is improved even when hiding many sensitive rules with high discernibility in record classification.
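
For readers unfamiliar with information gain, the standard definition used in decision tree building can be sketched as below. The sketch shows only the textbook measure (the entropy of the class labels minus the weighted entropy after splitting on an attribute); how [53] applies it during reconstruction is not reproduced here. The function names and the toy records are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g)
                    for g in groups.values())
    return base - remainder


# Toy data, invented for illustration only.
toy = [
    {"income": "high", "age": "young", "class": "approve"},
    {"income": "high", "age": "old", "class": "approve"},
    {"income": "low", "age": "young", "class": "deny"},
    {"income": "low", "age": "old", "class": "deny"},
]
gain_income = information_gain(toy, "income", "class")
gain_age = information_gain(toy, "age", "class")
```

Here `income` separates the two classes perfectly while `age` carries no information about the class, so `gain_income` is larger than `gain_age`; a tree builder guided by this measure would split on `income` first.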

An approach similar to that of [53] was proposed by Katsarou, et al. [40]. The proposed methodology operates by modifying transactions supporting both sensitive and nonsensitive classification rules in the original database and then using the supporting transactions of the nonsensitive rules to produce its sanitized counterpart.

4.2 Privacy Preserving Clustering

The area of privacy preserving clustering collects methodologies that aim to protect the underlying attribute values, and thus assure the privacy of the individuals recorded in the data, when the data is shared for clustering purposes. Achieving privacy preservation when sharing data for clustering is a challenging task, since the privacy requirements must be met while the clustering results remain valid. The methodologies that have been proposed so far can be separated into two broad categories: the transformation-based approaches and the protocol-based approaches.

Transformation-based approaches are directly related to the distortion-based approaches of association rule hiding. They operate by performing a data transforma-
