The ultimate method of protection against attribute disclosure is based on the
idea that the original data is replaced, in its entirety, by a synthetic dataset with
the same statistical properties (e.g., mean, variance, etc.) as those of the original
dataset. (Muralidhar and Sarathy 2008) present a method which, besides
preserving the mean vector and the covariance matrix, also guarantees
similarity of the synthetic confidential values to the original confidential values.
This somewhat radical approach may encounter resistance in applications in
which veracity of the data is important, e.g., in medical research. On the other
hand, it may be acceptable in areas where use of aggregated data is already the
norm, e.g., in large-scale social science research.
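The core idea can be sketched with a simple moment-matching generator: estimate the mean vector and covariance matrix of the original data and sample an entirely new table from a Gaussian with those parameters. This is only a minimal illustration of the idea, not the method of Muralidhar and Sarathy; the attribute values below are invented for the example.

```python
import numpy as np

# Toy "original" confidential table: 500 records, 3 numeric attributes
# (the values are synthetic stand-ins for illustration only).
rng = np.random.default_rng(0)
original = rng.normal(loc=[50.0, 10.0, 0.5], scale=1.0, size=(500, 3))

# First- and second-order statistics to be preserved in the release.
mean = original.mean(axis=0)
cov = np.cov(original, rowvar=False)

# Fully synthetic release: draw new records from a Gaussian with the
# same mean vector and covariance matrix; no original record is published.
synthetic = rng.multivariate_normal(mean, cov, size=500)

# The released statistics match the originals up to sampling error.
print(synthetic.mean(axis=0).round(1))  # close to [50. 10. 0.5]
```

Any analysis that depends only on first- and second-order statistics (means, variances, correlations, linear regression coefficients) gives approximately the same answer on the synthetic table as on the original one, which is exactly the utility this family of methods aims to preserve.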
A number of attribute disclosure attacks, and methods to protect against them,
have been described in the literature. We can mention here (Loukides, Gkoulalas-
Divanis et al. 2011), (Martin, Kifer et al. 2007) and (Chen, LeFevre et al. 2007).
The generality of these attacks is questionable, and defending against each of
them leads to high-granularity privacy protection approaches in which multiple
transformations are applied to the data, resulting in a potentially significant
decrease in data quality while still leaving the resulting data vulnerable to novel
privacy attacks not yet known or described in the literature. This is analogous to
multi-layered anti-virus patches, which may themselves open vulnerabilities to
novel, as yet unknown viruses to come in the future.
11.4 Privacy of Decentralized Data
As described in Sec. 1, we address here an important scenario in which the
ownership of the data in T is shared among multiple parties who want to obtain a
meaningful data mining result of interest to all of them. This is a frequent
phenomenon: groups of users may be interested in performing data mining on the
union of their data, but cannot share the data for legal or commercial
(competitive) reasons. We then say that the data is partitioned. As shown in Fig. 3,
the partitioning may be either vertical or horizontal. In vertical partitioning, all
the parties have data referring to the same instances, but each party holds a
different subset of the attributes describing those instances. An example of such a
situation is a scenario in which one wants to perform extensive association rule
mining on a dataset describing vehicles involved in certain types of accidents.
The data (attributes) pertaining to the performance of different subcomponents
(tires, engine, brakes) belong to different manufacturers, who do not want to
share them with the others but are interested in the results. In the horizontal
scenario, different parties have different subsets of the instances, but they all
have the same attributes. An example of such a situation is a medical study
performed jointly by a number of hospitals. Each hospital may have its own
limited set of patients participating in the study, but results drawn from the much
larger union of all the hospitals' data will achieve a much higher level of
credibility. Finally, mixed horizontal-vertical scenarios are also possible.
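The two partitioning schemes can be made concrete with a small sketch. The table, party names, and attribute names below are invented for illustration; the only assumptions are that vertically partitioned parties share a common join key (here, column 0) and that horizontally partitioned parties hold disjoint row subsets.

```python
# Hypothetical joint table: each row is one instance (a vehicle),
# each column an attribute. Names and values are made up.
columns = ["vehicle_id", "tire_wear", "engine_temp", "brake_response"]
table = [
    ["v1", 0.7, 92.0, 0.31],
    ["v2", 0.4, 88.5, 0.45],
    ["v3", 0.9, 95.2, 0.28],
]

def vertical_partition(rows, col_names, assignment):
    """Each party holds ALL instances but only its own attribute
    subset, plus the shared identifier in column 0."""
    parts = {}
    for party, cols in assignment.items():
        idx = [0] + [col_names.index(c) for c in cols]
        parts[party] = [[row[i] for i in idx] for row in rows]
    return parts

def horizontal_partition(rows, assignment):
    """Each party holds a disjoint subset of instances with ALL attributes."""
    return {party: [rows[i] for i in idxs] for party, idxs in assignment.items()}

# Vertical: one attribute per manufacturer, as in the vehicle example.
vert = vertical_partition(table, columns, {
    "tire_maker": ["tire_wear"],
    "engine_maker": ["engine_temp"],
    "brake_maker": ["brake_response"],
})

# Horizontal: disjoint patient/row subsets, as in the hospital example.
horiz = horizontal_partition(table, {"site_A": [0, 1], "site_B": [2]})

print(vert["tire_maker"])  # [['v1', 0.7], ['v2', 0.4], ['v3', 0.9]]
print(horiz["site_B"])     # [['v3', 0.9, 95.2, 0.28]]
```

Note the symmetry: in the vertical case every party sees every instance but only some columns, while in the horizontal case every party sees every column but only some rows. Privacy-preserving protocols for the two cases therefore differ in what must be hidden: attribute values of shared instances versus entire records.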