a tunable algorithm that allows him to select a desired degree of tradeoff
between performance and security. In the first phase, the values appearing in
the attribute are divided into a user-specified number (say M) of buckets such
that the average number of false positives is minimized over all possible range
queries (i.e., queries with range predicates on the specified attribute). The
buckets so created might not meet the required security criteria (i.e., some
minimum level of entropy and variance), and therefore, in a second phase, the
values within these optimal buckets are redistributed in a “controlled manner”
into a new set of M buckets so as to increase the entropy and variance of the
bucket-level distributions while admitting only up to a specified maximum
degree of performance degradation. The tunable (user-chosen) parameter
specifies this maximum allowed degradation.
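The two security measures named above, and the effect of redistributing values
across buckets, can be illustrated with a minimal Python sketch. It is not the
algorithm of [30]: equi_depth_buckets is a simple stand-in for the optimal
first-phase bucketization, and round_robin_redistribute is a stand-in for the
controlled second phase that ignores the bound on performance degradation. All
function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bucket_entropy(values):
    """Shannon entropy (in bits) of the value distribution inside one bucket."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def bucket_variance(values):
    """Population variance of the values inside one bucket."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def equi_depth_buckets(values, m):
    """Stand-in for phase 1 (assumption): m contiguous buckets of equal size."""
    ordered = sorted(values)
    size = math.ceil(len(ordered) / m)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def round_robin_redistribute(buckets):
    """Stand-in for phase 2 (assumption): spread each bucket over all buckets.

    Unlike the controlled redistribution described in the text, this does not
    limit the resulting performance degradation.
    """
    m = len(buckets)
    new_buckets = [[] for _ in range(m)]
    for bucket in buckets:
        for i, value in enumerate(bucket):
            new_buckets[i % m].append(value)
    return new_buckets


values = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
initial = equi_depth_buckets(values, 4)
diffused = round_robin_redistribute(initial)

for name, buckets in (("initial", initial), ("diffused", diffused)):
    first = buckets[0]
    print(name, first, bucket_entropy(first), bucket_variance(first))
```

In this toy input each initial bucket holds a single repeated value, so its
entropy and variance are both zero. After redistribution every bucket holds one
copy of each value, which raises both measures, but a range query on any value
must now retrieve all four buckets. That cost is what the tunable parameter is
meant to bound.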
Similar measures of disclosure risk have been proposed for privacy-preserving
data publishing [36]. There too, the key technique for achieving anonymity
is data generalization, which is akin to the partitioning approach in [30]. For
more discussion of the choice of privacy measures and details of the
partitioning and redistribution algorithms, the interested reader can refer
to [30].
Discussion
In this section only single-dimensional data was considered. Most real data sets
have multiple attributes with various kinds of dependencies and correlations
between the attributes. There may be functional dependencies
(exact or partial) and correlations, as in multidimensional relational data, or
even structural dependencies, as in XML data. Therefore, knowledge about
one attribute might disclose the value of another via knowledge of such
associations. The security-cost analysis for such data becomes significantly
different. Also, the analysis presented in this section was carried
out for the worst-case scenario, in which the complete value
distribution of the bucket is assumed to be known to an adversary. In practice,
it is unrealistic to assume that an adversary has exact knowledge of the
complete distribution of a data set. Moreover, to learn the bucket-level joint
distribution of data, the required size of the training set (in order to
approximate the distribution to a given level of accuracy) grows exponentially
with the number of attributes/dimensions. This makes the assumption of
“complete bucket-level” knowledge of the distribution even more unrealistic for
multidimensional data. The work in [31] proposes a new approach to analyzing
the disclosure risk for multidimensional data and extends the work in [30] to
this case.
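The exponential growth mentioned above can be made concrete with a rough
Python sketch. It assumes, purely for illustration, that the adversary models
the joint distribution as a histogram with a fixed number of bins per
attribute, and it uses the number of histogram cells as a crude proxy for the
amount of training data needed. The function name and the choice of 10 bins
are hypothetical.

```python
def joint_histogram_cells(bins_per_attribute, num_attributes):
    """Number of cells in a joint histogram over num_attributes attributes."""
    return bins_per_attribute ** num_attributes


# Assumed setting: 10 bins per attribute, an increasing number of attributes.
for num_attributes in (1, 2, 4, 8):
    cells = joint_histogram_cells(10, num_attributes)
    print(num_attributes, cells)
```

Each added attribute multiplies the number of cells to be estimated by the
number of bins, so the data needed to observe every cell even a few times
quickly becomes impractical.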
3 Trust, Encryption, Key-management, Integrity & Data Confidentiality
Having discussed the querying aspects of encrypted data, let us look at some
basic security-related issues that need to be addressed in a DAS application.