of children of a node. The DSI index is stored using two tables on the server
side, which enables retrieval of subtrees of the XML document tree without
revealing the structure.
For searching on values from an ordered domain (e.g., for range queries),
the authors use an “order-preserving encryption” scheme [4] to transform
the values from their original domain to a new domain. Since the order is
preserved, one can use B-trees on these modified values to implement range
queries. To guard against frequency-based attacks, the authors insert some s_i
(a small number) copies of each ciphertext c_i corresponding to a value v_i. But
this process imposes an overhead due to the increased dataset size, and the
corresponding performance degradation has not been sufficiently analyzed.
Also, the proposed scheme seems to require a large number of “keys”
(depending on the frequency range of values), thereby imposing a significant
key-management overhead. Further, this scheme is unsafe under a known-plaintext
attack (due to the use of the order-preserving encryption scheme [4]),
making it vulnerable to many attack scenarios in which some plaintext-ciphertext
pairs may be revealed to an adversary.
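The following is a minimal sketch of why an order-preserving transformation supports range queries and why it is weak once a plaintext-ciphertext pair leaks. It is not the scheme of [4]: the mapping `make_toy_ope_table` is a hypothetical, insecure stand-in built from random positive gaps, and a sorted Python list searched with `bisect` stands in for the server's B-tree. All names and values are assumptions made for illustration.

```python
import bisect
import random


def make_toy_ope_table(domain_size, seed=0):
    """Map 0..domain_size-1 to strictly increasing integers (toy, NOT secure)."""
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 100)  # a strictly positive gap preserves order
        table.append(current)
    return table


def range_query(sorted_ciphertexts, enc_lo, enc_hi):
    """Return the ciphertexts in [enc_lo, enc_hi] using binary search."""
    left = bisect.bisect_left(sorted_ciphertexts, enc_lo)
    right = bisect.bisect_right(sorted_ciphertexts, enc_hi)
    return sorted_ciphertexts[left:right]


table = make_toy_ope_table(100)
values = [5, 17, 17, 42, 63, 88]  # hypothetical plaintext values
ciphertexts = sorted(table[v] for v in values)  # what the server stores

# The client transforms the query bounds 10..50; the server never sees plaintext.
hits = range_query(ciphertexts, table[10], table[50])
expected = sorted(table[v] for v in values if 10 <= v <= 50)

# Known-plaintext weakness: one leaked pair (42, table[42]) tells the adversary
# that every stored ciphertext below table[42] hides a value smaller than 42.
known_value, known_cipher = 42, table[42]
below_known = [c for c in ciphertexts if c < known_cipher]
```

Here `hits` equals `expected` because the mapping is monotone, and `below_known` shows how a single revealed pair partitions the whole ciphertext set by plaintext order.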
The query processing on the server is carried out using the structural and
value indices, which yields a superset of the true set of nodes satisfying the
query predicates. These encrypted nodes are then returned to the client, where
a post-processing step discards the false positives. Further details and proofs
can be found in [45].
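The two-phase evaluation described above (the server returns a superset, the client discards false positives) can be sketched as follows. This is an illustrative assumption, not the implementation from [45]: `toy_seal` and `toy_open` are hypothetical placeholders that only make a record opaque (base64 is not encryption), and `coarse_tag` is an assumed value index that groups prices into ranges of width 50.

```python
import base64
import json


def toy_seal(record):
    """Placeholder for encryption: NOT secure, it only makes the record opaque."""
    return base64.b64encode(json.dumps(record).encode("utf-8"))


def toy_open(blob):
    """Inverse of toy_seal, run on the client only."""
    return json.loads(base64.b64decode(blob).decode("utf-8"))


def coarse_tag(price):
    """Hypothetical value index: prices 0-49 map to 0, 50-99 to 1, and so on."""
    return price // 50


def server_query(store, lo, hi):
    """Server side: select sealed records whose tag overlaps the query range."""
    tags = set(range(coarse_tag(lo), coarse_tag(hi) + 1))
    return [blob for tag, blob in store if tag in tags]


def client_filter(blobs, lo, hi):
    """Client side: open the records and apply the exact predicate."""
    return [r for r in map(toy_open, blobs) if lo <= r["price"] <= hi]


records = [
    {"id": 1, "price": 20},
    {"id": 2, "price": 60},
    {"id": 3, "price": 95},
    {"id": 4, "price": 130},
]
store = [(coarse_tag(r["price"]), toy_seal(r)) for r in records]

candidates = server_query(store, 55, 90)     # superset: records 2 and 3
matches = client_filter(candidates, 55, 90)  # record 3 is a false positive
```

The server works only with tags and opaque blobs, while the exact comparison against the query bounds happens after the client opens the returned candidates.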
2.6 Privacy Aware Bucketization
In the previous section we discussed how DAS functionality can be realized
when data is represented in the form of buckets. Such a bucketized
representation can result in disclosure of sensitive attributes. For instance, given a
sensitive numeric attribute (e.g., salary) which has been bucketized, assume
that the adversary somehow comes to know the maximum and minimum
values occurring in the bucket B. Then he can be sure that all data elements in
this bucket have a value that falls in the range [min_B, max_B], thereby leading
to partial disclosure of sensitive values for data elements in B. If the
adversary has knowledge of the distribution of values in the bucket, he may also be
able to make further inferences about specific records. A natural question
is how much information the generalized representation of data reveals;
that is, given the bucket label, how well can the adversary predict or guess the
value of the sensitive attribute of a given entity? Intuitively, this depends
upon the granularity at which data is generalized. For instance, assigning all
values in the domain to a single bucket will make the bucket label completely
non-informative. However, such a strategy will require the client to retrieve
every record from the server. At the other extreme, if each possible data value
has a corresponding bucket, the client will get no confidentiality, although the
records returned by the server will contain no false positives. There is a
natural trade-off between the performance overhead and the degree of disclosure.
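The trade-off can be made concrete with a small sketch. It assumes equi-width buckets over a hypothetical integer salary domain 0..99 and uses two crude, assumed measures: the number of false positives returned for a range query (performance overhead) and the largest max_B - min_B over all buckets (a smaller spread means the bucket label reveals more). The function `evaluate` and the chosen widths are illustrative only.

```python
def evaluate(values, width, q_lo, q_hi):
    """Bucketize values with the given integer width and cost one range query."""
    buckets = {}
    for v in values:
        buckets.setdefault(v // width, []).append(v)  # bucket label = v // width

    # Server side: fetch every bucket whose range overlaps [q_lo, q_hi].
    wanted = range(q_lo // width, q_hi // width + 1)
    retrieved = [v for label in wanted for v in buckets.get(label, [])]

    # Client side: anything outside the exact range is a false positive.
    false_positives = sum(1 for v in retrieved if not q_lo <= v <= q_hi)

    # Disclosure proxy: the widest [min_B, max_B] interval among all buckets.
    max_spread = max(max(m) - min(m) for m in buckets.values())
    return {
        "retrieved": len(retrieved),
        "false_positives": false_positives,
        "max_spread": max_spread,
    }


values = list(range(100))  # hypothetical salaries 0..99, one record each
results = {w: evaluate(values, w, 25, 34) for w in (100, 10, 1)}
for width, stats in results.items():
    print(width, stats)
```

With a single bucket (width 100) the query for 25..34 retrieves all 100 records, 90 of them false positives, while the label narrows a value only to a spread of 99. With one bucket per value (width 1) there are no false positives, but the spread is 0, so the label pins down the value exactly. Width 10 sits in between, with 20 records retrieved and 10 false positives.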