Managing and Querying Encrypted Data - Database Security: Applications and Trends

of children of a node. The DSI index is stored using two tables on the server-

side which enables retrieval of subtrees of the XML document tree without

revealing the structure.

For searching on values from an ordered domain (e.g., for range queries),

the authors use an “order-preserving encryption” scheme [4] to transform

the values from their original domain to a new domain. Since the order is

preserved, one can use B-trees on these modified values to implement range

queries. To guard against frequency-based attacks, the authors insert some s_i

(a small number) copies of each ciphertext c_i corresponding to a value v_i. But

this process imposes an overhead due to the increased dataset size, and the

corresponding performance degradation has not been sufficiently analyzed.

Also, the proposed scheme seems to require a large number of “keys”

(depending on the frequency range of values), thereby imposing a significant

overhead of key management. Further, this scheme is unsafe under known-

plaintext attacks (due to the use of the order-preserving encryption scheme [4]),

making it vulnerable to the many attack scenarios in which some plaintext-

ciphertext pairs may be revealed to an adversary.
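
The idea of running range queries directly over order-preserving ciphertexts can be illustrated with a minimal Python sketch. It is a toy, not the scheme of [4]: the lookup table built by the hypothetical helper make_ope_table stands in for a real order-preserving cipher, a sorted list with binary search stands in for the B-tree, and all names and parameters are assumptions made for illustration.

```python
import bisect
import random

def make_ope_table(domain_size, seed=0, max_gap=1000):
    """Toy order-preserving map (assumption, not the scheme of [4]):
    plaintext i is mapped to the i-th entry of a strictly increasing table."""
    rng = random.Random(seed)
    table = []
    current = 0
    for _ in range(domain_size):
        current += rng.randint(1, max_gap)  # positive gap keeps the order strict
        table.append(current)
    return table

def encrypt(table, value):
    return table[value]

def range_query(sorted_ciphertexts, table, low, high):
    """Range lookup that compares ciphertexts only; binary search on a
    sorted list plays the role of the server-side B-tree."""
    lo = bisect.bisect_left(sorted_ciphertexts, encrypt(table, low))
    hi = bisect.bisect_right(sorted_ciphertexts, encrypt(table, high))
    return sorted_ciphertexts[lo:hi]

table = make_ope_table(100, seed=42)
plaintexts = [5, 17, 17, 42, 63, 88]

# The server stores ciphertexts only. The value 17 occurs twice and so does
# its ciphertext, which is the frequency leak discussed in the text.
server_index = sorted(encrypt(table, v) for v in plaintexts)

hits = range_query(server_index, table, 10, 50)

# Client-side decryption by inverting the toy table.
decrypt = {c: v for v, c in enumerate(table)}
matched = [decrypt[c] for c in hits]
```

Because the mapping preserves order, anyone who learns a single plaintext-ciphertext pair can immediately tell whether any other ciphertext lies above or below that plaintext, which is the known-plaintext weakness noted above.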

The query processing on the server is carried out using the structural and

value indices, which yield a superset of the true set of nodes satisfying the

query predicates. These encrypted nodes are then returned to the client, where

a post-processing step discards the false positives. Further details and proofs

can be found in [45].
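
The two-phase pattern described here (a coarse server-side match followed by client-side filtering) can be sketched as follows. This is only an illustration under stated assumptions: the flat list of nodes, the fixed-width value index, and the XOR-based toy_encrypt are hypothetical stand-ins, and none of them reflects the actual structural and value indices of [45] or a secure cipher.

```python
from dataclasses import dataclass

KEY = 0x5A          # toy key; XOR is NOT secure and is used only as a placeholder
BUCKET_WIDTH = 10   # hypothetical granularity of the value index

@dataclass
class EncryptedNode:
    index_label: int  # coarse label the server is allowed to see
    payload: bytes    # encrypted value, opaque to the server

def toy_encrypt(value):
    return bytes(b ^ KEY for b in str(value).encode())

def toy_decrypt(payload):
    return int(bytes(b ^ KEY for b in payload).decode())

def outsource(values):
    """Client side: encrypt each value and attach its coarse index label."""
    return [EncryptedNode(v // BUCKET_WIDTH, toy_encrypt(v)) for v in values]

def server_query(nodes, labels):
    """Server side: match on index labels only; the result is a superset."""
    return [n for n in nodes if n.index_label in labels]

def client_query(nodes, low, high):
    """Client side: translate the range into labels, then post-process."""
    labels = set(range(low // BUCKET_WIDTH, high // BUCKET_WIDTH + 1))
    candidates = [toy_decrypt(n.payload) for n in server_query(nodes, labels)]
    results = [v for v in candidates if low <= v <= high]  # drop false positives
    return candidates, results

stored = outsource([3, 12, 18, 25, 31, 47])
candidates, results = client_query(stored, 15, 30)
```

In this run the server returns four candidates, and the client discards 12 and 31 as false positives after decryption.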

2.6 Privacy Aware Bucketization

In the previous section we discussed how DAS functionality can be realized

when data is represented in the form of buckets. Such a bucketized represen-

tation can result in disclosure of sensitive attributes. For instance, given a

sensitive numeric attribute (e.g., salary) which has been bucketized, assume

that the adversary somehow comes to know the maximum and minimum values

occurring in the bucket B. Then he can be sure that all data elements in

this bucket have a value that falls in the range [min_B, max_B], thereby leading

to partial disclosure of sensitive values for data elements in B. If the adversary

has knowledge of the distribution of values in the bucket, he may also be

able to make further inferences about the specific records. A natural question

is how much information the generalized representation of data reveals;

that is, given the bucket label, how well can the adversary predict/guess the

value of the sensitive attribute of a given entity? Intuitively, this depends

upon the granularity at which data is generalized. For instance, assigning all

values in the domain to a single bucket will make the bucket-label completely

non-informative. However, such a strategy will require the client to retrieve

every record from the server. On the other extreme, if each possible data value

has a corresponding bucket, the client will get no confidentiality although the

records returned by the server will contain no false positives. There is a natu-

ral trade-off between the performance overhead and the degree of disclosure.
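
The two extremes and the middle ground can be made concrete with a small Python sketch. It assumes equi-width buckets over integer values and uses the bucket width as a crude proxy for how much a label discloses; both are assumptions for illustration and not the measures developed in this section.

```python
def false_positives(values, low, high, width):
    """Count records the client must retrieve and then discard for the
    range query [low, high] when values are grouped into equi-width buckets."""
    labels = set(range(low // width, high // width + 1))
    retrieved = [v for v in values if v // width in labels]
    return sum(1 for v in retrieved if not (low <= v <= high))

values = list(range(100))   # hypothetical sensitive attribute, one record per value
low, high = 15, 30

# width 1   : one bucket per value, no false positives, label reveals the value
# width 100 : a single bucket, label reveals nothing, every record is retrieved
results = {width: false_positives(values, low, high, width)
           for width in (1, 10, 100)}

for width, fp in results.items():
    print(f"bucket width {width:3d}: adversary learns a range of {width} "
          f"value(s), client discards {fp} false positive(s)")
```

Wider buckets narrow what the adversary can infer from a label but increase the number of records the client has to fetch and filter.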
