Towards Secure Data Outsourcing - Database Security: Applications and Trends

Databases Reference

In-Depth Information

required to precisely know the range of values that each partition contains.

Naturally, increasing partition sizes tends to render this knowledge more fuzzy.

This, however, requires additional client side work in pruning the (now) larger

results (due to the larger partitions). Even if a single data tuple matches the

query, its entire corresponding partition will be transferred to the client. On

the other hand, reducing partition size will immediately reveal more infor-

mation to the server, as the smaller number of items per partition and the

knowledge of the covered range will allow it to determine more accurately what

the likely values are for each tuple. Additionally, for more complex queries,

particularly joins, due to the large segments, such methods can feature an

communication overhead larger than the entire database, hardly a practical

proposition.

Nevertheless, these efforts illustrate a trade-off between confidentiality and

overheads: large partitions reveal less but require more computation on the

client, small partitions reveal more but increase eciency. Ultimately, how-

ever, unless partitions are very large (in which case the purpose of outsourcing

is likely defeated by the additional overheads) true confidentiality cannot be

achieved by such partitioning schemes. Statistical security needs to be re-

placed by ecient, yet stronger mechanisms. In the following we show how

this can be achieved not only for range queries but also for more complex

joins.

In ongoing work [42] we explore a low-overhead method for executing bi-

nary predicate joins with confidentiality on outsourced data. It handles general

binary join predicates that satisfy certain properties: for any value in the con-

sidered data domain, the number of corresponding “matching” pair values

(for which the predicate holds) is (i) finite, and (ii) the average of its expected

value is upper bound. We call these predicates expected finite match (EFM)

predicates.

Such predicates are extremely common and useful, including discrete data

scenarios, such as ranges, inventory and company asset data-sets, forensics,

genome and DNA data (e.g., fuzzy and exact Hamming distances), and health-

care databases (e.g., bacteria to antibiotics matches). For illustration purposes

let us consider the following discrete time - range join query that joins arrivals

with departures within the same hour (e.g., in a train station):

SELECT * FROM arrivals,departures

WHERE departures.time - arrivals.time < 60

For any finite time granularity (e.g. minutes) the join predicate above is an

EFM predicate (e.g., with an AEMS of 60). Performing such joins at the server

side on encrypted data, is the main functionality desired here.

To analyze the confidentiality assurances of this solution we will consider

here a server that is curious : given the possibility to get away undetected,

it will attempt to compromise data confidentiality (e.g., in the process of

query execution). Naturally, it should not be able to evaluate predicates (i)

without the permission of the client, (ii) on two values of the same attribute,

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home