Databases Reference
In-Depth Information
required to precisely know the range of values that each partition contains.
Naturally, increasing partition sizes tends to render this knowledge more fuzzy.
This, however, requires additional client side work in pruning the (now) larger
results (due to the larger partitions). Even if a single data tuple matches the
query, its entire corresponding partition will be transferred to the client. On
the other hand, reducing partition size will immediately reveal more infor-
mation to the server, as the smaller number of items per partition and the
knowledge of the covered range will allow it to determine more accurately what
the likely values are for each tuple. Additionally, for more complex queries,
particularly joins, due to the large segments, such methods can feature an
communication overhead larger than the entire database, hardly a practical
proposition.
Nevertheless, these efforts illustrate a trade-off between confidentiality and
overheads: large partitions reveal less but require more computation on the
client, small partitions reveal more but increase eciency. Ultimately, how-
ever, unless partitions are very large (in which case the purpose of outsourcing
is likely defeated by the additional overheads) true confidentiality cannot be
achieved by such partitioning schemes. Statistical security needs to be re-
placed by ecient, yet stronger mechanisms. In the following we show how
this can be achieved not only for range queries but also for more complex
joins.
In ongoing work [42] we explore a low-overhead method for executing bi-
nary predicate joins with confidentiality on outsourced data. It handles general
binary join predicates that satisfy certain properties: for any value in the con-
sidered data domain, the number of corresponding “matching” pair values
(for which the predicate holds) is (i) finite, and (ii) the average of its expected
value is upper bound. We call these predicates expected finite match (EFM)
predicates.
Such predicates are extremely common and useful, including discrete data
scenarios, such as ranges, inventory and company asset data-sets, forensics,
genome and DNA data (e.g., fuzzy and exact Hamming distances), and health-
care databases (e.g., bacteria to antibiotics matches). For illustration purposes
let us consider the following discrete time - range join query that joins arrivals
with departures within the same hour (e.g., in a train station):
SELECT * FROM arrivals,departures
WHERE departures.time - arrivals.time < 60
For any finite time granularity (e.g. minutes) the join predicate above is an
EFM predicate (e.g., with an AEMS of 60). Performing such joins at the server
side on encrypted data, is the main functionality desired here.
To analyze the confidentiality assurances of this solution we will consider
here a server that is curious : given the possibility to get away undetected,
it will attempt to compromise data confidentiality (e.g., in the process of
query execution). Naturally, it should not be able to evaluate predicates (i)
without the permission of the client, (ii) on two values of the same attribute,
Search WWH ::




Custom Search