Multi-Bit Distribution Encoding
Encoding watermark information in resilient numeric distribution properties
of the data offers several advantages over direct domain encoding, the most
important being increased resilience to various types of numeric attacks. In
[27, 29, 30, 32, 33] and [34], Sion et al. introduce a multi-bit distribution
encoding watermarking scheme for numeric types. The scheme was designed with
both an adversary and a data consumer in mind. More specifically, the main
desiderata were: (i) watermarking should be consumer driven, i.e., desired
semantic constraints on the data should be preserved, which is enforced by a
feedback-driven rollback mechanism; and (ii) the encoding should survive
important numeric attacks, such as linear transformation of the data (A3.a),
sampling (A1) and random alterations (A3).
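To see why a distribution property can survive a linear transformation attack (A3.a) when exact values cannot, consider a statistic such as the fraction of items lying above mean + c * stdev. The Python sketch below illustrates this under that assumption; the statistic and the name tail_fraction are hypothetical illustrations, not the scheme's exact definitions. For any transformation x -> a*x + b with a > 0, the mean and standard deviation move with the data, so the tail fraction is unchanged (up to floating-point rounding for items lying essentially on the reference point) even though every individual value changes.

```python
import statistics


def tail_fraction(values: list[float], c: float) -> float:
    """Fraction of items lying above the reference point mean + c * population stdev."""
    ref = statistics.mean(values) + c * statistics.pstdev(values)
    return sum(v > ref for v in values) / len(values)


data = [3.0, 7.5, 1.2, 9.9, 4.4, 6.1, 8.8, 2.3, 5.0, 10.4]
scaled = [2.5 * v - 40.0 for v in data]  # linear transformation attack, a = 2.5 > 0
print(tail_fraction(data, 1.0), tail_fraction(scaled, 1.0))  # prints: 0.2 0.2
```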
Overview. The solution takes as user input a reference to the relational data
to be protected, a watermark to be encoded as copyright proof, a secret key
used to protect the encoding, and a set of data quality constraints to be
preserved in the result. It then watermarks the data while continuously
assessing data quality, backtracking where necessary and rolling back any
alterations that fail to preserve it.
Watermark encoding is composed of two main stages: first, the input data set
is securely partitioned into (secret) subsets of items; second, one bit of the
watermark is encoded into each subset. If more subsets than watermark bits are
available, error correction is deployed to make the encoding more resilient.
Each single bit is encoded by introducing a slight skew bias in the tails of
the numeric distribution of the corresponding subset. The encoding is proved
to be resilient to important classes of attacks, including subset selection,
linear data changes and random item alterations.
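The two stages above, a secret partition followed by error correction when subsets outnumber watermark bits, can be sketched in Python as follows. This is a minimal sketch under assumptions not taken from the source: tuples are assigned to subsets by an HMAC-SHA256 keyed hash of their primary key (a simple stand-in for the subset selection methods the scheme itself discusses), subset i carries watermark bit i mod the watermark length, and majority voting serves as the error-correcting mechanism. The function names subset_index, partition and majority_decode are hypothetical.

```python
import hashlib
import hmac
from collections import Counter
from typing import Optional


def subset_index(key: bytes, primary_key: str, num_subsets: int) -> int:
    """Map a tuple to one of num_subsets secret subsets with a keyed hash."""
    digest = hmac.new(key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_subsets


def partition(key: bytes, rows: dict[str, float], num_subsets: int) -> list[list[str]]:
    """Split primary keys into disjoint subsets whose membership depends on the secret key."""
    subsets: list[list[str]] = [[] for _ in range(num_subsets)]
    for pk in sorted(rows):
        subsets[subset_index(key, pk, num_subsets)].append(pk)
    return subsets


def majority_decode(subset_bits: list[Optional[int]], wm_len: int) -> list[Optional[int]]:
    """Recover watermark bits by majority vote over the subsets carrying each bit.

    Subset i carries watermark bit i % wm_len; None marks a subset whose bit
    could not be read. Ties and bits without any votes decode to None.
    """
    votes = [Counter() for _ in range(wm_len)]
    for i, bit in enumerate(subset_bits):
        if bit is not None:
            votes[i % wm_len][bit] += 1
    decoded: list[Optional[int]] = []
    for counts in votes:
        if counts[1] == counts[0]:
            decoded.append(None)  # tie, or no readable subset for this bit
        else:
            decoded.append(1 if counts[1] > counts[0] else 0)
    return decoded
```

With e subsets and a watermark of n bits, each bit is replicated roughly e/n times, so a few subsets damaged by an attack can be outvoted by the intact ones.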
Solution Details. The algorithm proceeds as follows (see Figure 6): (a)
User-defined queries and associated guaranteed query usability metrics and
bounds are specified with respect to the given database (see below). (b) User
input determines the set of attributes in the database considered for
watermarking, possibly all of them. (c) From the values of each such
attribute, a (maximal) number e of unique, non-intersecting, secret subsets
is selected. (d) For each considered subset, (d.1) a watermark bit is embedded
into it using the single-bit encoding convention described below, and then
(d.2) the algorithm checks whether the data constraints are still satisfied.
If the data constraints are violated, (d.3) different encoding parameter
variations are retried or, if there is still no success, (d.4) the algorithm
tries to mark the subset as invalid (see the single-bit encoding convention
below); if this also fails, (d.5) the current subset is ignored.³ Step (d) is
repeated until no more subsets are available.
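The control flow of step (d) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: encode_bit is a hypothetical stand-in for the single-bit encoding convention (it stretches or compresses the part of a subset lying above mean + c * stdev), the encoding parameter variations of (d.3) are modeled as a list of decreasing strengths, data quality constraints are arbitrary predicates over the whole data set, and (d.4) is omitted because it depends on the encoding convention described later.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical data quality constraint: a predicate over the whole (primary key -> value) map.
Constraint = Callable[[dict[str, float]], bool]


def encode_bit(values: list[float], bit: int, c: float, strength: float) -> list[float]:
    """Hypothetical single-bit encoding: skew the upper tail of one subset.

    Values above the reference point mean + c * stdev are pushed away from it
    (bit 1) or pulled toward it (bit 0); all other values are left unchanged.
    """
    if len(values) < 2:
        return list(values)
    ref = statistics.mean(values) + c * statistics.pstdev(values)
    factor = 1 + strength if bit == 1 else 1 - strength
    return [ref + (v - ref) * factor if v > ref else v for v in values]


def embed(
    rows: dict[str, float],
    subsets: list[list[str]],
    watermark: list[int],
    constraints: list[Constraint],
    c: float = 1.0,
    strengths: Sequence[float] = (0.10, 0.05, 0.02),
) -> tuple[dict[str, float], list[str]]:
    """Embed one watermark bit per subset, rolling back changes that violate constraints.

    Returns the watermarked data and a per-subset status, 'encoded' or 'ignored'.
    """
    data = dict(rows)  # work on a copy; the input rows are never mutated
    status: list[str] = []
    for i, pks in enumerate(subsets):
        bit = watermark[i % len(watermark)]
        backup = {pk: data[pk] for pk in pks}  # snapshot used for rollback
        encoded = False
        for strength in strengths:  # (d.3) try encoding parameter variations
            new_values = encode_bit([backup[pk] for pk in pks], bit, c, strength)
            data.update(zip(pks, new_values))  # (d.1) embed the bit
            if all(check(data) for check in constraints):  # (d.2) check quality
                encoded = True
                break
            data.update(backup)  # roll back the undesirable alteration
        # (d.4) is omitted here; (d.5) leaves the subset unchanged.
        status.append("encoded" if encoded else "ignored")
    return data, status
```

Snapshotting each subset before embedding keeps every rollback local to that subset, which mirrors the feedback-driven rollback of desideratum (i).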
Several methods for subset selection (step (c)) are discussed. One of them
proceeds as follows. The input data tuples are sorted (lexicographically) on a
³ This leaves an invalid watermark bit encoded in the data, which is corrected
at extraction time by the deployed error-correcting mechanism (e.g., majority
voting).