Database Watermarking for Copyright Protection - Database Security: Applications and Trends

Multi-Bit Distribution Encoding

Encoding watermarking information in resilient numeric distribution properties of the data offers several advantages over direct-domain encoding. The most important is increased resilience to various types of numeric attacks. In [27, 29, 30, 32, 33] and [34], Sion et al. introduce a multi-bit distribution encoding watermarking scheme for numeric types. The scheme was designed with both an adversary and a data consumer in mind. More specifically, the main desiderata were: (i) watermarking should be consumer driven, i.e., desired semantic constraints on the data should be preserved, which is enforced by a feedback-driven rollback mechanism; and (ii) the encoding should survive important numeric attacks, such as linear transformation of the data (A3.a), sampling (A1) and random alterations (A3).
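
To see why a distribution property can survive a linear transformation attack (A3.a), consider a statistic defined relative to the data's own mean and standard deviation. The Python sketch below is an illustration only, not the scheme's actual encoding convention. The helper name tail_fraction, the threshold mean + c·σ and the use of the population standard deviation are all assumptions. Under x → a·x + b with a > 0, the mean, the deviation and every value move together, so the fraction of values in the upper tail does not change.

```python
import statistics
from typing import Sequence


def tail_fraction(values: Sequence[float], c: float) -> float:
    """Fraction of values strictly above mean + c * population standard deviation.

    Illustrative statistic only (assumption); not the published encoding convention.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    threshold = mu + c * sigma
    return sum(1 for v in values if v > threshold) / len(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
attacked = [2 * v + 3 for v in data]  # linear transformation attack, a = 2, b = 3
print(tail_fraction(data, 1.0), tail_fraction(attacked, 1.0))  # both equal 1/11
```

With a negative a the upper tail becomes the lower tail, so a detector built on this idea would have to consider both tails. Values lying almost exactly on the threshold could also flip under floating-point rounding.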

Overview. The solution takes as user input a reference to the relational data to be protected, a watermark to be encoded as a copyright proof, a secret key used to protect the encoding, and a set of data quality constraints to be preserved in the result. It then watermarks the data while continuously assessing data quality, potentially backtracking and rolling back any alteration that does not preserve data quality.

Watermark encoding is composed of two main parts. In the first stage, the input data set is securely partitioned into (secret) subsets of items. The second stage then encodes one bit of the watermark into each subset. If more subsets than watermark bits are available, error correction is deployed to make the encoding more resilient. Each single bit is encoded by introducing a slight skew bias in the tails of the numeric distribution of the corresponding subset. The encoding is proved to be resilient to important classes of attacks, including subset selection, linear data changes and random alterations of items.
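
The two stages, together with the error correction used when there are more subsets than bits, can be sketched in Python. This is a minimal sketch under stated assumptions, not the scheme's own procedure. Subsets are assigned here by a keyed hash (HMAC-SHA256) of a hypothetical item identifier. The error correction is a simple repetition code in which subset i carries watermark bit i mod |wm|, and extraction combines the copies of each bit by majority voting. All function names are illustrative, and the per-subset tail-skew encoding itself is not shown.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List, Optional, Sequence


def secret_subset(key: bytes, item_id: str, e: int) -> int:
    """Map an item to one of e secret subsets using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, item_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % e


def partition(key: bytes, item_ids: Sequence[str], e: int) -> Dict[int, List[str]]:
    """Stage 1: split items into e non-intersecting subsets that depend on the secret key."""
    subsets: Dict[int, List[str]] = {i: [] for i in range(e)}
    for item_id in item_ids:
        subsets[secret_subset(key, item_id, e)].append(item_id)
    return subsets


def bit_for_subset(watermark: Sequence[int], i: int) -> int:
    """Stage 2 bit assignment: with e > len(watermark), each bit repeats over several subsets."""
    return watermark[i % len(watermark)]


def majority_decode(extracted: Sequence[Optional[int]], wm_len: int) -> List[Optional[int]]:
    """Recover each watermark bit by majority vote; None means the subset yielded no bit."""
    votes = [Counter() for _ in range(wm_len)]
    for i, bit in enumerate(extracted):
        if bit is not None:
            votes[i % wm_len][bit] += 1
    # A tie (including no votes at all) leaves the bit undecided.
    return [None if v[0] == v[1] else int(v[1] > v[0]) for v in votes]


wm = [1, 0, 1]
bits = [bit_for_subset(wm, i) for i in range(9)]  # e = 9: every bit carried by 3 subsets
bits[0] = 0     # one copy flipped by an attack
bits[4] = None  # one subset ignored during embedding
print(majority_decode(bits, len(wm)))  # [1, 0, 1]
```

Repetition with majority voting is the simplest choice. A stronger error-correcting code would tolerate more corrupted subsets for the same e.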

Solution Details. The algorithm proceeds as follows (see Figure 6): (a) User-defined queries and associated guaranteed query usability metrics and bounds are specified with respect to the given database (see below). (b) User input determines a set of attributes in the database considered for watermarking, possibly all. (c) From the values in each such attribute, select a (maximal) number e of unique, non-intersecting, secret subsets. (d) For each considered subset, (d.1) embed a watermark bit into it using the single-bit encoding convention described below and then (d.2) check whether the data constraints are still satisfied. If the data constraints are violated, (d.3) retry with different encoding parameter variations or, if still unsuccessful, (d.4) try to mark the subset as invalid (see the single-bit encoding convention below), or, if still unsuccessful, (d.5) ignore the current subset.³ Repeat step (d) until no more subsets are available.
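
Step (d), with its fallbacks (d.3) to (d.5), amounts to a try, validate and roll back loop per subset. The Python sketch below assumes that the single-bit encoder, the invalid-marking routine and the data quality constraints are supplied as callables. The names embed_subset, encode, mark_invalid and constraints are hypothetical, and the toy encoder in the usage lines is not the scheme's encoding convention. A candidate alteration is kept only if every constraint accepts it. Otherwise the original subset values are retained, which is the rollback.

```python
from typing import Callable, List, Sequence, Tuple

Values = List[float]
Constraint = Callable[[Values, Values], bool]  # (original, candidate) -> acceptable?


def embed_subset(
    subset: Values,
    bit: int,
    encode: Callable[[Values, int, float], Values],  # hypothetical single-bit encoder
    mark_invalid: Callable[[Values], Values],        # hypothetical invalid-bit marker
    constraints: Sequence[Constraint],
    params: Sequence[float],
) -> Tuple[Values, str]:
    """Embed one bit into a subset, rolling back any change that breaks a constraint."""

    def acceptable(candidate: Values) -> bool:
        return all(check(subset, candidate) for check in constraints)

    # (d.1) and (d.3): try the encoding with each parameter variation in turn.
    for p in params:
        candidate = encode(subset, bit, p)
        if acceptable(candidate):  # (d.2)
            return candidate, "encoded"
    # (d.4): try to mark the subset as invalid instead.
    candidate = mark_invalid(subset)
    if acceptable(candidate):
        return candidate, "invalid"
    # (d.5): ignore the subset; its original values are kept unchanged.
    return list(subset), "ignored"


def toy_shift(values: Values, bit: int, p: float) -> Values:
    """Toy encoder (assumption): shift every value by +p for bit 1, -p for bit 0."""
    return [x + (p if bit else -p) for x in values]


def max_change_half(original: Values, candidate: Values) -> bool:
    """Toy quality constraint (assumption): no value may change by more than 0.5."""
    return all(abs(c - o) <= 0.5 for o, c in zip(original, candidate))


# p = 1.0 violates the constraint and is rolled back; p = 0.4 is accepted.
print(embed_subset([10.0, 20.0], 1, toy_shift, list, [max_change_half], [1.0, 0.4]))
```

Here `list` acts as a no-op invalid marker purely for the demonstration. The constraint signature takes both the original and the candidate values, so that usability bounds relative to the unmarked data can be expressed.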

Several methods for subset selection (step (c)) are discussed. In one version, selection proceeds as follows. The input data tuples are sorted (lexicographically) on a

³ This leaves an invalid watermark bit encoded in the data, which will be corrected at extraction time by the deployed error-correcting mechanisms (e.g., majority voting).
