Databases Reference
In-Depth Information
tuples selected. Given
η
tuples in relation
R
, roughly
η/γ
values of attribute
A
are selected for watermark insertion.
For each selected tuple
r
, exactly one bit is chosen from watermark infor-
mation
wm data
and is embedded to
r.A
, where the watermark information
wm data
consists of roughly
η/γ
bits generated from a shorter watermark
wm
using error correcting code (ECC). The bit position that is chosen is deter-
mined by mapping
S
2
uniformly to the range of
wm data
indexes (this can
be done by using a pseudo-random number generator or using an embedding
map as stated in [21]). To embed the chosen bit
b
, the current categorical
value
r.A
is changed to another valid value of
A
, which is chosen from a list
L
A
of all valid values of
A
. In this process, any value
a
canbechosenfrom
L
A
(to replace
r.A
)aslongas
a
's index in
L
A
has the least significant bit
b
. This
flexibility in value selection can be exploited to maintain certain distribution
properties of
A
in watermark insertion.
For watermark detection, a number of tuples are selected the same way
as in watermark insertion, based on
S
1
. Then, for each selected tuple
r
, a bit
position in
wm data
is located the same way as in watermark insertion, based
on
S
2
. The corresponding bit value in
wm data
is extracted from the least
significant bit of the index of
r.A
in the list
L
A
. After all of the tuples are
processed, the ECC takes as input
wm data
and produces the correspond-
ing
wm
. The ECC can tolerate certain errors in detecting
wm data
and still
produce the same
wm
in watermark detection.
This watermarking scheme has been applied to binned medical data in a
hierachical manner so as to protect copyright in the presence of generalization
attack that is specific to the binned data [2].
3 Distortion
While the watermarking errors introduced to numerical values can be made
small, thus tolerable to certain data applications as illustrated in [1], the
errors introduced to categorical data can be significant, at least to individual
values. In [13], Li, Guo, and Jajodia introduced a distortion-free scheme for
watermarking categorical data (it can also be directly applied to watermarking
numerical data with no errors). In this solution, all
η
tuples in a database
relation
R
are first securely divided into
g
groups according to a secret key
.
A different watermark is embedded and verified in each group independently.
As a result, any modifications to the watermarked data can be detected and
localized to the group level with high probabilities.
K
3.1 Distortion-Free Watermarking
Algorithms 1 and 2 describe the watermark insertion process. First, a (keyed)
tuple hash and a (keyed) primary key hash are computed for each tuple
r
i
using a HMAC function. The tuple hash values are computed based on a fixed