are a multitude of applications that would benefit from a method of rights
protection for such data. In this section we propose and analyze watermarking
relational data with categorical types.
Additional challenges in this domain derive from the fact that one cannot
rely on arbitrarily small (e.g., numeric) alterations to the data in the embedding
process. Any alteration has the potential to be significant, e.g., changing
DEPARTURE CITY from “Chicago” to “Bucharest” is likely to affect the data
quality of the result more than a simple change in a numeric domain. There
are no “epsilon” changes in this domain. This completely discrete character-
istic of the data requires discovery of fundamentally new bandwidth channels
and associated encoding algorithms.
4.1 The Adversary Revisited
We outlined above a set of generic attacks in a relational data framework.
Here we discuss additional challenges associated with categorical data types.
A3. Alteration. In the categorical data framework, subset alteration is
intuitively quite expensive from a data-value preservation perspective. One must
also take into account semantic consistency issues that become immediately
visible because of the discrete nature of the data.
A6. Attribute Remapping. If data semantics allow it, re-mapping of rela-
tion attributes can amount to a powerful attack that should be carefully con-
sidered. In other words, if Mallory can find an even partial value-preserving
mapping (the resulting mapped data set is still valuable for illicit purposes)
from the original attribute data domain to a new domain, a watermark should
hopefully survive such a transformation. The difficulty of this challenge is
increased by the fact that there likely are many transformations available for a
specific data domain. This is thus a hard task for the generic case. One special
case is primary key re-mapping.
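
To make the remapping attack concrete, the following minimal sketch applies a one-to-one value mapping to a categorical column and compares what changes with what does not. The column contents, the mapping, and the helper names (remap, frequency_profile) are hypothetical and chosen only for illustration. The sketch assumes a strictly one-to-one mapping; a partial or many-to-one mapping would not preserve the profile in general.

```python
from collections import Counter

def remap(values, mapping):
    """Apply a value mapping to a column (the hypothetical attack by Mallory)."""
    return [mapping[v] for v in values]

def frequency_profile(values):
    """Occurrence counts in decreasing order, ignoring the labels themselves."""
    return sorted(Counter(values).values(), reverse=True)

# Hypothetical categorical column and a hypothetical one-to-one mapping.
cities = ["Chicago", "Chicago", "Boston", "Chicago", "Denver", "Boston"]
codes = {"Chicago": "C1", "Boston": "C2", "Denver": "C3"}

attacked = remap(cities, codes)

# Every label has changed, so any mark tied to the literal values is at risk.
print(attacked)
# The label-independent occurrence profile is identical before and after.
print(frequency_profile(cities))
print(frequency_profile(attacked))
```

Under this assumption the literal values are all replaced while the occurrence counts are untouched, which suggests why properties that do not depend on the labels are natural candidates for surviving such an attack.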
4.2 A Solution
In [25], [36] Sion et al. introduce a novel method of watermarking relational
data with categorical types, based on a set of new encoding channels and
algorithms. More specifically, two domain-specific watermark embedding
channels are used, namely (i) inter-attribute associations and (ii) value
occurrence frequency transforms.
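
As a rough illustration of channel (i), the sketch below uses a secret key to select a subset of tuples by hashing one attribute (K) and then rewrites the associated attribute (A) so that the position of its value in the domain carries one mark bit. This is a sketch under stated assumptions, not the algorithm of [25], [36]: the use of HMAC-SHA256, the selection rule "hash mod e == 0", the parity encoding, and all names (keyed_hash, is_fit, embed, extract, e) are hypothetical choices made for illustration.

```python
import hashlib
import hmac

def keyed_hash(secret, text):
    """Keyed hash of a string as a non-negative integer (HMAC-SHA256, assumed)."""
    digest = hmac.new(secret, text.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def is_fit(secret, key, e):
    """Assumed selection rule: roughly one tuple in e is chosen, based on K only."""
    return keyed_hash(secret, "fit|" + str(key)) % e == 0

def embed(rows, domain, secret, e, bits):
    """Return a copy of rows (K, A) in which selected tuples carry mark bits.

    The new value of A is picked so that its index in `domain` has the mark
    bit as its lowest bit. Requires len(domain) >= 2.
    """
    marked = []
    for key, value in rows:
        if is_fit(secret, key, e):
            pos = keyed_hash(secret, "pos|" + str(key)) % len(bits)
            base = keyed_hash(secret, "val|" + str(key)) % (len(domain) // 2)
            value = domain[2 * base + bits[pos]]
        marked.append((key, value))
    return marked

def extract(rows, domain, secret, e, n_bits):
    """Recover mark bits by majority vote over the selected tuples."""
    votes = [[0, 0] for _ in range(n_bits)]
    for key, value in rows:
        if is_fit(secret, key, e) and value in domain:
            pos = keyed_hash(secret, "pos|" + str(key)) % n_bits
            votes[pos][domain.index(value) & 1] += 1
    return [None if zeros == ones else int(ones > zeros) for zeros, ones in votes]

# Hypothetical data: K is a ticket number, A is a departure city.
domain = ["Boston", "Bucharest", "Chicago", "Denver"]
rows = [(k, domain[k % len(domain)]) for k in range(2000)]
secret = b"hypothetical secret key"
mark = [1, 0, 1, 1]

marked_rows = embed(rows, domain, secret, e=4, bits=mark)
print(extract(marked_rows, domain, secret, e=4, n_bits=len(mark)))
```

Only the selected tuples are touched, and without the secret key an observer cannot tell which tuples those are. The majority vote means extraction can tolerate some altered or missing tuples, at the cost of changing more values during embedding when e is small.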
Overview. The solution starts with an initial user-level assessment step in
which a set of attributes to be watermarked is selected. In its basic version,
watermark encoding in the inter-attribute association channel is deployed for
each attribute pair (K, A) in the considered attribute set. A subset of “fit”
tuples is selected, as determined by the association between A and K. These
tuples are then considered for mark encoding. Mark encoding alters the
tuple's value according to secret criteria that induce a statistical bias in the