of value of watermarked Works is necessarily relative and largely influenced by
each semantic context it appears in. For example, while a statistical analyst
would be satisfied with a set of feature summarizations (e.g., average, higher-
level moments) of a numeric data set, a data mining application may need a
majority of the data items, for example to validate a classification hypothesis.
It is often hard to define the available “bandwidth” for inserting the
watermark directly. Instead, allowable distortion bounds for the input data can
be defined in terms of consumer metrics. If the watermarked data satisfies
the metrics, then the alterations induced by the insertion of the watermark
are considered to be acceptable. One such simple yet relevant example for
numeric data is the case of maximum allowable mean squared error (MSE),
in which the usability metrics are defined in terms of mean squared error
tolerances as $(s_i - v_i)^2 < t_i, \; \forall i = 1, \ldots, n$ and
$\sum_{i=1}^{n} (s_i - v_i)^2 < t_{max}$, where $S = \{s_1, \ldots, s_n\} \subset \mathbb{R}$
is the data to be watermarked, $V = \{v_1, \ldots, v_n\}$ is the result, and
$T = \{t_1, \ldots, t_n\} \subset \mathbb{R}$ and $t_{max} \in \mathbb{R}$ define the guaranteed error bounds
at data distribution time. In other words, $T$ defines the allowable distortions
for individual elements in terms of MSE, and $t_{max}$ defines the overall permissible value.
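As an illustration, the two MSE conditions can be checked directly before a watermarked data set is released. The following Python sketch assumes the original data, the watermarked result and the per-element tolerances are plain lists of floats; the function name `satisfies_mse_bounds` is hypothetical, and the strict inequalities follow the definition above.

```python
from typing import Sequence


def satisfies_mse_bounds(
    s: Sequence[float],
    v: Sequence[float],
    t: Sequence[float],
    t_max: float,
) -> bool:
    """Return True if the watermarked values v respect the MSE-style bounds.

    s     -- original values s_1..s_n
    v     -- watermarked values v_1..v_n
    t     -- per-element tolerances t_1..t_n
    t_max -- overall tolerance on the summed squared error
    (hypothetical helper, written for illustration only)
    """
    if not (len(s) == len(v) == len(t)):
        raise ValueError("s, v and t must have the same length")

    total = 0.0
    for s_i, v_i, t_i in zip(s, v, t):
        err = (s_i - v_i) ** 2
        # Per-element bound: (s_i - v_i)^2 < t_i
        if not err < t_i:
            return False
        total += err
    # Overall bound: sum of squared errors < t_max
    return total < t_max


if __name__ == "__main__":
    original = [10.0, 20.0, 30.0]
    marked = [10.1, 19.9, 30.2]
    tolerances = [0.05, 0.05, 0.05]
    print(satisfies_mse_bounds(original, marked, tolerances, t_max=0.1))  # True
```

A watermarking routine could call such a check on each candidate alteration and discard any alteration that pushes either bound over its limit.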
Often, however, specifying only allowable change limits on individual
values, and possibly an overall limit, fails to capture important semantic features
associated with the data, especially if the data is structured. Consider, for
example, age data. While a small change to the age values may be acceptable,
it may be critical that individuals who are younger than 21 remain so even
after watermarking if the data will be used to determine behavior patterns
for under-age drinking. Similarly, if the same data were used for identifying
legal voters, the cut-off would be 18 years. Further still, for some other
application it may be important that the relative ages, in terms of which one
is younger, not change. Other examples of constraints include: (i) uniqueness
(each value must be unique); (ii) scale (the ratio between any two numbers
must remain the same before and after the change); and (iii) classification
(the objects must remain in the same class, defined by a range of values,
before and after watermarking). As the above examples make clear, simple
bounds on the change of numerical values are often not enough.
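The constraint types listed above can be expressed as simple predicates over the original values $S$ and the watermarked values $V$. The sketch below is one possible encoding under our own assumptions, not the method described in this chapter: the function names are hypothetical, classes are represented as a sorted list of cut-off values (e.g. `[21]` for the under-age-drinking case, `[18]` for voters), and the scale check uses a floating-point tolerance.

```python
import bisect
import math
from itertools import combinations
from typing import Sequence


def same_class(s: Sequence[float], v: Sequence[float], boundaries: Sequence[float]) -> bool:
    """(iii) classification: each value stays in the same range class.

    `boundaries` must be sorted; [21] splits values into "< 21" and ">= 21".
    """
    return all(
        bisect.bisect_right(boundaries, a) == bisect.bisect_right(boundaries, b)
        for a, b in zip(s, v)
    )


def same_order(s: Sequence[float], v: Sequence[float]) -> bool:
    """Relative order ("which one is younger") is unchanged for every pair."""
    for i, j in combinations(range(len(s)), 2):
        if (s[i] < s[j]) != (v[i] < v[j]) or (s[i] > s[j]) != (v[i] > v[j]):
            return False
    return True


def all_unique(v: Sequence[float]) -> bool:
    """(i) uniqueness: no two watermarked values coincide."""
    return len(set(v)) == len(v)


def same_scale(s: Sequence[float], v: Sequence[float], rel_tol: float = 1e-9) -> bool:
    """(ii) scale: v_i / v_j == s_i / s_j for every pair.

    Checked by cross-multiplication (s_i * v_j == s_j * v_i) to avoid
    dividing by zero; rel_tol is an assumed floating-point tolerance.
    """
    return all(
        math.isclose(s[i] * v[j], s[j] * v[i], rel_tol=rel_tol)
        for i, j in combinations(range(len(s)), 2)
    )


if __name__ == "__main__":
    ages = [17, 19, 20, 22, 35]
    marked = [17, 19, 21, 22, 34]  # 20 -> 21 crosses the under-21 cut-off
    print(same_class(ages, marked, [21]))  # False
    print(same_class(ages, marked, [18]))  # True
    print(same_order(ages, marked))        # True
    print(all_unique(marked))              # True
    print(same_scale(ages, marked))        # False
```

The same alteration can thus be acceptable for one application (voter eligibility) and unacceptable for another (under-age drinking). The pairwise checks in `same_order` and `same_scale` are quadratic; for large data sets, comparing the sorted index orders of $S$ and $V$ would be a cheaper way to test order preservation.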
Structured collections present further constraints that must be adhered to
by the watermarking algorithm. Consider a data warehouse organized using
a standard star schema with a fact table and several dimension tables. It
is important that the key relationships be preserved by the watermarking
algorithm. This is similar to the “ON UPDATE CASCADE” option for foreign keys
in SQL and ensures that tuples that join before watermarking also join after
watermarking. This requires that the new value for any attribute be
unique after the watermarking process. In other words, we want to preserve
the relationships between the various tables. More generally, a relationship
could be expressed in terms of an arbitrary join condition, not just a natural
join. In addition to relationships between tuples, relational data may have
constraints within tuples. For example, if a relation contains the start and