Database Watermarking for Copyright Protection - Database Security: Applications and Trends


of value of watermarked Works is necessarily relative and largely influenced by the semantic context in which it appears. For example, while a statistical analyst would be satisfied with a set of feature summarizations (e.g., average, higher-level moments) of a numeric data set, a data mining application may need a majority of the data items, for example to validate a classification hypothesis.

It is often hard to define the available “bandwidth” for inserting the watermark directly. Instead, allowable distortion bounds for the input data can be defined in terms of consumer metrics. If the watermarked data satisfies the metrics, then the alterations induced by the insertion of the watermark are considered to be acceptable. One such simple yet relevant example for numeric data is the case of maximum allowable mean squared error (MSE), in which the usability metrics are defined in terms of mean squared error tolerances as $(s_i - v_i)^2 < t_i, \ \forall i = 1, \ldots, n$ and $\sum_{i=1}^{n} (s_i - v_i)^2 < t_{\max}$, where $\{s_1, \ldots, s_n\} \subset \mathbb{R}$ is the data to be watermarked, $\{v_1, \ldots, v_n\}$ is the result, and $\{t_1, \ldots, t_n\} \subset \mathbb{R}$ and $t_{\max} \in \mathbb{R}$ define the guaranteed error bounds at data distribution time. In other words, $\{t_1, \ldots, t_n\}$ defines the allowable distortions for individual elements in terms of MSE, and $t_{\max}$ their overall permissible value.
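Once the data has been watermarked, both MSE conditions can be checked directly. The following is a minimal sketch in Python, assuming the original values, the watermarked values and the per-element tolerances are plain lists of floats; the function name `within_mse_bounds` is hypothetical and not taken from the chapter.

```python
from typing import Sequence


def within_mse_bounds(
    s: Sequence[float],
    v: Sequence[float],
    t: Sequence[float],
    t_max: float,
) -> bool:
    """Check the MSE usability metrics for a watermarked data set.

    s     -- original values s_1..s_n
    v     -- watermarked values v_1..v_n
    t     -- per-element tolerances t_1..t_n
    t_max -- bound on the summed squared error
    """
    if not (len(s) == len(v) == len(t)):
        raise ValueError("s, v and t must have the same length")

    squared_errors = [(si - vi) ** 2 for si, vi in zip(s, v)]

    # Per-element condition: (s_i - v_i)^2 < t_i for every i.
    if any(e >= ti for e, ti in zip(squared_errors, t)):
        return False

    # Aggregate condition: sum over i of (s_i - v_i)^2 < t_max.
    return sum(squared_errors) < t_max
```

For instance, with `s = [10.0, 20.0, 30.0]`, `v = [10.1, 19.9, 30.2]` and every `t_i = 0.05`, each element passes its own bound and the summed squared error is about 0.06. The check therefore succeeds for `t_max = 0.1` but fails for `t_max = 0.05`, which shows why the aggregate bound is stated separately from the per-element ones.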

Often, however, specifying only allowable change limits on individual values, and possibly an overall limit, fails to capture important semantic features associated with the data, especially if the data is structured. Consider, for example, age data. While a small change to the age values may be acceptable, it may be critical that individuals who are younger than 21 remain so even after watermarking if the data will be used to determine behavior patterns for under-age drinking. Similarly, if the same data were to be used for identifying legal voters, the cut-off would be 18 years. Further still, for some other application it may be important that the relative ages, in terms of which one is younger, not change. Other examples of constraints include: (i) uniqueness (each value must be unique); (ii) scale (the ratio between any two numbers must be the same before and after the change); and (iii) classification (the objects must remain in the same class, defined by a range of values, before and after watermarking). As these examples show, simple bounds on the change of numerical values are often not enough.
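The semantic constraints above can be expressed as predicates over the original and watermarked columns. What follows is a minimal sketch in Python. All function names and the `rel_tol` parameter are hypothetical. The age cut-offs are treated as class boundaries (18 and 21), which makes the cut-off requirement a special case of classification. Ties are also treated as part of the relative order, so two equal ages must stay equal.

```python
import bisect
import math
from itertools import combinations
from typing import Sequence


def same_classes(orig: Sequence[float], marked: Sequence[float],
                 boundaries: Sequence[float]) -> bool:
    """Classification: every value stays in the same range.

    `boundaries` must be sorted; bisect_right returns the index of the
    half-open range [b_k, b_{k+1}) containing a value.
    """
    return all(bisect.bisect_right(boundaries, a) == bisect.bisect_right(boundaries, b)
               for a, b in zip(orig, marked))


def same_order(orig: Sequence[float], marked: Sequence[float]) -> bool:
    """Relative order: for every pair, which value is smaller (or whether
    the two are equal) does not change."""
    def sign(x: float) -> int:
        return (x > 0) - (x < 0)
    return all(sign(orig[i] - orig[j]) == sign(marked[i] - marked[j])
               for i, j in combinations(range(len(orig)), 2))


def all_unique(marked: Sequence[float]) -> bool:
    """Uniqueness: no two watermarked values coincide."""
    return len(set(marked)) == len(marked)


def same_ratios(orig: Sequence[float], marked: Sequence[float],
                rel_tol: float = 1e-9) -> bool:
    """Scale: a_i / a_j == b_i / b_j for every pair, checked by
    cross-multiplication (a_i * b_j == a_j * b_i) to avoid dividing by zero."""
    return all(math.isclose(orig[i] * marked[j], orig[j] * marked[i], rel_tol=rel_tol)
               for i, j in combinations(range(len(orig)), 2))
```

As a usage example, take `ages = [17, 19, 20, 25, 40]` watermarked as `[17.4, 18.6, 20.3, 24.8, 40.2]`. This keeps every age on the same side of 18 and 21, preserves order and uniqueness, but does not preserve ratios. Moving the 17-year-old to 18.2 would keep the order but fail `same_classes`, because a minor would become a legal voter.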

Structured collections present further constraints that must be adhered to by the watermarking algorithm. Consider a data warehouse organized using a standard Star schema with a fact table and several dimension tables. It is important that the key relationships be preserved by the watermarking algorithm. This is similar to the ON UPDATE CASCADE option for foreign keys in SQL and ensures that tuples that join before watermarking also join after watermarking. This requires that the new value of any key attribute remain unique after the watermarking process. In other words, we want to preserve the relationships between the various tables. More generally, the relationship could be expressed in terms of an arbitrary join condition, not just a natural join. In addition to relationships between tuples, relational data may have constraints within tuples. For example, if a relation contains the start and
