Security Re-engineering for Databases: Concepts and Techniques - Database Security: Applications and Trends

Databases Reference

In-Depth Information

A 1 ,...,A n , the values for individual attributes A i are analyzed. This analysis

includes determining the distribution and frequency of attribute values, e.g.,

in the form of histograms, and the minimum and maximum values and lengths

for numerical and alpha-numerical attributes, respectively.

Such a simple analysis, which can be done using SQL statements at the

relation level, can reveal quite a lot of useful information, for example, outliers

that do not conform to certain properties the data are assumed to have. The

analysis can also be extended to sets of tuples from one or more relations where

now tuples and combinations thereof are analyzed and correlations among at-

tribute values are determined using standard association analysis techniques

(see, e.g., [57]). Assume, for example, two relations R 1 ( A, C )and R 2 ( B, D )

with a foreign key dependency R 2 .B

→

R 1 .A . For any two matching tuples

t 1 ∈

R 2 , an association can describe that the value of t 1 .C always

determines the range of the attribute value t 2 .D . The association rules discov-

ered are then evaluated and compared to what is expected from the data. In

principle, many of the analysis tasks resemble standard actions employed in

the evaluation of the quality of the data [9, 18]. It should also be noted that

in production databases, most of the above information is typically readily

available. This is in particular the case where statistics for relations are pe-

riodically maintained to provide cost-based query optimizer with information

for choosing ecient query evaluation plans. Statistical information about the

relations is then available in the database's data dictionary.

For data profiling, it is assumed that the above analysis tasks are per-

formed on a snapshot of the database, that is, at a particular point in time.

This can be done in a batch-mode when the workload of the database is low

(e.g., during night), or by using a stand-by or recent backup database. For

large-scale databases with hundreds of relations, clearly not all relations are

analyzed but only those that are relevant to specific security policies or those

that contain sensitive or mission-critical data as indicated in the initial step in

Figure 1. The snapshot profiles for these individual relations or parts thereof

(e.g., sets of tuples that have some specific properties or contain particularly

sensitive information) are managed in separate relations, specifically designed

for access to database security mechanisms (see Section 5). In the most sim-

ple case, a snapshot profile for a relation R

R 1 ,t 2 ∈

determined at a particular

point in time t , denoted DataP rof ( R, t ), is a collection of measure-value pairs

that describe properties of attributes of R . The measure is related to the data,

e.g., number or frequencies of different attribute values, and the measure value

denotes what has been computed for a measure.

The above discussion makes one point very clear: developing a misuse de-

tection approach on top of a database for which the quality of the data is

not known or the data is of poor quality is likely to be not practical. That

is, if the normal behavior of users is to be determined, erroneous, missing,

or extra data in the underlying relations can significantly skew the statistical

models describing the normal behavior of users in terms of accesses to the

data. Therefore, it is essential to “clean” the data, i.e., removing (if possi-

∈R

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home