Database Reference
In-Depth Information
in data warehousing and data mining may be considered as statistical databases in
some significant sense.
Need for Data Access. Statistical databases serve critical purposes. They store
rich data content providing population statistics by age groups, income levels,
household sizes, education levels, and so on. Government statisticians, market research
companies, and institutions estimating economic indicators depend on statistical
databases. These professionals select records from statistical databases to perform
statistical and mathematical functions. They may count the number of entities in the
selected sample of records from a statistical database, add up numbers, take
averages, find maximum and minimum amounts, and calculate statistical variances and
standard deviations.
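The calculations listed above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the field names, and the `summary` dictionary are hypothetical and do not come from any particular statistical database.

```python
import statistics

# Hypothetical sample: rows a statistician might select from a statistical database.
sample = [
    {"age_group": "30-39", "income": 52000},
    {"age_group": "30-39", "income": 61000},
    {"age_group": "30-39", "income": 47000},
    {"age_group": "30-39", "income": 58000},
]

# The statistician works on the group of values, not on any single record.
incomes = [row["income"] for row in sample]

summary = {
    "count": len(incomes),
    "sum": sum(incomes),
    "average": statistics.mean(incomes),
    "maximum": max(incomes),
    "minimum": min(incomes),
    "variance": statistics.variance(incomes),  # sample variance
    "std_dev": statistics.stdev(incomes),      # sample standard deviation
}
```

Every value in `summary` describes the group as a whole, which is the kind of result these professionals are after.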
All such professionals need access to statistical databases. However, there is one
big difference between users of an operational database needing access privileges
and professionals requiring access privileges to a statistical database. Users of an
operational database need information to run the day-to-day business—to enter an
order, to check stock of a single product, to send a single invoice. That is, these users
need access privileges to individual records in the database. Professionals using
statistical databases, on the other hand, need access to groups of
records so that they can perform mathematical and statistical calculations on the selected
groups. They are not interested in single records, only in samples containing groups
of records.
Security Challenge. So what is the problem with granting access privileges to
professionals to use a statistical database just the way you would grant privileges to use
any other type of database? Here is the problem: The professionals must be able to
read individual records in a selected sample group to perform statistical
calculations but, at the same time, must not be allowed to find out what is in a particular
record.
For example, take the case of the international bank. The bank's statisticians
need access to the bank's database to perform statistical calculations. For this
purpose, you need to grant them access privileges to read individual records. But,
at the same time, you cannot allow them to see Jane Doe's bank account balance.
The challenge in the case of the bank is this: How can you grant access privileges
to the statisticians without compromising the confidentiality of individual bank
customers?
One possible method is to grant the statisticians access privileges to individual records,
because they need to read a group of records for the calculations, but to
restrict their queries to mathematical and statistical functions such as
COUNT, SUM, AVG, MAX, MIN, variance, and standard deviation.
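A minimal sketch of such a restriction, written in Python, might look like the following. The `aggregate_query` function, the `ALLOWED` whitelist, and the sample accounts are all hypothetical names and data invented for illustration; a real database would enforce this through its own privilege mechanisms. The population forms of variance and standard deviation are an assumption made here so that single-row groups do not raise an error.

```python
import statistics

# Hypothetical whitelist of permitted functions, mirroring the list in the text.
ALLOWED = {
    "COUNT": len,
    "SUM": sum,
    "AVG": statistics.mean,
    "MAX": max,
    "MIN": min,
    "VARIANCE": statistics.pvariance,  # population variance (assumption)
    "STDDEV": statistics.pstdev,       # population standard deviation (assumption)
}

def aggregate_query(records, predicate, func_name, field):
    """Apply one whitelisted aggregate to the rows matching predicate.

    Individual records are read internally, but only a single
    aggregate value is returned, never the rows themselves.
    """
    name = func_name.upper()
    func = ALLOWED.get(name)
    if func is None:
        raise PermissionError(f"{func_name} is not a permitted function")
    values = [row[field] for row in records if predicate(row)]
    if not values and name != "COUNT":
        return None  # no matching rows, nothing to aggregate
    return func(values)

# Hypothetical account data for illustration only.
accounts = [
    {"customer": "A", "branch": "North", "balance": 1200.0},
    {"customer": "B", "branch": "North", "balance": 800.0},
    {"customer": "C", "branch": "South", "balance": 5000.0},
]

north_avg = aggregate_query(
    accounts, lambda row: row["branch"] == "North", "AVG", "balance"
)
```

Note that this sketch restricts only the function applied to the selected rows. It places no limit on the predicate, so a caller remains free to choose predicates that match very small groups.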
Although this method appears adequate to preserve the confidentiality of
individual customers, a clever professional can run a series of queries and narrow
the intersection of the query results to one customer record. Each query
produces a result set. Even though only statistical functions are permitted,
combining the results of several ingeniously chosen queries can reveal the values
in an individual row, so information about a
single entity may be determined. Figure 16-8 illustrates how, by using different
predicates in queries from a bank's statistical database, the bank balance of a single