Databases Reference
In-Depth Information
The need for sampling in database design is actually growing. Why? Because data
volumes are growing and doing so at a rate that is larger than the exponential growth in
power of CPUs. In short, data growth is outstripping Moore's Law! As data volumes
grow from gigabyte to terabyte to exabyte levels and beyond, they become prohibitively
difficult to analyze in their entirety, and the cost of changing the physical database
design becomes increasingly more expensive. However, as we'll soon see, sampling has
pitfalls of its own, and must be used carefully to avoid seriously misleading results. As
the old adage goes: “Lies, damn lies, and statistics!”
In the sections that follow we first introduce techniques to improve database design
by “counting,” using common SQL techniques. Secondly, we discuss sampling, sam-
pling techniques, and support for data sampling in the SQL language to show how this
can speed the process, but not without some risks.
10.1
Application to Physical Database Design
The power of SQL, and even parallel processing, can be brought to bear to count the
number of distinct values of a column or a group of columns. This counting is
extremely important for most aspects of physical database design. To perform the
counting, you can use the COUNT and DISTINCT operators in SQL. Here is an
example of the syntax that can be used to count the number of distinct values on a col-
umn named “COL1”:
SELECT COUNT(*) FROM (SELECT DISTINCT COL1, FROM
MY_FAV_TABLE) AS NUM_DISTINCT;
Similarly, the number of distinct combinations of a few columns can be determined
this way. For example, the following syntax shows how this can be counted for three col-
umns named COL1, COL2, and COL3.
SELECT COUNT(*) FROM (SELECT DISTINCT COL1, COL2, COL3
FROM MY_FAV_TABLE) AS NUM_DISTINCT;
Consider a table named MY_FAV_TABLE (Table 10.1).
Table 10.1 has 14 rows of data. Issuing our query to determine the number of dis-
tinct values in COL1 will return the following:
NUM_DISTINCT
----------------------
9
Search WWH ::




Custom Search