Counting and Data Sampling in Physical Design Exploration - Physical Database Design

Databases Reference

In-Depth Information

which in Table 10.1 would return 14:

MY_COUNT

-----------------

14

10.1.1 Counting for Index Design

In general, indexes are valuable because only a small percentage of the index data needs

to be accessed to find qualifying rows, as compared to scanning the entire table. How-

ever, there are other considerations. Specifically, scanning an entire table, though input/

ouput (I/O) intensive, is a reasonably efficient process per page of data because of the

efficiencies of large block I/O, I/O prefetching, and sequential I/O. All three of these

will generally make a table scan much more efficient per page than lookups into

indexes. Indexes are much more prone to fragmentation and random I/O on disk.

It is critical, therefore, that the index data being accessed is not only a fraction of

the table data, but is also a rather small fraction because the cost per page of index access

will be higher than the cost per page of table access during a brute-force scan. If the

index data includes a very high number of duplicates, then the number of distinct val-

ues in the index drops, and the number of row identifiers returned by an index scan

increases, as does the number of index pages that needs to be accessed. As a general rule

you should design indexes so that the number of distinct elements in the index is at least

30% of the table cardinality. For example, if the table has 1,000,000 rows, the index

should usually have at least 300,000 distinct values, otherwise it may not be a generally

useful index to a broad class of queries. There are exceptions, of course, but this is a rea-

sonable rule of thumb.

10.1.2 Counting for Materialized View Design

One of the biggest challenges with materialized view design is estimating the ultimate

size of the materialized view, which can be large. When the view is the result of one or

more joins, it's possible for the view size to become extremely large. Because the view is

materialized, it will require real storage, which is expensive, and if the view is very large,

the maintenance overhead of maintaining the view can also be prohibitive. The brute-

force strategy is of course to try to create the view and see how big it gets! However, for

complex large views it's more than worthwhile to spend a few minutes to run a count

rather than spend hours to find out the hard way the view is unreasonably large. To

determine the approximate size of a view you can do the following.

First, count the number of rows in the view as follows:

SELECT COUNT(*) FROM (<view definition> ) AS

NUM_ROWS_IN_VIEW;

Search WWH ::

Custom Search

Home