Databases Reference
In-Depth Information
which in Table 10.1 would return 14:
MY_COUNT
-----------------
14
10.1.1 Counting for Index Design
In general, indexes are valuable because only a small percentage of the index data needs
to be accessed to find qualifying rows, as compared to scanning the entire table. How-
ever, there are other considerations. Specifically, scanning an entire table, though input/
ouput (I/O) intensive, is a reasonably efficient process per page of data because of the
efficiencies of large block I/O, I/O prefetching, and sequential I/O. All three of these
will generally make a table scan much more efficient per page than lookups into
indexes. Indexes are much more prone to fragmentation and random I/O on disk.
It is critical, therefore, that the index data being accessed is not only a fraction of
the table data, but is also a rather small fraction because the cost per page of index access
will be higher than the cost per page of table access during a brute-force scan. If the
index data includes a very high number of duplicates, then the number of distinct val-
ues in the index drops, and the number of row identifiers returned by an index scan
increases, as does the number of index pages that needs to be accessed. As a general rule
you should design indexes so that the number of distinct elements in the index is at least
30% of the table cardinality. For example, if the table has 1,000,000 rows, the index
should usually have at least 300,000 distinct values, otherwise it may not be a generally
useful index to a broad class of queries. There are exceptions, of course, but this is a rea-
sonable rule of thumb.
10.1.2 Counting for Materialized View Design
One of the biggest challenges with materialized view design is estimating the ultimate
size of the materialized view, which can be large. When the view is the result of one or
more joins, it's possible for the view size to become extremely large. Because the view is
materialized, it will require real storage, which is expensive, and if the view is very large,
the maintenance overhead of maintaining the view can also be prohibitive. The brute-
force strategy is of course to try to create the view and see how big it gets! However, for
complex large views it's more than worthwhile to spend a few minutes to run a count
rather than spend hours to find out the hard way the view is unreasonably large. To
determine the approximate size of a view you can do the following.
First, count the number of rows in the view as follows:
SELECT COUNT(*) FROM (<view definition> ) AS
NUM_ROWS_IN_VIEW;
Search WWH ::




Custom Search