Database Reference
In-Depth Information
From the drive-load point of view, a table with a low insert rate, such as a typical
CUST table, would tolerate dozens of indexes; the limiting factor would be the
response time of insert transactions.
If drive load is a problem, the obvious solution would be to try to combine
indexes. One index with 10 columns causes much less drive load than 2 indexes,
each with 6 columns.
Disk Space
If there are more than 10 million rows in a table, the cost of disk space for
the indexes may become an issue. The charge for outsourced hardware depends
mainly on two factors: consumed CPU time and allocated disk space. At the
time of writing, the monthly rent for disk space in a high-performance, fault-
tolerant disk server may be U.S.$50 per gigabyte. Installations that own their
own hardware may have an internal billing rate per gigabyte of disk space.
Database designers should be aware of the current cost of disk space. All too
often, a request to add a 100-byte column to an index, for instance, is rejected
simply because it
feels
like wasting disk space.
This is what database guru Jim Gray (Microsoft Research) said in a recent
interview (5):
I actually tracked the price of disks pretty carefully, and the price of disks has
gone down by a factor of a hundred since the vacuum cleaner idea failed
(1996). The price of disk arms has gone down by about a factor of ten in that
time. These ratios absolutely change things
...
(page 5)
To take an example, a new index with 400 bytes of user data is proposed
for a table with 10 million rows. Should we be concerned about the disk space?
The proposed index (without RAID overhead) requires about 1
.
5
×
10
,
000
,
000
×
400 bytes
=
6 GB of disk space. The cost may be perhaps $300
per month. This is probably not a showstopper if adding columns to the
current
indexes is unable to provide acceptable response times.
Buffer pools or disk caches should be increased as indexes become larger,
otherwise the number of nonleaf page I/Os will increase. With current hardware
prices, the memory cost for the nonleaf pages would be about $300 per month
in our example.
The
overhead coefficient
of fat indexes (which tend to be unique or almost
unique) depends primarily on the amount of distributed free space. A value of
1.5 is probably high enough to allow for 25% of free space per leaf page. With
an 8K leaf page, this means that five 400 byte index rows can be added to each
leaf before a split will occur. Chapter 11 recommends free space specifications
and reorganization frequencies for indexes according to row length and insert
patterns.
The length of the pointer to the table row depends on the DBMS. Direct
pointers are short, normally less than 10 bytes. Symbolic pointers may be longer;
these are used in indexes that point to an index holding the table rows.
Search WWH ::
Custom Search