Also, keep indexes as narrow as possible: use as few columns as you can. Furthermore, favor columns with known, fixed sizes over variable-sized columns; variable-length types such as VARCHAR aren't as efficient as fixed-length types. Favor integer, non-null columns when possible.
Selectivity
A key concept in designing indexes is selectivity. This is defined as the ratio of the number of distinct
values in an index's column(s) to the number of rows in the table. The highest possible selectivity is 1.
This would occur when the number of distinct values in the index column(s) equals the number of rows
in the table. This will occur with primary keys and any other unique keys defined for a table.
Imagine a table having 100,000 rows and one of its indexed columns has 75,000 distinct values. The
selectivity of this index is 75,000 / 100,000 = 0.75.
Now suppose that same table of 100,000 rows had an index whose column only had 500 distinct values.
This index's selectivity is 500 / 100,000 = 0.005.
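As a sketch, the selectivity of an existing column can be measured directly with a query like the following (the table and column names here are hypothetical; substitute your own):

```sql
-- Selectivity = number of distinct values / number of rows.
-- Orders and CustomerID are hypothetical names.
SELECT COUNT(DISTINCT CustomerID) * 1.0 / COUNT(*) AS Selectivity
FROM Orders;
-- A result near 1.0 suggests a good index candidate; a result
-- near 0 (e.g., 500 / 100,000 = 0.005) suggests a poor one.
```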
Note that SQL Server also defines a related concept called density (don't confuse this with row density). Density is the inverse of selectivity; therefore, a low density implies a high selectivity and vice versa.
So, you may be wondering what this has to do with creating indexes. The idea is simple: when you create an index, use columns that are highly selective. An extension of this applies to multi-column indexes: as you add columns to an index, do so such that each additional column increases the selectivity of the index. Otherwise, there's no point. This leads to another experience which bears inclusion here.
Remember the 90 GB database I mentioned earlier? Well, one of the actions I took to reduce that size was cutting out the plethora of superfluous indexes. As I stated earlier, a unique key index has a selectivity of 1, so adding columns to such an index can't improve its selectivity at all. This database was peppered with indexes in which the unique column was listed first, followed by a second (or third, or fourth) column. Those indexes did nothing for the query optimizer (which is really the "consumer" of all indexes), but they certainly had a negative impact on database size and load times!
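As an illustration of the problem (table, column, and index names below are hypothetical), an index that lists a unique column first makes any trailing columns useless for selectivity:

```sql
-- OrderID is already unique, so this index's selectivity is 1
-- before OrderDate is even considered.
CREATE UNIQUE INDEX IX_Orders_OrderID ON Orders (OrderID);

-- The trailing column only widens the index, costing disk space
-- and insert/update time without helping the optimizer.
CREATE INDEX IX_Orders_OrderID_OrderDate ON Orders (OrderID, OrderDate);

-- The superfluous index can simply be dropped.
DROP INDEX IX_Orders_OrderID_OrderDate ON Orders;
```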
Another aspect of indexes is their type. SQL Server recognizes two types of indexes. One is the clustered index; the other goes by the truly uninspired name of non-clustered index.
Clustered Indexes
Clustered index is a somewhat unfortunate term. The reason is that a clustered index is more than an
index. In fact, the main attribute of a clustered index isn't the fact that it's an index. The most important
aspect of a clustered index is the fact that it dictates where the incoming rows will be stored.
The purpose of clustered indexes is to provide greater read performance. In fact, a clustered index should always outperform a similarly defined non-clustered index. This performance boost comes from the fact that the bottom level of the index structure (known as the leaf level) consists of the actual data pages.
For non-clustered indexes, the leaf level contains pointers to the data pages. (However, if a clustered index is present, the non-clustered index points to the clustered index instead.) Therefore, when traversing a non-clustered index, SQL Server must perform one extra operation to retrieve the data. Traversing a clustered index doesn't incur that cost because the data page is already at hand.
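In T-SQL, the two types are declared as shown below (names are hypothetical). Note that only one clustered index can exist per table, since the table's rows can be physically ordered only one way:

```sql
-- The clustered index: the table's rows are stored in OrderID
-- order, so its leaf level *is* the data pages.
CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID);

-- A non-clustered index: its leaf level points back at the
-- clustered index, so a seek on CustomerID needs one extra
-- lookup to reach the full row.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID);
```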