Also, keep indexes as narrow as possible: use as few columns as you can. Furthermore, favor columns with known, fixed sizes over variable-sized columns; variable-length types such as VARCHAR aren't as efficient as fixed-length types. Favor integer, non-null columns when possible.
Selectivity
A key concept in designing indexes is selectivity. This is defined as the ratio of the number of distinct
values in an index's column(s) to the number of rows in the table. The highest possible selectivity is 1.
This would occur when the number of distinct values in the index column(s) equals the number of rows
in the table. This will occur with primary keys and any other unique keys defined for a table.
Imagine a table having 100,000 rows and one of its indexed columns has 75,000 distinct values. The
selectivity of this index is 75,000 / 100,000 = 0.75.
Now suppose that same table of 100,000 rows had an index whose column only had 500 distinct values.
This index's selectivity is 500 / 100,000 = 0.005.
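As a sketch, the selectivity of an existing column can be measured directly with a query like the following (the table and column names here are hypothetical; substitute your own):

```sql
-- Selectivity = number of distinct values / number of rows.
-- Orders and CustomerID are hypothetical names.
SELECT COUNT(DISTINCT CustomerID) * 1.0 / COUNT(*) AS Selectivity
FROM Orders;
-- A result near 1.0 suggests a good index candidate; a result
-- near 0 (e.g., 500 / 100,000 = 0.005) suggests a poor one.
```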
Note that SQL Server also defines a related concept called density (don't confuse this with row density). Density is the inverse of selectivity; therefore, a low density implies a high selectivity and vice versa.
So, you may be wondering what this has to do with creating indexes. The idea is simple: when you create an index, use columns that are highly selective. An extension of this applies to multi-column indexes: as you add columns to an index, do so such that each additional column increases the selectivity of the index. Otherwise, there's no point. This leads to another experience which bears inclusion here.
Remember the 90 GB database I mentioned earlier? Well, one of the actions I took to reduce that size was cutting out the plethora of superfluous indexes. As I stated earlier, a unique key index has a selectivity of 1, so adding columns to such an index can't improve its selectivity at all. This database was peppered with indexes in which the unique column was listed first, followed by a second (or third, or fourth) column. Those indexes did nothing for the query optimizer (which is really the "consumer" of all indexes), but they certainly had a negative impact on database size and load times!
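As an illustration of the problem (table, column, and index names below are hypothetical), an index that lists a unique column first makes any trailing columns useless for selectivity:

```sql
-- OrderID is already unique, so this index's selectivity is 1
-- before OrderDate is even considered.
CREATE UNIQUE INDEX IX_Orders_OrderID ON Orders (OrderID);

-- The trailing column only widens the index, costing disk space
-- and insert/update time without helping the optimizer.
CREATE INDEX IX_Orders_OrderID_OrderDate ON Orders (OrderID, OrderDate);

-- The superfluous index can simply be dropped.
DROP INDEX IX_Orders_OrderID_OrderDate ON Orders;
```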
Another aspect of indexes is their type. SQL Server recognizes two types of indexes. One is the clustered index; the other goes by the truly uninspired name of non-clustered index.
Clustered Indexes
Clustered index is a somewhat unfortunate term. The reason is that a clustered index is more than an
index. In fact, the main attribute of a clustered index isn't the fact that it's an index. The most important
aspect of a clustered index is the fact that it dictates where the incoming rows will be stored.
The purpose of clustered indexes is to provide greater read performance. In fact, a clustered index should always outperform a similarly defined non-clustered index. This performance boost comes from the fact that the bottom level of the index structure (known as the leaf level) consists of the actual data pages.
For non-clustered indexes, the leaf level contains pointers to the data pages. (However, if a clustered index is present, the non-clustered index points to the clustered index instead.) Therefore, when traversing a non-clustered index, SQL Server must perform one extra operation to retrieve the data. Traversing a clustered index doesn't incur that cost because the data page is already at hand.
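In T-SQL, the two types are declared as shown below (names are hypothetical). Note that only one clustered index can exist per table, since the table's rows can be physically ordered only one way:

```sql
-- The clustered index: the table's rows are stored in OrderID
-- order, so its leaf level *is* the data pages.
CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID);

-- A non-clustered index: its leaf level points back at the
-- clustered index, so a seek on CustomerID needs one extra
-- lookup to reach the full row.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID);
```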