Tuning the Schema - Professional SQL Server 2005 Performance Tuning

Databases Reference

In-Depth Information

One exception to this is if you're going to use that identity column for range-based queries. For example,

suppose you have a table Order with an OrderID identity column. You may have queries that retrieve

rows where OrderID is between 100 and 200. A good example of this is in the world of web applications.

Those applications usually return only a fixed amount of rows at a time, say 20.

The Table Must Be Relatively Large

Classifying a table as large is, well ... relative. To paraphrase a popular expression, ''one man's large table

is another man's small table.'' Everyone will have to determine the meaning of large. A good yardstick

to use is page count. Additionally, use multiples of 8, that is, 8, 16, 32, and so on. The reasoning for this

is that SQL Server reads or allocates 8 pages at a time. Thus, when a row is retrieved from a table, the

page containing the row, as well as the next 7 pages,isread.Likewise,whenSQLServerallocatesnew

data pages to a table, it will actually add 8 of them, not just one. Therefore, choose a number such as 256.

Then any table that contains more than 256 data pages is considered large, thereby making it a candidate

for clustered indexing. Conversely, any table with fewer than 256 pages wouldn't be a candidate for

clustered indexing. Note that if the table under consideration is currently empty, then use an estimate of

the table's data page requirements.

The Table Must Be Relatively Static

Again, classifying a table as static is also relative. This is the most important tenet of these guidelines. The

purpose here is to determine the frequency of inserts that a table experiences. As this frequency increases,

the prospect for a clustered index decreases. If the table under consideration experiences frequent inserts,

such as an OLTP-style transaction table, then it's not a candidate for a clustered index. However, if the

table experiences infrequent inserts, as with most data warehouse tables, then it is a candidate.

The Table Should Generally Be Queried Using the Same Column

This means that for the table being considered; when filtering rows, or joining the table to others, those

operations are typically done with the most frequently used column(s). Perhaps a better way to state this

is that a clustered index should only be defined using the table's most popular column(s). If no column(s)

stand out as being the most popular then don't create a clustered index.

Clustered Index Summary

To summarize, these guidelines lead to one underlying principle. When contemplating a clustered index

on a table, you must first determine the most important operation for the table. If the most important

operation is reading, which is the case in most data-warehouse tables, the table should probably have a

clustered index. Remember to keep in mind all the preceding guidelines. However, if the most impor-

tant operation is inserting, which is the case in most OLTP tables, the table probably shouldn't have a

clustered index. Following these guidelines will lead to effective use (and non-use) of clustered indexes.

Again, another real-world experience comes to mind. A colleague had a classic Order-OrderDetail table

arrangement in his application. His application had an end of day report, which took about 30 minutes

to run, and he wanted to improve that time. One of the DBA's recommended that a clustered index

be created on the OrderID column in the OrderDetail table. After creating this index the end of day

report improved to about 5 minutes. However, the next day when the order entry clerks started using

the application, it quickly ground to a halt. The reason was that the clustered index slowed down the

inserting and updating operations to the point that the system became unusable.

Search WWH ::

Custom Search

Home