Databases Reference
In-Depth Information
One exception to this is if you're going to use that identity column for range-based queries. For example,
suppose you have a table Order with an OrderID identity column. You may have queries that retrieve
rows where OrderID is between 100 and 200. A good example of this is in the world of web applications.
Those applications usually return only a fixed amount of rows at a time, say 20.
The Table Must Be Relatively Large
Classifying a table as large is, well ... relative. To paraphrase a popular expression, ''one man's large table
is another man's small table.'' Everyone will have to determine the meaning of large. A good yardstick
to use is page count. Additionally, use multiples of 8, that is, 8, 16, 32, and so on. The reasoning for this
is that SQL Server reads or allocates 8 pages at a time. Thus, when a row is retrieved from a table, the
page containing the row, as well as the next 7 pages,isread.Likewise,whenSQLServerallocatesnew
data pages to a table, it will actually add 8 of them, not just one. Therefore, choose a number such as 256.
Then any table that contains more than 256 data pages is considered large, thereby making it a candidate
for clustered indexing. Conversely, any table with fewer than 256 pages wouldn't be a candidate for
clustered indexing. Note that if the table under consideration is currently empty, then use an estimate of
the table's data page requirements.
The Table Must Be Relatively Static
Again, classifying a table as static is also relative. This is the most important tenet of these guidelines. The
purpose here is to determine the frequency of inserts that a table experiences. As this frequency increases,
the prospect for a clustered index decreases. If the table under consideration experiences frequent inserts,
such as an OLTP-style transaction table, then it's not a candidate for a clustered index. However, if the
table experiences infrequent inserts, as with most data warehouse tables, then it is a candidate.
The Table Should Generally Be Queried Using the Same Column
This means that for the table being considered; when filtering rows, or joining the table to others, those
operations are typically done with the most frequently used column(s). Perhaps a better way to state this
is that a clustered index should only be defined using the table's most popular column(s). If no column(s)
stand out as being the most popular then don't create a clustered index.
Clustered Index Summary
To summarize, these guidelines lead to one underlying principle. When contemplating a clustered index
on a table, you must first determine the most important operation for the table. If the most important
operation is reading, which is the case in most data-warehouse tables, the table should probably have a
clustered index. Remember to keep in mind all the preceding guidelines. However, if the most impor-
tant operation is inserting, which is the case in most OLTP tables, the table probably shouldn't have a
clustered index. Following these guidelines will lead to effective use (and non-use) of clustered indexes.
Again, another real-world experience comes to mind. A colleague had a classic Order-OrderDetail table
arrangement in his application. His application had an end of day report, which took about 30 minutes
to run, and he wanted to improve that time. One of the DBA's recommended that a clustered index
be created on the OrderID column in the OrderDetail table. After creating this index the end of day
report improved to about 5 minutes. However, the next day when the order entry clerks started using
the application, it quickly ground to a halt. The reason was that the clustered index slowed down the
inserting and updating operations to the point that the system became unusable.
Search WWH ::




Custom Search