Databases Reference
In-Depth Information
Another unfortunate aspect of clustered indexes is their overuse. Most publications on this subject advise
that every table should have a clustered index. In fact, when you define a primary key, SQL Server
defaults that key as a clustered index. The remainder of this section will discuss some techniques you can
use to determine when to define a clustered index on your tables.
If you've worked with SQL Server any length of time you've probably heard or read statements that
claim that every table should have a clustered index. This is simply wrong and a quick example with the
following table schema will bear that out:
CREATE TABLE Product
(ProductID
uniqueidentifier
NOT NULL,
SupplierID
uniqueidentifier
NULL)
CREATE UNIQUE CLUSTERED INDEX bad_cluster
ON Product (ProductID)
The ProductID column is defined as a uniqueidentifier data type, and a clustered index. A uniqueidenti-
fier column shouldn't be used for a clustered index.
Therefore, not all tables will benefit from a clustered index. The question then becomes, ''Is this example
the only one where a clustered shouldn't be used, or are there others?'' The remainder of this section will
address that question.
Using the following approach will help identify tables deriving benefit from a clustered index. If the table
in question doesn't meet all the requirements, don't create a clustered index.
Don't Create a Clustered Index on an Artificial Column
An artificial column is one whose value is generated by the database engine. In SQL Server, this is most
commonly achieved by two methods:
Designate a column as an identity column.
Define a column's data type as uniqueidentifier and set its default value equal to NEWID().
Consider the second type of artificial column. A uniqueidentifer is the same thing as a GUID. They are
globally unique and random, thus one value has absolutely no relation to another value. Defining a
clustered index on a uniqueidentifier column forces the server to order the rows physically in the table
based on a column whose values are totally random and unrelated to one another.
Now consider the first type of artificial column. The reason for excluding this requires a little under-
standing of SQL Server data page management. SQL Server recognizes two types of tables: heap and
clustered. Any table that doesn't have a clustered index is a heap table. Any table that has a clustered
index is a clustered table. When inserting rows into a heap table the server will simply add the new
row at the end of the table's data pages. However, when inserting rows into a clustered table the server
must find the proper data page to place the new row. When using a typical identity column the proper
data page will always be the last one. Therefore, defining a clustered index on an identity column forces
the server to go through the process of determining the proper data page to store a new row — even
when this location will be the last page. If the clustered index were removed then the heap nature of the
table will naturally cause new rows to be stored on the last data page, without having to go through the
process of determining where.
Search WWH ::




Custom Search