Databases Reference
In-Depth Information
Clustered Indexes
The leaf level of a clustered index contains all the data, which is why we refer to it as defining the physical
ordering of the rows. A common question is, ''What should I put my clustered index on?'' Here are some
sensible guidelines that you can follow to make that decision a bit easier:
Unique data: A clustered index key (the column(s) you choose to create a clustered index on) has
to be unique. If the data itself isn't unique then SQL Server will add a 4-byte UniqueIdentifier to
every duplicate value to allow you to use that key. You can avoid this unnecessary overhead
(as small as it might be) by creating your index on a column that already has unique values.
Narrow data: A narrow column will help to keep your clustered index b-tree short and will help
to keep your other indexes smaller. The clustered index key is duplicated at the leaf-level of
every non-clustered index you create so it affects all your other indexes.
Static data: Changes to the key can be costly, as the change needs to be cascaded to the non-
clustered indexes and the foreign keys if your cluster key is also the primary key. Your clus-
ter key should be static to avoid cascading updates. See Chapter 8 for details on primary/foreign
keys and referential integrity.
ClusteringonanIdentityColumn
An identity column in SQL Server is an automatically generated incremental integer that makes an ideal
clustered index key.
It's naturally unique within the table.
It's narrow because an integer is only 4 bytes in size.
It's static because it has no relevance to the business data.
It's incremental.
This creates a natural ''hot spot'' because the pages needed for INSERT will already be in cache and it
helps to reduce fragmentation caused by page-splits. The caveat is that this isn't appropriate for a very
high number of inserts because the hot spot will become the bottleneck. If you're going to have more
than 400 inserts a second to a table, you should look at alternatives.
UniqueIdentifier/GUID
UniqueIdentifier is a SQL Server data type for GUIDs (Globally Unique IDentifiers) which are 16-byte
computer generated values created based on your hardware (to an extent) to try and guarantee unique-
ness. It's not actually unique but getting a duplicate is comparable in probability to winning the lottery.
They would seem to be ideal for merging data from different sources, because they'll be unique from
any source, but GUIDs tend to be avoided as a clustering key because of their large size, complexity in
debugging, and their random nature, causing slow inserts. They are created by the newID() function and
you can test it yourself by running select newID() . On one of our laptops this was the GUID it produced:
D29F244C-541C-4542-9CEF-BEF305FDA3D4
SQL Server 2005 has a new function for creating GUIDs that ensures that each one is greater than the
previous one created on the same machine. The function is newsequentialID() , and it can only be used
as the DEFAULT value for a column with datatype UNIQUEIDENTIFIER. There is such a column in the
Search WWH ::




Custom Search