Database Reference
In-Depth Information
Figure 32-3. Hash Indexes
Let's assume that you need to run a query that selects all rows with Name='Ann' in the transaction, which started
when the Global Transaction Timestamp was 65. SQL Server calculates the hash value for Ann , which is 'A', and
finds the corresponding bucket in the hash index, which is displayed on the left side in Figure 32-3 . It follows the
pointer from that bucket, which references a row with Name='Adam' . This row has BeginTs of 10 and EndTs of Infinity ;
therefore, it is visible to the transaction. However, the Name value does not match the predicate and the row is ignored.
In the next step, SQL Server follows the pointer from the Adam index pointer array, which references the first Ann
row. This row has BeginTs of 50 and EndTs of Infinity ; therefore, it is visible to the transaction and needs to be selected.
As a final step, SQL Server follows the next pointer in the index. Even though the last row also has Name='Ann' , it
has EndTs of 50 and is invisible to the transaction.
Obviously, the performance of queries that scan an index chain greatly depends on the number of rows in the
chain. The greater the number of rows that need to be processed; the slower the query.
There are two factors that affect index chain size in hash indexes. The first factor is index selectivity. Duplicate
key values generate the same hash and belong to the same index chain. Therefore, indexes with low selectivity are less
efficient.
Another factor is the number of hash buckets in the index. In an ideal situation, the number of buckets in an array
would match the number of unique key values in the index, and every unique key value would have its own bucket.
You should specify the number of buckets during the index creation stage. Unfortunately, it is impossible
to change it after the table has been created. The only option for changing the bucket count is by dropping and
recreating the table.
you should analyze the data and include a future system growth projection into the analysis when
determining the optimal bucket count for the hash index. Underestimation and overestimation are both bad.
Underestimation increases the size of the index chain while overestimation wastes system memory.
Important
Internally, SQL Server rounds up the number of buckets specified for an index to the next power of two. For example,
a hash index defined with BUCKET_COUNT=100000 would have 131,072 buckets in the hash array.
 
 
Search WWH ::




Custom Search