In-Memory OLTP Internals - Pro SQL Server Internals

Database Reference

In-Depth Information

Figure 32-3. Hash Indexes

Let's assume that you need to run a query that selects all rows with Name='Ann' in the transaction, which started

when the Global Transaction Timestamp was 65. SQL Server calculates the hash value for Ann , which is 'A', and

finds the corresponding bucket in the hash index, which is displayed on the left side in Figure 32-3 . It follows the

pointer from that bucket, which references a row with Name='Adam' . This row has BeginTs of 10 and EndTs of Infinity ;

therefore, it is visible to the transaction. However, the Name value does not match the predicate and the row is ignored.

In the next step, SQL Server follows the pointer from the Adam index pointer array, which references the first Ann

row. This row has BeginTs of 50 and EndTs of Infinity ; therefore, it is visible to the transaction and needs to be selected.

As a final step, SQL Server follows the next pointer in the index. Even though the last row also has Name='Ann' , it

has EndTs of 50 and is invisible to the transaction.

Obviously, the performance of queries that scan an index chain greatly depends on the number of rows in the

chain. The greater the number of rows that need to be processed; the slower the query.

There are two factors that affect index chain size in hash indexes. The first factor is index selectivity. Duplicate

key values generate the same hash and belong to the same index chain. Therefore, indexes with low selectivity are less

efficient.

Another factor is the number of hash buckets in the index. In an ideal situation, the number of buckets in an array

would match the number of unique key values in the index, and every unique key value would have its own bucket.

You should specify the number of buckets during the index creation stage. Unfortunately, it is impossible

to change it after the table has been created. The only option for changing the bucket count is by dropping and

recreating the table.

■ you should analyze the data and include a future system growth projection into the analysis when

determining the optimal bucket count for the hash index. Underestimation and overestimation are both bad.

Underestimation increases the size of the index chain while overestimation wastes system memory.

Important

Internally, SQL Server rounds up the number of buckets specified for an index to the next power of two. For example,

a hash index defined with BUCKET_COUNT=100000 would have 131,072 buckets in the hash array.

Search WWH ::

Custom Search

Home