Memory-Optimized OLTP Tables and Procedures - SQL Server Query Performance Tuning

Database Reference

In-Depth Information

Figure 23-4. Hash values in lots of buckets and few buckets

The first set of buckets has what is called a shallow distribution , which is few hash values distributed across a

lot of buckets. This is a more optimal storage plan. Some buckets may be empty as you can see, but the lookup speed

is fast because each bucket contains a single value. The second set of buckets shows a small bucket count, or a deep

distribution . This is more hash values in a given bucket, requiring a scan within the bucket to identify individual

hash values.

Microsoft's recommendation on bucket count is go between one to two times the quantity of the number of

rows in the table. But, since you can't alter in-memory tables, you also need to take into account projected growth.

If you think your in-memory table is likely to grow three times as large over the next three to six months, you may want

to expand the size of your bucket count. The only problem you'll encounter with an oversized bucket count is that

scans will take longer, so you'll be allocating more memory. But, if your queries are likely to lead to scans, you really

shouldn't be using the nonclustered hash index. Instead, just go to the nonclustered index.

You also need to worry about how many values can be returned by the hash value. Unique indexes and primary

keys are prime candidates for using the hash index because they're always unique. Microsoft's recommendation is

that if, on average, you're going to see more than five values for any one hash value, you should move away from the

nonclustered hash index and use the nonclustered index instead. This is because the hash bucket simply acts as a

pointer to the first row that is stored in that bucket. Then, if duplicate or additional values are stored in the bucket, the

first row points to the next row, and each subsequent row points to the row following it. This can turn point lookups

into scanning operations, again radically hurting performance. This is why going with a small number of duplicates,

less than five, or unique values work best with hash indexes.

To see the distribution of your index within the hash table, you can use sys.dm_db_xtp_hash_index_stats .

SELECT i.name AS 'index name',

hs.total_bucket_count,

hs.empty_bucket_count,

hs.avg_chain_length,

hs.max_chain_length

Search WWH ::

Custom Search

Home