Database Reference
In-Depth Information
Figure 23-4. Hash values in lots of buckets and few buckets
The first set of buckets has what is called a shallow distribution , which is few hash values distributed across a
lot of buckets. This is a more optimal storage plan. Some buckets may be empty as you can see, but the lookup speed
is fast because each bucket contains a single value. The second set of buckets shows a small bucket count, or a deep
distribution . This is more hash values in a given bucket, requiring a scan within the bucket to identify individual
hash values.
Microsoft's recommendation on bucket count is go between one to two times the quantity of the number of
rows in the table. But, since you can't alter in-memory tables, you also need to take into account projected growth.
If you think your in-memory table is likely to grow three times as large over the next three to six months, you may want
to expand the size of your bucket count. The only problem you'll encounter with an oversized bucket count is that
scans will take longer, so you'll be allocating more memory. But, if your queries are likely to lead to scans, you really
shouldn't be using the nonclustered hash index. Instead, just go to the nonclustered index.
You also need to worry about how many values can be returned by the hash value. Unique indexes and primary
keys are prime candidates for using the hash index because they're always unique. Microsoft's recommendation is
that if, on average, you're going to see more than five values for any one hash value, you should move away from the
nonclustered hash index and use the nonclustered index instead. This is because the hash bucket simply acts as a
pointer to the first row that is stored in that bucket. Then, if duplicate or additional values are stored in the bucket, the
first row points to the next row, and each subsequent row points to the row following it. This can turn point lookups
into scanning operations, again radically hurting performance. This is why going with a small number of duplicates,
less than five, or unique values work best with hash indexes.
To see the distribution of your index within the hash table, you can use sys.dm_db_xtp_hash_index_stats .
SELECT i.name AS 'index name',
hs.total_bucket_count,
hs.empty_bucket_count,
hs.avg_chain_length,
hs.max_chain_length
 
Search WWH ::




Custom Search