Tuning T-SQL - Professional SQL Server 2005 Performance Tuning

Databases Reference

In-Depth Information

function is both science and art (and prayer doesn't hurt either). The hash function must divvy up the

build input into evenly distributed hash buckets. This is easier said than done. Another issue is using the

smaller input for the build phase. Sometimes SQL Server will choose a build input that turns out to be the

larger input. When this occurs SQL Server will switch the roles of the build and probe inputs in a process

called role reversal. Also, if the hash table is too big to fit into memory it will be spooled to tempdb in

partitions. The partitions will then be pulled into memory when needed. This significantly increases the

amount of I/O and CPU usage.

Thoughts on Join Types

Before SQL Server version 7, the only type of join was the nested loop join. At that

time this was adequate for most database designs. OLTP type databases dominated the

landscape. The record-to-record navigation needed was well served by this technique.

As decision support systems became more and more prevalent, additional join

techniques were warranted. Another issue is indexing. As systems became more

complex, creating effective indexes for a given query workload became impossibly

difficult. Thus, adding different join techniques to SQL Server became necessary.

Merge joins were the first logical step. This technique provides better performance for

larger join inputs.

Finally, hash joins were added. They can support very large join inputs. However, it

does come at a price. The requirements for CPU, memory, and possibly disk make hash

joins resource intensive. So, given the resource requirements for hash joins, it seems

reasonable to ask how they can ever be worth it. Interestingly, Microsoft has done

some testing to answer this question. They learned that for large join inputs, merge joins

average about 40 percent slower than hash joins. The reason has to do with indexes. If

there are very few indexes in the database or the indexes don't serve a particular query,

then hash join often works well. In fact, hash join can be thought of as an in-memory,

on-demand index.

SpoolOperators

Spool operators in a query plan are there to temporarily retain intermediate results of a complex query.

For example, large subqueries or remote scans such as querying a linked server. The spool operator takes

the rows produced by its previous operators and stores them in a table in tempdb. This table only exists

for the lifetime of the query.

There are a couple of concepts to know about when it comes to spool operators. First, spool operators

usually sit between two other operators. The operator on the left of the spool operator is the parent

operator and the operator on the right is the child operator.

The other concepts are rewind and rebind. A rewind means that the result set from the child operator can

be reused by the parent operator. A rebind means that the result set from the child operator cannot be

reused by the parent operator. Therefore, the result set from child operator has to be reevaluated and the

tempdb table has to be rebuilt.

Finally, there is the concept of physical versus logical operators. All operators, not just spool operators,

are classified as logical or physical operators.

Search WWH ::

Custom Search

Home