Databases Reference
In-Depth Information
function is both science and art (and prayer doesn't hurt either). The hash function must divvy up the
build input into evenly distributed hash buckets. This is easier said than done. Another issue is using the
smaller input for the build phase. Sometimes SQL Server will choose a build input that turns out to be the
larger input. When this occurs SQL Server will switch the roles of the build and probe inputs in a process
called role reversal. Also, if the hash table is too big to fit into memory it will be spooled to tempdb in
partitions. The partitions will then be pulled into memory when needed. This significantly increases the
amount of I/O and CPU usage.
Thoughts on Join Types
Before SQL Server version 7, the only type of join was the nested loop join. At that
time this was adequate for most database designs. OLTP type databases dominated the
landscape. The record-to-record navigation needed was well served by this technique.
As decision support systems became more and more prevalent, additional join
techniques were warranted. Another issue is indexing. As systems became more
complex, creating effective indexes for a given query workload became impossibly
difficult. Thus, adding different join techniques to SQL Server became necessary.
Merge joins were the first logical step. This technique provides better performance for
larger join inputs.
Finally, hash joins were added. They can support very large join inputs. However, it
does come at a price. The requirements for CPU, memory, and possibly disk make hash
joins resource intensive. So, given the resource requirements for hash joins, it seems
reasonable to ask how they can ever be worth it. Interestingly, Microsoft has done
some testing to answer this question. They learned that for large join inputs, merge joins
average about 40 percent slower than hash joins. The reason has to do with indexes. If
there are very few indexes in the database or the indexes don't serve a particular query,
then hash join often works well. In fact, hash join can be thought of as an in-memory,
on-demand index.
SpoolOperators
Spool operators in a query plan are there to temporarily retain intermediate results of a complex query.
For example, large subqueries or remote scans such as querying a linked server. The spool operator takes
the rows produced by its previous operators and stores them in a table in tempdb. This table only exists
for the lifetime of the query.
There are a couple of concepts to know about when it comes to spool operators. First, spool operators
usually sit between two other operators. The operator on the left of the spool operator is the parent
operator and the operator on the right is the child operator.
The other concepts are rewind and rebind. A rewind means that the result set from the child operator can
be reused by the parent operator. A rebind means that the result set from the child operator cannot be
reused by the parent operator. Therefore, the result set from child operator has to be reevaluated and the
tempdb table has to be rebuilt.
Finally, there is the concept of physical versus logical operators. All operators, not just spool operators,
are classified as logical or physical operators.
Search WWH ::




Custom Search