Tuning T-SQL - Professional SQL Server 2005 Performance Tuning

Databases Reference

In-Depth Information

The optimizer will generally choose a merge join when both join inputs are already sorted on the

join column. Sometimes, even if a sort must be done on one of the input columns, the optimizer

might decide to do that and follow up with a merge join, rather than use one of the other available

join strategies.

Hash Joins

Hash joins are used when no useful index exists on the join column in either input. They're also

used when one table is much smaller than the other. Also, hash joins work well when SQL Server

must perform joins of very large tables. See Figure 9-7 for an example of a query plan with a

hash join.

SELECT *

FROM Sales.SalesOrderHeader h

INNER JOIN Sales.SalesOrderDetail d

ON h.SalesOrderID = d.SalesOrderID

Figure 9-7

SQL Server performs hash joins by completing two phases. The first phase is known as the build phase,

and the second phase is known as the probe phase.

In the build phase all rows from the first input, known as the build input, are used to build a

structure known as a hash table. This hash table will be built using a hashing function based on

the equijoin columns. The hash table will contain some number of buckets in which data can

be placed. Also, note that the hash table is created in memory if possible. In fact, SQL Server

stores these buckets as linked lists. The key to an effective hash table is using a hashing function that will

divide the data in sets of manageable size with roughly equal membership. Also note

that SQL Server will attempt to use the smaller of the two query inputs as the basis for the

build phase.

After the hash table is built, the second phase can begin. This is the probe phase. It works by

taking one row at a time and trying to find a match in the hash table. If a match is found, then it

is output.

Search WWH ::

Custom Search

Home