Database Reference
In-Depth Information
In many simple queries affecting a small set of rows, nested loop joins are far superior to both hash and merge
joins. As joins get more complicated, the other join types are used where appropriate. None of the join types is
by definition bad or wrong. You're primarily looking for places where the optimizer may have chosen a type not
compatible with the data in hand. This is usually caused by discrepancies in the statistics available to the optimizer
when it's deciding which of the types to use.
Hash Join
To understand SQL Server's hash join strategy, consider the following simple query:
SELECT p.*
FROM Production.Product p
JOIN Production.ProductCategory pc
ON p.ProductSubcategoryID = pc.ProductCategoryID;
Table 7-1 shows the two tables' indexes and number of rows.
Table 7-1. Indexes and Number of Rows of the Products and ProductCategory Tables
Table
Indexes
Number of Rows
Product
Clustered index on ProductID
504
ProductCategory
Clustered index on ProductCategoryld
4
Figure 7-7 shows the execution plan for the preceding query.
Figure 7-7. Execution plan with a hash join
You can see that the optimizer used a hash join between the two tables.
A hash join uses the two join inputs as a build input and a probe input. The build input is shown as the top
input in the execution plan, and the probe input is shown as the bottom input. Usually the smaller of the two inputs
serves as the build input because it's going to be stored on the system, so the optimizer attempts to minimize the
memory used.
 
Search WWH ::




Custom Search