Databases Reference
In-Depth Information
How it works...
As discussed in the Introduction section, the Hash Join works with heavy data that is not
sorted on predicate column. So obviously when we execute our first SELECT query in step 1, it
shows Hash Join operator in the Execution plan as a high volume of data is generated, which
is not sorted or indexed.
In step 3, we have created unique clustered index on key fields in both the tables. So
obviously our data will be sorted physically as well as indexed in table. So when we execute
same SELECT query in step 5, we will see a Merge Join operator rather than a Hash Join, as
our data is now sorted and we also have equijoin operator.
In step 6 we have provided predicate in WHERE condition, so dataset from
SalesOrderHeaderDemo table becomes smaller than it used to be in step 3.
As dataset becomes small with sorted data, it performs as outer loop along with
inner loop of SalesOrderDetailDemo table to perform a Nested Loop Join.
A Nested Loop Join works well with at least two result sets and out of those, one is a relatively
small dataset which is used as an outer loop input and another result set with an efficient
index works as an inner loop set. It fetches each record from the left-hand side dataset
and loops through the second dataset to find a match, so we have reduced the dataset by
providing a WHERE clause and the Nested Loop comes into the picture.
As each join operator has its own pros and cons, no single one is the "best" or "worst" for all
situation. It depends on the task we are performing. Quite a few times I have been asked why
the Hash Join is there in SQL Server as it is consuming lots of CPU time?
I always answer that a Hash is not bad, it is good for situations where we have heavy datasets
that are not sorted or indexed. If it is possible in your environment, try to make a unique
clustered index on each table so that you can meet with a Merge Join operator. If it is not
possible, never try to advise optimizer to use a Merge or a Nested Loop by providing an
OPTION query hint, as it may degrade the performance. A Nested Loop works best only
with a small dataset, as described in this recipe.
There's more...
SQL Server chooses best physical operator for your join, but there is one
option called OPTION clause which helps you to change SQL Server 2012's
decision with your preferred way.
 
Search WWH ::




Custom Search