Database Reference
In-Depth Information
Hash Aggregate
A hash aggregate is very similar to a hash join. It is targeted towards large input and requires memory to store the
hash table. The hash aggregate algorithm is shown in Listing 25-12.
Listing 25-12. Hash Aggregate algorithm
for each row R1 from input
begin
calculate hash value of R1 group columns
check for a matching row in hash table
if matching row exists
update aggregate results of matching row
else
insert new row into hash table
end
return all rows from hash table with aggregate results
Similar to a hash join, a hash table can be spilled to tempdb , which negatively affects the performance of
the aggregate.
Comparing Aggregates
As with joins, stream and hash aggregates are targeted towards different use-cases. A stream aggregate works best
with sorted input—either because of existing indexes or when the amount of data is small and can be easily sorted.
A hash aggregate, on the other hand, is targeted towards large unsorted inputs.
Table 25-3 compares hash and stream aggregates.
Table 25-3. Aggregate comparison
Stream Aggregate
Hash Aggregate
Best use-case
Small size input where data can be sorted with the Sort
operator or pre-sorted input.
Medium-to-large unsorted input.
Requires sorted input
Yes
No
Blocking
No. However it often requires a blocking Sort operator.
Yes
Uses memory
No
Yes
Uses tempdb
No
Yes in case of spills
You should consider the cost of the Sort operator during performance tuning if it is used only to support a stream
aggregate pre-requirement. The cost of sorting usually exceeds the cost of stream aggregation itself. You can often
remove it by creating indexes, which would sort the data in the order required for a stream aggregate.
Spools
Spool operators, in a nutshell, are internal in-memory or on-disk caches/temporary tables. SQL Server often uses
spools for performance reasons to cache the results of complex subexpressions that need to be used several times
during query execution.
 
 
Search WWH ::




Custom Search