Database Reference
In-Depth Information
Building temporary tables for Customers and
Invoices
0ms
Sorting customers
1000 × 0 . 01 ms = 0 . 01 s
Sorting invoices
20 , 000 × 0 . 01 ms = 0 . 2s
Merging
( 1000 + 20 , 000 ) × 0 . 01 ms = 0 . 2s
or to keep things simple:
Sort and Merge : 2
(
1000
+
20
,
000
) ×
0
.
01 ms
=
0
.
4s
Fetch 1000
×
20
=
20
,
000
×
0
.
1ms
=
2s
Local Response Time
10 s
+
200 s
+
2s
=
3
.
5min
With DB2 for z/OS, if the outer table had been accessed in the join column
sequence, there would be no need to create and sort a temporary table. The
inner table, however, is always put into a work file. The merge would then
take place, with the inner table's temporary table, as the outer rows were being
accessed.
With SQL Server, the merge join can be a regular or a many-to-many oper-
ation. The latter uses temporary tables to store the rows. If there are duplicate
rows from each input, one of the inputs rewinds to the start of the duplicates as
each duplicate from the other input is processed.
Hash Joins
As we have mentioned earlier, Oracle, SQL Server, and DB2 for LUW tend
to choose hash join (HJ) instead of merge scan (MS). Hash join is basically
merge scan with hashing instead of sorting; the smaller row set is first stored in
a temporary table, hashed by (a calculation performed on) the join column(s).
Then the other table (or index slice) is scanned and for each row that satisfies
the local predicates, the temporary table is checked for matching rows, using the
hash (computed) value.
Merge scan is faster if the row sets are presorted in the indexes. If merge
scan needs a sort, hash join tends to be faster, particularly if one of the row
sets fits in memory (a random touch to a row in a small hash table in mem-
ory normally takes less CPU time than sorting and merging a row). If both
row sets are large, relative to the available memory, the hash table is parti-
tioned (one partition at a time in memory), and the other row set is scanned
several times.
With regard to the QUBE, the initial table or index scans will be identical,
regardless of whether merge scan or hash join is being used. Any reduced cost
resulting from hashing will be difficult to quantify and so, to avoid unnecessary
complexity, we will determine the cost of the hashing process in the same way
as we did for MS.
Search WWH ::




Custom Search