Indexing for Table Joins - Relational Database Index Design and the Optimizers

Database Reference

In-Depth Information

Building temporary tables for Customers and

Invoices

0ms

Sorting customers

1000 × 0 . 01 ms = 0 . 01 s

Sorting invoices

20 , 000 × 0 . 01 ms = 0 . 2s

Merging

( 1000 + 20 , 000 ) × 0 . 01 ms = 0 . 2s

or to keep things simple:

Sort and Merge : 2

(

1000

+

20

,

000

) ×

0

.

01 ms

=

0

.

4s

Fetch 1000

×

20

=

20

,

000

×

0

.

1ms

=

2s

Local Response Time

10 s

+

200 s

+

2s

=

3

.

5min

With DB2 for z/OS, if the outer table had been accessed in the join column

sequence, there would be no need to create and sort a temporary table. The

inner table, however, is always put into a work file. The merge would then

take place, with the inner table's temporary table, as the outer rows were being

accessed.

With SQL Server, the merge join can be a regular or a many-to-many oper-

ation. The latter uses temporary tables to store the rows. If there are duplicate

rows from each input, one of the inputs rewinds to the start of the duplicates as

each duplicate from the other input is processed.

Hash Joins

As we have mentioned earlier, Oracle, SQL Server, and DB2 for LUW tend

to choose hash join (HJ) instead of merge scan (MS). Hash join is basically

merge scan with hashing instead of sorting; the smaller row set is first stored in

a temporary table, hashed by (a calculation performed on) the join column(s).

Then the other table (or index slice) is scanned and for each row that satisfies

the local predicates, the temporary table is checked for matching rows, using the

hash (computed) value.

Merge scan is faster if the row sets are presorted in the indexes. If merge

scan needs a sort, hash join tends to be faster, particularly if one of the row

sets fits in memory (a random touch to a row in a small hash table in mem-

ory normally takes less CPU time than sorting and merging a row). If both

row sets are large, relative to the available memory, the hash table is parti-

tioned (one partition at a time in memory), and the other row set is scanned

several times.

With regard to the QUBE, the initial table or index scans will be identical,

regardless of whether merge scan or hash join is being used. Any reduced cost

resulting from hashing will be difficult to quantify and so, to avoid unnecessary

complexity, we will determine the cost of the hashing process in the same way

as we did for MS.

Relational Database Index Design and the Optimizers

Search WWH ::

Custom Search

Home