Indexing for Table Joins - Relational Database Index Design and the Optimizers

Database Reference

In-Depth Information

joins . It is more difficult to design good indexes for a join than for a single-table

SELECT because the join method and the table access order have a great impact

on index requirements. The methods discussed in previous chapters, however,

are still applicable to joins but with additional considerations.

In a join SELECT there are two kinds of predicates, local predicates that

access a single table and join predicates that define the connection between

tables. Most optimizers choose the join method and table access order by esti-

mating the local response time of numerous alternatives. The most common join

method is the nested-loop join (SQL Server: loop join ), although merge scan join

(Oracle: sort-merge join ;SQLServer: merge join )and hash join are becoming

more popular; this is because sequential read is much faster than it used to be.

In a nested-loop join, the DBMS first finds a row in the outer table that satisfies

the local predicates referring to that table. Then it looks for the related rows

in the next table, the inner table , and checks which of these satisfy their local

predicates—and so on.

A two-table join could be replaced by two single-table cursors and two nested

loops coded in the program. It is then the programmer who makes the decision

about the table access order—which table is accessed in the outer loop. When a

join is used, the optimizer makes that decision, although this may be overridden

by hints or other tricks.

Join predicates are most often based on primary key = foreign key . Assum-

ing the correct foreign key indexes (starting with the foreign key columns) have

been created and the result table is not exceptionally large, a nested loop is likely

to be the fastest method, at least if all local predicates refer to one table.

If a nested-loop join is not appropriate, a merge scan or hash join may be

faster; with the former, one or more tables are sorted into a consistent order,

if necessary (after local predicates have been applied), and the qualifying rows

in the tables or work files are merged; the hash join is basically a merge scan

with hashing instead of sorting—more details later. With these methods, no table

page is accessed more than once. Oracle, SQL Server, and DB2 for LUW tend

to choose hash join instead of merge scan.

DB2 for z/OS does not use a hash join; in some less common situations a

hybrid join — essentially a nested loop with list prefetch—may be preferred.

Note

ž In this chapter we will initially concentrate on the nested-loop join and

then compare this with the merge scan/hash join.

ž Where data is required from more than one table, the column names in

this chapter are prefixed with the table names to show from which table

or index the columns are to be found.

TWO SIMPLE JOINS

We will first analyze two simple join examples to show the processes involved

in a nested-loop join and how we can calculate the cost of the joins.

Search WWH ::

Custom Search

Home