RELATIONAL DATA RETRIEVAL: SQL - Fundamentals of Database Management Systems

Databases Reference

In-Depth Information

RELATIONAL QUERY OPTIMIZER

Relational DBMS Performance

An ever-present issue in data retrieval is performance: the speed with which

the required data can be retrieved. In a typical relational database application

environment, and as we've seen in the examples above, many queries require only

one table. It is certainly reasonable to assume that such single-table queries using

indexes, hashing, and the like, should, more or less, not take any longer in a relational

database system environment than in any other kind of file management system.

But,what about the queries that involve joins? Recall the detailed explanation of how

data integration works earlier in the topic that used the Salesperson and Customer

tables as an example. These very small tables did not pose much of a performance

issue, even if the join was carried out in the worst-case way, comparing every row

of one table to every row of the other table, as was previously described. But what

if we attempted to join a 1-million-row table with a 3-million-row table? How long

do you think that would take—even on a large, fast computer? It might well take

much longer than a person waiting for a response at a workstation would be willing

to tolerate. This was actually one of the issues that caused the delay of almost ten

years from the time the first article on relational database was published in 1970

until relational DBMSs were first offered commercially almost ten years later.

The performance issue in relational database management has been approached

in two different ways. One, the tuning of the database structure, which is known

as ''physical database design,'' will be the subject of an entire chapter of this

book, Chapter 8. It's that important. The other way that the relational database

performance issue has been approached is through highly specialized software in

the relational DBMS itself. This software, known as a relational query optimizer ,

is in effect an ''expert system'' that evaluates each SQL SELECT statement sent to

the DBMS and determines an efficient way to satisfy it.

Relational Query Optimizer Concepts

All major SQL processors (meaning all major relational DBMSs) include a query

optimizer. Using a query optimizer, SQL attempts to figure out the most efficient

way of answering a query, before actually responding to it. Clearly, a query that

involves only one table should be evaluated to take advantage of aids such as indexes

on pertinent attributes. But, again, the most compelling and interesting reason for

having a query optimizer in a relational database system is the goal of executing

multiple-table data integration or join-type operations without having to go through

the worst-case, very time-consuming, exhaustive row-comparison process. Exactly

how a specific relational DBMS's query optimizer works is typically a closely

held trade secret. Retrieval performance is one way in which the vendors of these

products compete with one another. Nevertheless, there are some basic ideas that

we can discuss here.

When an SQL query optimizer is presented with a new SELECT statement to

evaluate, it seeks out information about the tables named in the FROM clause. This

information includes:

Which attributes of the tables have indexes built over them.

■

Which attributes have unique values.

■

How many rows each table has.

■

Fundamentals of Database Management Systems

Search WWH ::

Custom Search

Home