Databases Reference
In-Depth Information
RELATIONAL QUERY OPTIMIZER
Relational DBMS Performance
An ever-present issue in data retrieval is performance: the speed with which
the required data can be retrieved. In a typical relational database application
environment, and as we've seen in the examples above, many queries require only
one table. It is certainly reasonable to assume that such single-table queries using
indexes, hashing, and the like, should, more or less, not take any longer in a relational
database system environment than in any other kind of file management system.
But,what about the queries that involve joins? Recall the detailed explanation of how
data integration works earlier in the topic that used the Salesperson and Customer
tables as an example. These very small tables did not pose much of a performance
issue, even if the join was carried out in the worst-case way, comparing every row
of one table to every row of the other table, as was previously described. But what
if we attempted to join a 1-million-row table with a 3-million-row table? How long
do you think that would take—even on a large, fast computer? It might well take
much longer than a person waiting for a response at a workstation would be willing
to tolerate. This was actually one of the issues that caused the delay of almost ten
years from the time the first article on relational database was published in 1970
until relational DBMSs were first offered commercially almost ten years later.
The performance issue in relational database management has been approached
in two different ways. One, the tuning of the database structure, which is known
as ''physical database design,'' will be the subject of an entire chapter of this
book, Chapter 8. It's that important. The other way that the relational database
performance issue has been approached is through highly specialized software in
the relational DBMS itself. This software, known as a relational query optimizer ,
is in effect an ''expert system'' that evaluates each SQL SELECT statement sent to
the DBMS and determines an efficient way to satisfy it.
Relational Query Optimizer Concepts
All major SQL processors (meaning all major relational DBMSs) include a query
optimizer. Using a query optimizer, SQL attempts to figure out the most efficient
way of answering a query, before actually responding to it. Clearly, a query that
involves only one table should be evaluated to take advantage of aids such as indexes
on pertinent attributes. But, again, the most compelling and interesting reason for
having a query optimizer in a relational database system is the goal of executing
multiple-table data integration or join-type operations without having to go through
the worst-case, very time-consuming, exhaustive row-comparison process. Exactly
how a specific relational DBMS's query optimizer works is typically a closely
held trade secret. Retrieval performance is one way in which the vendors of these
products compete with one another. Nevertheless, there are some basic ideas that
we can discuss here.
When an SQL query optimizer is presented with a new SELECT statement to
evaluate, it seeks out information about the tables named in the FROM clause. This
information includes:
Which attributes of the tables have indexes built over them.
Which attributes have unique values.
How many rows each table has.
Search WWH ::




Custom Search