Declarative Query Processing in Relational Database Systems - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

tuples before passing them to the next operator. Specifically, the Hash Join

operator in the figure passes only matching tuples for which employees earn

more than their managers. The result of this operator is a relation of all en-

gineers who earn more than their managers. This intermediate result is input

to the second Hash Join operator in the figure, which matches employees

and the corresponding departments (the Scan operator over Dept (alias D )

discards all departments with high budgets). This second Hash Join opera-

tor produces the list of all engineers who earn more than their managers and

work in low-budget departments, along with their department information.

This intermediate result is sorted by department cities ( Sort operator) and

passed to a Stream Aggregate operator, which counts the number of tuples

in each group of consecutive tuples with the same department city (leveraging

the sorted property enforced by the previous Sort operator). The output of

the Stream Aggregate operator is a relation of pairs of cities and the num-

ber of engineers earning more than their managers working in a low-budget

department in such cities. This intermediate result is then filtered by the final

Filter operator, which discards the cities with counts smaller than or equal

to 10, producing the desired answer.

We make some additional observations concerning execution plans and eval-

uation strategies for SQL queries. First, we note that there are several possible

implementations of operators. For instance, Hash Joins are just one of many

alternatives to process relational joins. Other operators can be applicable in

some scenarios only (e.g., Merge Joins require that inputs are sorted by the

join columns). Second, some operators pipeline intermediate results (e.g., Scan

operators pass each tuple up to the next operator without first reading the

whole table), while others need to buffer/materialize the whole input before

producing tuples (e.g., the Sort operator). Third, operators are very sophis-

ticated and go considerably beyond our simple functions in Figure 1.1. For

instance, Hash Joins can adapt to skewed data, gracefully spill to disk if the

input data are larger than the available memory, and leverage sophisticated

optimizations (e.g., bitmap filters). Finally, a factor that usually makes a big

difference in performance is the presence of relevant indexes, described next.

1.3.1 Database Indexes

Indexes are a very important feature in relational engines, which can dras-

tically improve query processing performance. An index is defined over a

sequence of columns of a table and results in an access path for obtaining

tuples satisfying certain predicates. Consider an index defined over table Emp

on column EId . We denote such index as Emp(EId) . The most popular imple-

mentation for indexes, which we use in this topic, uses B + -trees , which ensure

ecient insertion, retrieval, and removal of records identified by a key. A B + -

tree is a dynamic, multilevel tree, where each node is associated with a range

of values for the index columns, and represents all tuples in the index table

that lie within such range. In a B + -tree, in contrast to a B-tree, all records

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home