Database Reference
In-Depth Information
tuples before passing them to the next operator. Specifically, the Hash Join
operator in the figure passes only matching tuples for which employees earn
more than their managers. The result of this operator is a relation of all en-
gineers who earn more than their managers. This intermediate result is input
to the second Hash Join operator in the figure, which matches employees
and the corresponding departments (the Scan operator over Dept (alias D )
discards all departments with high budgets). This second Hash Join opera-
tor produces the list of all engineers who earn more than their managers and
work in low-budget departments, along with their department information.
This intermediate result is sorted by department cities ( Sort operator) and
passed to a Stream Aggregate operator, which counts the number of tuples
in each group of consecutive tuples with the same department city (leveraging
the sorted property enforced by the previous Sort operator). The output of
the Stream Aggregate operator is a relation of pairs of cities and the num-
ber of engineers earning more than their managers working in a low-budget
department in such cities. This intermediate result is then filtered by the final
Filter operator, which discards the cities with counts smaller than or equal
to 10, producing the desired answer.
We make some additional observations concerning execution plans and eval-
uation strategies for SQL queries. First, we note that there are several possible
implementations of operators. For instance, Hash Joins are just one of many
alternatives to process relational joins. Other operators can be applicable in
some scenarios only (e.g., Merge Joins require that inputs are sorted by the
join columns). Second, some operators pipeline intermediate results (e.g., Scan
operators pass each tuple up to the next operator without first reading the
whole table), while others need to buffer/materialize the whole input before
producing tuples (e.g., the Sort operator). Third, operators are very sophis-
ticated and go considerably beyond our simple functions in Figure 1.1. For
instance, Hash Joins can adapt to skewed data, gracefully spill to disk if the
input data are larger than the available memory, and leverage sophisticated
optimizations (e.g., bitmap filters). Finally, a factor that usually makes a big
difference in performance is the presence of relevant indexes, described next.
1.3.1 Database Indexes
Indexes are a very important feature in relational engines, which can dras-
tically improve query processing performance. An index is defined over a
sequence of columns of a table and results in an access path for obtaining
tuples satisfying certain predicates. Consider an index defined over table Emp
on column EId . We denote such index as Emp(EId) . The most popular imple-
mentation for indexes, which we use in this topic, uses B + -trees , which ensure
ecient insertion, retrieval, and removal of records identified by a key. A B + -
tree is a dynamic, multilevel tree, where each node is associated with a range
of values for the index columns, and represents all tuples in the index table
that lie within such range. In a B + -tree, in contrast to a B-tree, all records
Search WWH ::




Custom Search