Database Reference
In-Depth Information
7.4.3 Eciency Advantages of Using the BAT Algebra
The main advantage of the BAT Algebra is its hard-coded semantics, caus-
ing all operators to be predicateless. For comparison, in relational algebra,
the join and select operators take an arbitrary Boolean column expression
that determines which tuples must be joined and selected. The fact that this
Boolean expression is arbitrary, and specified at query time only, means that
the RDBMS must include some expression interpreter in the critical runtime
code-path of the join and select operators. Such predicates do not occur in
BAT Algebra; therefore, we also say it has a “zero degree of freedom.” For
instance, the hard-coded semantics of join (L,R) is that the predicate is a sim-
ple equality between the inner columns of the left BAT, L, and right BAT, R;
and its output is the outer columns for the matching tuples. In case of select,
the predicate is equality on the tail column. This absence of freedom allows
the implementation of the query algebra to forsake an expression interpreting
engine; rather, all BAT algebra operations in the implementation map onto
array operations. For instance, the expression “select(bat[TID,int] B, int V):
bat[TID,TID] R” in BAT Algebra can be represented at the C-level code as
something like:
for( i=j=0; i < n; i++)
if (B. tail [ i ] == V) R.tail[i] = j++;
Note the following in the “select” BAT statement above:
The select operator has two parameters B and V and one result R.
B is a BAT with a head-column of type TID, and a tail-column of type int.
V is a constant (a single) value of type int.
R is a BAT with both head and tail column of type TID, where the head
represents a surrogate sequence (
=
...
) and the tail contains the
qualifying row-IDs of the rows that matched the int
0,1,2,
=
V condition (in
the example shown in Figure 7.2, these are rows 1,2).
Such simple loops are amenable to compiler optimization and CPU out-
of-order speculation, which lead to high performance. The philosophy behind
BAT Algebra can be paraphrased as “the RISC approach to database query
languages”: By making the algebra simple, the opportunities are created for
implementations that execute the common case very fast.
Note that the above code is only correct if the head column of a BAT,
B, contains a densely ascending TID (tuple identifier, that is, surrogate) se-
quence starting with 0 (that is, B.head = 0,1,2,
). This happens to be a
common case, and MonetDB recognizes this as the dense property. Dense TID
columns are in fact not stored at all in the implementation, as they are the
same as the array index in the column. Many head columns are dense, so Mon-
etDB BAT processing often equates to simple array processing. In addition
to denseness, MonetDB keeps a series of other runtime properties on columns
...
Search WWH ::




Custom Search