Database Reference
In-Depth Information
Row batch
Column Vectors
ProductKey
CustomerKey
...
Fig. 13.6 Row batch query processing
stored in a BLOB (binary large object). There is also a segment directory
allowing to quickly find all segments of a given column. In addition, the
directory contains metadata, like number of rows, minimum and maximum
values, and so on.
The main element in batch processing is the row batch (see Fig. 13.6 ),
an object that contains about one thousand rows. Each column within the
batch is represented internally as a vector of fixed size elements. There is
an additional vector denoted qualifying rows bitmap vector used as follows.
For example, to evaluate a condition such as ProductKey < 1, we need to
scan the column ProductKey in the batch, perform the comparison, and,
for each qualifying element, set the corresponding bit in the qualifying
rows vector. Ecient vector-based algorithms reduce the CPU overhead of
database operations. It is reported that this reduction can be of up to forty
times compared with row-based processing methods.
It is worth remarking that SQL Server column-store indexes and column-
based query processing are optimized for typical queries in data warehouses
with a large fact table and small- to medium-sized dimension tables, following
a star schema configuration. Since these queries include a star join, selection
predicates over dimension attributes, and a final aggregation, they typically
return a small result set. However, when the result set is large (e.g., if data are
not aggregated or there is no join or filtering), performance may be poor since
batch processing is not applied, and the benefit from the column-store index
is just due to compression and the scanning of fewer columns. Performance
may decrease when (a) two large tables are joined so that they require large
hash tables that do not fit into memory and must be dumped to disk; (b)
many columns are returned, and thus most of the column-store index must be
retrieved; and (c) a join condition over a column-store indexed table includes
more than one column.
In SQL Server, a table over which a column-store index has been defined
cannot be updated. To overcome this problem, some ad hoc techniques can
 
Search WWH ::




Custom Search