and the other containing the position index. At the ROS, partitioning and
segmentation are applied to facilitate parallelism. The former, also called
intra-node partitioning, splits data horizontally, based on data values, for
example, by date intervals. Segmentation (also called internode partitioning)
splits data across nodes according to a hash key. When the WOS is full, data
are moved to the ROS by a moveout function. To save space in the ROS,
a mergeout function is applied (this is analogous to the merge operation in
Fig. 13.4).
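The difference between the two splitting schemes can be illustrated with a minimal sketch. It is not Vertica code: the cluster size, the CRC32 hash, the monthly intervals, and the function names `segment_of` and `partition_of` are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict
from datetime import date

NUM_NODES = 3  # hypothetical cluster size


def segment_of(key: str, num_nodes: int = NUM_NODES) -> int:
    """Internode segmentation: map a hash of the key to a node number."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition_of(day: date) -> str:
    """Intra-node partitioning: group rows by a value range (here, a month)."""
    return f"{day.year:04d}-{day.month:02d}"


# Hypothetical rows: (customer key, sale date, amount)
rows = [
    ("c1", date(2023, 1, 5), 10.0),
    ("c2", date(2023, 1, 20), 7.5),
    ("c1", date(2023, 2, 3), 4.0),
    ("c3", date(2023, 2, 14), 12.0),
]

# layout[node][partition] holds the rows stored in that partition of that node
layout = defaultdict(lambda: defaultdict(list))
for customer, day, amount in rows:
    layout[segment_of(customer)][partition_of(day)].append((customer, day, amount))

for node, partitions in sorted(layout.items()):
    for partition, stored in sorted(partitions.items()):
        print(node, partition, stored)
```

In this sketch, rows with the same key always land on the same node, while within a node they are further grouped by date interval, so a query restricted to one month only needs to read the matching partitions.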
Finally, although inserts, deletes, and updates are supported, Vertica
may not be appropriate for update-intensive applications, like heavy OLTP
workloads that, roughly speaking, exceed 10% of the total load.
13.5.2 MonetDB
MonetDB 3 is a column-store IMDBS developed at the Centrum Wiskunde
& Informatica (CWI) 4 in the Netherlands. The main characteristics of
MonetDB are columnar storage; a bulk query algebra, which allows fast
implementation on modern hardware; cache-conscious algorithms; and new
cost models, which account for the cost of memory access.
In RDBMS query processing, executing a query plan typically requires
scanning a relation R and filtering it using a condition φ. The
format of R is only known at query time; thus, an expression interpreter is
needed. MonetDB starts from the observation that the CPU is mostly
occupied with interpreting the query expression; thus, processing costs can
be reduced by optimizing CPU usage. To simplify query interpretation, the
relational algebra was replaced by a simpler algebra.
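To make the interpretation cost concrete, the following is a minimal sketch of a tuple-at-a-time scan-and-filter. It does not reproduce the code of any real RDBMS: the nested-tuple encoding of the condition and the names `evaluate` and `scan_filter` are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Assumed encoding of a condition: ("col", name), ("const", value),
# or (operator, left_expr, right_expr).
Expr = Tuple[Any, ...]


def evaluate(expr: Expr, row: Dict[str, Any]) -> Any:
    """Interpret the expression tree against a single tuple."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    left, right = evaluate(expr[1], row), evaluate(expr[2], row)
    if kind == ">":
        return left > right
    if kind == "and":
        return left and right
    raise ValueError(f"unknown operator: {kind}")


def scan_filter(relation: List[Dict[str, Any]], condition: Expr) -> List[Dict[str, Any]]:
    """Scan R and keep the tuples that satisfy the condition."""
    # The whole expression tree is re-interpreted once per tuple.
    return [row for row in relation if evaluate(condition, row)]


R = [{"id": 1, "age": 31}, {"id": 2, "age": 24}, {"id": 3, "age": 45}]
phi = (">", ("col", "age"), ("const", 30))
print(scan_filter(R, phi))
```

Because the layout of R is not known in advance, every tuple triggers a fresh walk over the expression tree, with a dictionary lookup and a chain of branches each time. This per-tuple work is the overhead that a simpler, fixed-format algebra aims to avoid.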
MonetDB also uses vertical partitioning, where each database column
is stored in a so-called binary association table (BAT). A BAT is a two-
column table where the left column is called the head (actually an object
identifier) and the right column the tail (the column value). The query
language of MonetDB is a column algebra called MIL (Monet Interpreter
Language). The parameters of the operators have a fixed format: they are
two-column tables or constants. The expression calculated by an operator is
also fixed, as well as the format of the result.
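The BAT layout and the fixed operator format can be sketched as follows. This is an illustration only, not MIL and not MonetDB's implementation: a BAT is modeled as a Python list of (head, tail) pairs, and the helper names `decompose`, `select`, and `semijoin` are assumptions chosen for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

# A BAT is modeled as a list of (head, tail) pairs:
# head = object identifier, tail = column value.
BAT = List[Tuple[int, Any]]


def decompose(rows: List[Dict[str, Any]]) -> Dict[str, BAT]:
    """Vertically partition rows into one BAT per column."""
    bats: Dict[str, BAT] = {}
    for oid, row in enumerate(rows):
        for column, value in row.items():
            bats.setdefault(column, []).append((oid, value))
    return bats


def select(bat: BAT, predicate: Callable[[Any], bool]) -> BAT:
    """Consume a whole BAT and produce a new, materialized BAT."""
    return [(oid, value) for oid, value in bat if predicate(value)]


def semijoin(left: BAT, right: BAT) -> BAT:
    """Keep the entries of `left` whose head also appears in `right`."""
    oids = {oid for oid, _ in right}
    return [(oid, value) for oid, value in left if oid in oids]


rows = [
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 24},
    {"name": "Eve", "age": 45},
]
bats = decompose(rows)
older = select(bats["age"], lambda age: age > 30)
names = semijoin(bats["name"], older)
print(names)
```

Every operator in the sketch takes two-column tables or constants and returns a two-column table, so no per-tuple expression interpretation is needed. The cost is that each step builds a complete intermediate result (`older` here) before the next step starts.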
However, performance is not optimal since each operation consumes
materialized BATs and produces a materialized BAT. Therefore, on the
one hand, since it uses a column-at-a-time evaluation technique, MIL does
not spend 90% of its query execution time on tuple-at-a-time
interpretation overhead, as traditional RDBMSs do, because
calculations work on entire BATs, and the layout of these arrays is known at
3 http://www.monetdb.org/
4 http://www.cwi.nl/