Considering a parallel environment, this means that it is worthwhile to partition the central fact table into multiple pieces that can be processed in parallel, while the dimension tables are left whole. In a shared-nothing environment, this means copying the dimension relations to all nodes and dividing the fact table across the nodes. This approach is very useful for parallel processing, since most operations, including joins, can proceed in parallel (Furtado 2005).
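A minimal sketch of this allocation scheme (the node count, table contents, and function names are illustrative assumptions, not part of the original work): the fact table is hash-partitioned on a key so that each row lands on exactly one node, while every small dimension table is copied to all nodes.

```python
NUM_NODES = 4  # illustrative cluster size

def allocate(fact_rows, dimension_tables, key):
    """Hash-partition the fact rows; replicate every dimension to each node."""
    nodes = [{"fact": [], "dims": dict(dimension_tables)}
             for _ in range(NUM_NODES)]
    for row in fact_rows:
        node_id = hash(row[key]) % NUM_NODES  # partition on the key's hash
        nodes[node_id]["fact"].append(row)
    return nodes

fact = [{"prod_id": i, "amount": i * 10} for i in range(8)]
dims = {"product": {i: "name-%d" % i for i in range(8)}}
nodes = allocate(fact, dims, "prod_id")

# Every fact row lands on exactly one node; dimensions appear on all nodes.
assert sum(len(n["fact"]) for n in nodes) == len(fact)
assert all(n["dims"]["product"] == dims["product"] for n in nodes)
```

Because each node holds a full copy of every dimension, it can join its fact fragment with the dimensions locally, which is why most operations proceed in parallel without data movement.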
Works such as (Furtado 2007, Rao et al. 2002) deal with more complex schemas that may include bigger dimensions and multiple interconnected big relations that, for best performance, need to be partitioned. In those cases, simultaneous partitioning of multiple relations may pose problems for the parallel join operator. As a simple rule, if at most two of the relations participating in a join are partitioned, they can be partitioned by their common join key (equi-partitioned), and parallel join processing can proceed independently in all nodes; if, on the other hand, more than two relations need to be partitioned, then substantial exchanges of data may be necessary to process broadcast and redirected joins, as described in (Rao et al. 2002). When redirection is necessary, workload-based partitioning solutions such as the ones referenced in (Furtado 2007, Rao et al. 2002) can be adopted, which try to co-locate the relations that yield the highest query processing gains.
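The equi-partitioning rule can be sketched with a toy example (the relations, node count, and partitioning function below are illustrative assumptions): two relations partitioned by the same join key can be joined node-locally, because matching rows are guaranteed to reside on the same node.

```python
NUM_NODES = 3  # illustrative cluster size

def partition(rows, key):
    """Hash-partition rows on the given key into one fragment per node."""
    frags = [[] for _ in range(NUM_NODES)]
    for r in rows:
        frags[hash(r[key]) % NUM_NODES].append(r)
    return frags

# Two relations equi-partitioned on their common join key, cust_id.
sales = [{"cust_id": i, "amount": i} for i in range(6)]
customers = [{"cust_id": i, "region": "EU"} for i in range(6)]
sales_frags = partition(sales, "cust_id")
cust_frags = partition(customers, "cust_id")

# Each node joins its own fragments independently; no data exchange.
result = []
for n in range(NUM_NODES):
    idx = {c["cust_id"]: c for c in cust_frags[n]}
    for s in sales_frags[n]:
        if s["cust_id"] in idx:
            result.append({**s, **idx[s["cust_id"]]})

assert len(result) == 6  # every sale found its customer on the same node
```

Had a third relation been partitioned on a different key, some fragments would no longer match locally, and rows would have to be exchanged between nodes before joining.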
The IBM DB2 server (IBM 2008) allows the definition of any number of partitions (up to some limit) for dividing tables across a set of nodes. In that system, parallel joins are processed in different manners, depending on the co-location status of the joining relations: in a co-located join, the data is already partitioned according to the join hash values, so the join can occur in parallel with no data exchange between nodes; in a redirected join, the data is not co-located, but it is enough to re-locate the rows of one of the source data sets in order to proceed with a co-located join; in a repartitioned join, both source data sets need to be re-located in order to become co-located; in a broadcast join, one of the source data sets is broadcast to all nodes to enable parallel joining with the other, partitioned data set.
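A simplified sketch of how a planner might choose among these four strategies, given each relation's partitioning key. This is not DB2's actual optimizer; the function, the size threshold, and the relation descriptions are illustrative assumptions.

```python
SMALL = 1000  # assumed row count below which broadcasting is cheaper

def join_strategy(r1, r2, join_key):
    """Pick a parallel join strategy.

    r1, r2: dicts with 'part_key' (the partitioning attribute) and 'rows'.
    """
    if r1["part_key"] == join_key and r2["part_key"] == join_key:
        return "co-located"     # no data exchange needed
    if r1["part_key"] == join_key or r2["part_key"] == join_key:
        return "redirected"     # re-locate rows of one relation only
    if min(r1["rows"], r2["rows"]) < SMALL:
        return "broadcast"      # ship the small relation to every node
    return "repartitioned"      # both relations must be re-located

fact = {"part_key": "cust_id", "rows": 10_000_000}
dim = {"part_key": "date_id", "rows": 200}

assert join_strategy(fact, fact, "cust_id") == "co-located"
assert join_strategy(fact, dim, "cust_id") == "redirected"
assert join_strategy(fact, dim, "prod_id") == "broadcast"
```

The ordering of the checks mirrors the relative costs: co-located joins move no data, redirected joins move one relation, and broadcast or repartitioned joins move the most.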
In this work we assume schemas and workloads that do not require redirection, that is, simple schemas such as the one in Figure 1, where there is a large fact table and small dimensions, or schemas with at most two equi-partitioned relations (partitioned by the same key attribute), assuming that the remaining relations are replicated. Future work includes extending the current rationale of the ChunkSim simulator to deal with any generic schema.
Basic Query Processing
Given a data allocation in a shared-nothing environment, the next important issue is how queries are to be processed. Query plans are determined by a cost-based parallel query optimizer, which evaluates alternative query processing paths and chooses the most efficient one according to a cost model. The result of the optimization must then be rewritten as a set of queries and data transfer operations. In this section we illustrate how queries can be decomposed into a number of smaller sub-queries that act on fragments independently in a SN architecture, with significant speedup. This subject has been discussed in several works (Akinde et al. 2003, Furtado 2005, Stohr 2000); our illustration is based on the parallel query processing approach followed by the Data Warehouse Parallel Architecture (DWPA) (Furtado 2005, Furtado 2007).
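The decomposition can be sketched for a simple global aggregate (the fragments and helper names below are illustrative assumptions, not the actual DWPA rewriter): each node runs the same sub-query over its fact fragment, and a final step merges the partial results.

```python
# Per-node fact fragments, as produced by the partitioning step.
fragments = [
    [{"amount": 10}, {"amount": 20}],   # node 0's fragment
    [{"amount": 5}],                    # node 1's fragment
    [{"amount": 30}, {"amount": 35}],   # node 2's fragment
]

def run_subquery(fragment):
    """Per-node sub-query: SELECT SUM(amount) FROM fact_fragment."""
    return sum(r["amount"] for r in fragment)

# In a SN system the sub-queries run in parallel, one per node.
partials = [run_subquery(f) for f in fragments]

# Final merge: SUM of the partial sums equals the global SUM.
total = sum(partials)
assert total == 100
```

SUM decomposes into a sum of partial sums; other aggregates need rewritten merge steps (e.g. a global AVG is computed from per-node SUM and COUNT pairs, not from per-node averages).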
Figure 2 illustrates the basic architecture of the shared-nothing, node-partitioned data warehouse. It includes three major entities: the Query Submitter, the Executor, and the Controller. Submitters are simple services that may reside in any computer, do not require an underlying database server, and submit queries to the system. Once submitted, a query is parsed and transformed into high-level actions by a query planner. These actions are then transformed into Command Lists for each Executor. Executors