Databases Reference
Transparent parallelism.
In leading RDBMSs, applications that use data manipulation statements
need little or no change to execute in the parallel database. For
example, application products for DB2/6000 need not be fully
recompiled. Applications require only a rebind to the parallel
database, which generates and stores the least-cost parallel plan for
each SQL statement.
Query Execution
Query execution may require several logical tasks, and each task may be
carried out across multiple nodes. Operators in the coordinator task
control the run-time execution of the slave tasks. Because a plan runs on
multiple processors, DB2 PE requires interprocess communication
operators. Query execution is analogous to data flowing through trees of
operators divided into tasks, with sends and receives used for intertask
communication. The query optimizer picks the optimal join order, the best
way to access base tables and to compute each join, and the
repartitioning strategy that determines the nodes on which operations
need to be completed. (The inner and outer tables may not be on the same
set of nodes.)
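The repartitioning idea can be illustrated with a small sketch. It assumes hash partitioning on the join key and simulates the nodes as in-process buckets; the function names `repartition` and `parallel_join` are hypothetical and are not part of any DB2 interface.

```python
from collections import defaultdict


def repartition(rows, key, num_nodes):
    """Route each row to a (simulated) node by hashing its join-key value."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % num_nodes].append(row)
    return buckets


def parallel_join(outer, inner, key, num_nodes):
    """Repartition both tables on the join key, then join node by node.

    Illustrative assumption: after repartitioning, rows with equal keys
    sit on the same node, so each node can join its share independently.
    """
    outer_parts = repartition(outer, key, num_nodes)
    inner_parts = repartition(inner, key, num_nodes)
    result = []
    for node in range(num_nodes):
        # Build a local lookup table from this node's inner rows.
        index = defaultdict(list)
        for inner_row in inner_parts.get(node, []):
            index[inner_row[key]].append(inner_row)
        # Probe it with this node's outer rows.
        for outer_row in outer_parts.get(node, []):
            for inner_row in index.get(outer_row[key], []):
                result.append({**outer_row, **inner_row})
    return result
```

In a real system the optimizer would choose among several such strategies by cost; the sketch shows only the case in which both tables are redistributed.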
Query Optimization
The query optimizer determines the cost of a plan by weighing total
system resources against response time. The optimization problem is kept
manageable because DB2 PE accumulates total system resources on a
per-node basis during the bottom-up generation of a query plan. Response
time is a measure of the maximum resources used across all of the nodes
and the network. The subsets of nodes considered for executing a join
include all the nodes on which the inner table is partitioned and all the
nodes on which the outer table is partitioned. DB2/6000 query
optimization uses a heuristic to differentiate between parallel join
execution strategies. In complicated queries, the coordinator both
returns the answer to the application and binds any information needed to
compute the answer. DB2 PE also performs aggregation, including the count
function, in two steps: the slave tasks compute local counts, and the
coordinator sums the counts and sends the answer to the application.
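The two-step count described above can be sketched as follows. This is a minimal illustration that models each node's data as a Python list; `local_count` and `coordinator_count` are hypothetical names, not DB2 PE functions.

```python
def local_count(partition, predicate=lambda row: True):
    """Step 1 (slave task): count the qualifying rows in one partition."""
    return sum(1 for row in partition if predicate(row))


def coordinator_count(partitions, predicate=lambda row: True):
    """Step 2 (coordinator): sum the local counts into the final answer."""
    partial_counts = [local_count(p, predicate) for p in partitions]
    return sum(partial_counts)
```

Only one number per node travels to the coordinator, rather than the rows themselves, which is the point of splitting the aggregation into two steps.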
Interprocess Communication
Run-time software executes query plans and Data Definition Language
statements in parallel. Interprocess communication is provided through:
• Control services, which manage the flow of interprocess control
messages.
• Table queue services, which manage the exchange of rows between
Parallel Edition agents across or within a node. They are responsible
for the proper execution of the data flow operators that connect
different slave tasks.
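As a rough analogy for table queue services, the sketch below passes rows from several producer tasks to one consumer through a shared queue. It assumes threads within a single process and an end-of-stream marker per producer; the names `producer`, `consumer`, `run_exchange`, and `END` are hypothetical and do not reflect the actual DB2 PE implementation.

```python
import queue
import threading

END = object()  # end-of-stream marker, one sent by each producer


def producer(rows, table_queue):
    """Sending side: push every row, then signal completion."""
    for row in rows:
        table_queue.put(row)
    table_queue.put(END)


def consumer(table_queue, num_producers):
    """Receiving side: collect rows until every producer has finished."""
    received = []
    finished = 0
    while finished < num_producers:
        item = table_queue.get()
        if item is END:
            finished += 1
        else:
            received.append(item)
    return received


def run_exchange(partitions):
    """Start one producer per partition and gather all rows at one consumer."""
    table_queue = queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(part, table_queue))
        for part in partitions
    ]
    for thread in threads:
        thread.start()
    rows = consumer(table_queue, len(threads))
    for thread in threads:
        thread.join()
    return rows
```

Rows from different producers may arrive interleaved, so a caller that needs a particular order has to sort the result.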