[Figure not reproduced. Labels recoverable from the diagram: Web UI (Browse, Query, DDL), Hive CLI, JDBC, Thrift API, Hive QL, Parser, Plan, Optimizer, MetaStore, DB, Task, Map Reduce, ExecMapper/ExecReducer, TSOperator, SELOperator, FSOperator, User Script, UDF/UDAF (substr, sum, average), SerDe, Input/OutputFormat, StorageHandler, RcFile, HDFS, HBase.]
FIGURE 4.18
Hive process flow.
Source: HUG Discussions.
Semantic analyzer: in this stage the compiler builds a logical plan using the information
that the metastore provides about the input and output tables. The compiler also checks
type compatibility in expressions and flags compile-time semantic errors at this stage.
The AST is then transformed into an intermediate representation called the query block
(QB) tree. Nested queries are converted into parent-child relationships in the QB tree
during this stage.
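The conversion of nested queries into parent-child query blocks, together with a metastore-backed check, can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the dict-based AST, the `QueryBlock` class, `build_qb`, `check_columns`, and the `TOY_METASTORE` schema are assumptions, not Hive's actual classes or APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryBlock:
    """Hypothetical stand-in for one node of a query block (QB) tree."""
    alias: str
    columns: List[str]
    source_table: Optional[str] = None  # set only when reading a base table
    children: List["QueryBlock"] = field(default_factory=list)  # nested queries


def build_qb(ast: dict, alias: str = "root") -> QueryBlock:
    """Convert a toy dict-based AST into a QB tree.

    A 'from' entry that is itself a dict is treated as a nested query and
    becomes a child block; a string is treated as a base table name.
    """
    qb = QueryBlock(alias=alias, columns=list(ast["select"]))
    source = ast["from"]
    if isinstance(source, dict):
        qb.children.append(build_qb(source, alias=ast.get("alias", "subq")))
    else:
        qb.source_table = source
    return qb


def check_columns(qb: QueryBlock, metastore: Dict[str, Dict[str, str]]) -> List[str]:
    """Collect semantic errors by comparing base-table blocks with the schema."""
    errors: List[str] = []
    if qb.source_table is not None:
        schema = metastore.get(qb.source_table)
        if schema is None:
            errors.append(f"unknown table: {qb.source_table}")
        else:
            errors.extend(f"unknown column: {c}" for c in qb.columns if c not in schema)
    for child in qb.children:
        errors.extend(check_columns(child, metastore))
    return errors


# Assumed toy schema standing in for the metastore.
TOY_METASTORE = {"users": {"name": "string", "age": "int"}}

# Toy AST for: SELECT name FROM (SELECT name, age FROM users) t
toy_ast = {
    "select": ["name"],
    "alias": "t",
    "from": {"select": ["name", "age"], "from": "users"},
}
qb_tree = build_qb(toy_ast)
bad_tree = build_qb({"select": ["salary"], "from": "users"})
```

In this sketch the outer query becomes the parent block and the subquery `t` becomes its only child. Checking `bad_tree` against the toy schema reports the unknown column at "compile time", before any data is read.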
Logical plan generator: in this stage the compiler converts the logical plan produced by the
semantic analyzer into a logical tree of operations.
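A logical tree of operations for a simple query can be pictured as a chain of operators. The sketch below is illustrative only: the `Operator` class and `plan_for_select` helper are hypothetical, and the operator names (TableScan, Filter, Select, FileSink) are chosen to loosely mirror the TSOperator, SELOperator, and FSOperator labels in Figure 4.18 rather than to reproduce Hive's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operator:
    """Hypothetical node in a logical operator tree."""
    name: str
    detail: str = ""
    children: List["Operator"] = field(default_factory=list)


def plan_for_select(table: str, columns: List[str],
                    predicate: Optional[str] = None) -> Operator:
    """Build a linear operator tree for a single-table SELECT."""
    scan = Operator("TableScan", table)
    tail = scan
    if predicate:
        # The filter sits directly below the scan in this toy plan.
        fil = Operator("Filter", predicate)
        tail.children.append(fil)
        tail = fil
    sel = Operator("Select", ", ".join(columns))
    tail.children.append(sel)
    sel.children.append(Operator("FileSink", "query output"))
    return scan


def operator_chain(op: Operator) -> List[str]:
    """Follow the first child at each level and list the operator names."""
    names = [op.name]
    while op.children:
        op = op.children[0]
        names.append(op.name)
    return names


# Toy plan for: SELECT name FROM users WHERE age > 30
plan = plan_for_select("users", ["name"], predicate="age > 30")
print(" -> ".join(operator_chain(plan)))
```

A real plan for joins or multi-insert queries would branch; the single-child chain here is a simplification to keep the example short.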
Optimization: this is the most involved phase of the compiler, as the entire series of DAG
optimizations is implemented in this phase. Several customizations can be made to the
compiler if desired. The primary operations performed at this stage are as follows:
- Logical optimization: performs multiple passes over the logical plan and rewrites it in
several ways.
- Column pruning: ensures that only the columns needed for query processing are actually
projected out of each row.
- Predicate pushdown: predicates are pushed down to the scan where possible so that rows
can be filtered early in processing.
- Partition pruning: predicates on partition columns are used to prune out the files of
partitions that do not satisfy the predicate.
- Join optimization.
- Grouping and regrouping.
- Repartitioning.
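Three of the optimizations listed above (partition pruning, predicate pushdown, and column pruning) can be sketched on toy data. This is a simplified illustration under stated assumptions, not Hive's optimizer: the `PARTITIONS` table, its partition column `ds`, and the helpers `prune_partitions` and `scan` are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Hypothetical table partitioned by column "ds": partition value -> rows.
PARTITIONS: Dict[str, List[Row]] = {
    "2024-01-01": [{"name": "ann", "age": 41, "city": "Oslo"}],
    "2024-01-02": [
        {"name": "bob", "age": 25, "city": "Rome"},
        {"name": "cy", "age": 37, "city": "Lima"},
    ],
}


def prune_partitions(partitions: Dict[str, List[Row]],
                     partition_pred: Callable[[str], bool]) -> Dict[str, List[Row]]:
    """Partition pruning: drop partitions whose key fails the predicate,
    so their rows are never read."""
    return {key: rows for key, rows in partitions.items() if partition_pred(key)}


def scan(partitions: Dict[str, List[Row]], needed_columns: List[str],
         row_pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Scan the remaining partitions, filtering and projecting at the source."""
    for rows in partitions.values():
        for row in rows:
            if row_pred(row):  # predicate pushdown: filter during the scan
                # Column pruning: emit only the columns the query needs.
                yield {col: row[col] for col in needed_columns}


# Toy query: SELECT name FROM t WHERE ds = '2024-01-02' AND age > 30
kept = prune_partitions(PARTITIONS, lambda ds: ds == "2024-01-02")
result = list(scan(kept, ["name"], lambda r: r["age"] > 30))
print(result)
```

The predicate on `ds` is evaluated against partition keys only, so the 2024-01-01 partition is skipped entirely, while the predicate on `age` is applied row by row inside the scan and only `name` leaves it.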
 