Why Impala is faster than Hive in query processing
We have mentioned many times that Impala is a very fast distributed data-processing
framework, so you might want to know how Impala achieves such speed. The following
key points answer that question:
• While processing SQL-like queries, Impala does not write intermediate results
to disk; instead, the full SQL processing is done in memory, which makes it faster.
• With Impala, a query starts executing almost immediately, whereas MapReduce
may take significant time to start processing larger SQL queries, and this
startup delay adds to the total processing time.
• The Impala query planner uses smart algorithms to execute queries in multiple
stages on parallel nodes, which returns results faster and avoids sorting and
shuffle steps that are unnecessary in most cases.
• Impala has information about each data block in HDFS, so when processing
a query, it uses this knowledge to distribute the query's work more evenly
across all DataNodes.
• Another key reason for fast performance is that Impala first generates
native, assembly-level code for each query. This code executes faster than
code run through a general-purpose framework: Impala queries run natively in
memory, whereas a framework adds delay to the execution through its own
overhead.
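
The block-location point above can be illustrated with a small sketch. This is not Impala's actual scheduler; the function name `assign_blocks`, the block IDs, and the DataNode names are all hypothetical. It only shows how knowing which nodes hold a replica of each HDFS block lets a planner spread scan work evenly while keeping every read local.

```python
from collections import defaultdict


def assign_blocks(block_replicas):
    """Greedily assign each block to the least-loaded node holding a replica.

    block_replicas maps a block id to the list of DataNodes storing a copy.
    Returns a dict mapping each node to the list of blocks it will scan.
    (Hypothetical helper for illustration; not an Impala API.)
    """
    load = defaultdict(list)
    for block_id in sorted(block_replicas):
        replicas = block_replicas[block_id]
        # Pick the replica holder with the fewest blocks so far; ties go to
        # the node name that sorts first, so the result is deterministic.
        target = min(replicas, key=lambda node: (len(load[node]), node))
        load[target].append(block_id)
    return dict(load)


# Assumed layout: six blocks, three DataNodes, two replicas per block.
BLOCK_REPLICAS = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn1", "dn3"],
    "blk_3": ["dn2", "dn3"],
    "blk_4": ["dn1", "dn2"],
    "blk_5": ["dn1", "dn3"],
    "blk_6": ["dn2", "dn3"],
}

if __name__ == "__main__":
    for node, blocks in sorted(assign_blocks(BLOCK_REPLICAS).items()):
        print(node, blocks)
```

With this assumed layout, each DataNode ends up scanning two blocks, and every block is read on a node that already stores it, so no block has to be shipped over the network before the scan.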