Advanced Impala Concepts - Learning Cloudera Impala


Impala and Hive

So far, we have emphasized that Impala uses the Hive metastore only as a catalog. Hive relies on MapReduce to process its queries: MapReduce takes charge of distributing the query work and then returning the results to Hive. Impala instead uses its own daemons, running on one, several, or all DataNodes, to perform query processing tasks. There are a few key areas where Impala and Hive differ significantly, and I have noted some of them in the following section.

Key differences between Impala and Hive

• Impala performs in-memory query processing while Hive does not

• Hive uses MapReduce to process queries, while Impala uses its own processing engine

• Hive can be extended with User Defined Functions (UDFs) or a custom Serializer/Deserializer (SerDe); Impala does not, for now, support the same extensibility

• Impala depends on Hive (specifically, the Hive metastore) to function, while Hive does not depend on any other application and needs only the core Hadoop platform (HDFS and MapReduce)

• Impala's query language is a subset of HiveQL, which means that almost every Impala query (with a few limitations) can run in Hive. The reverse is not true, because some HiveQL features supported in Hive are not supported in Impala
