Impala and Hive
We have emphasized throughout that Impala uses the Hive metastore as a catalog
only. Hive relies on MapReduce to process its queries: MapReduce takes charge of
distributing the work and returning the results to Hive. Impala, by contrast, uses
its own daemons, running on one, many, or all DataNodes, to perform query
processing. There are a few key areas where Impala and Hive differ significantly,
and I have noted some of them in the following section.
Key differences between Impala and Hive
• Impala performs in-memory query processing while Hive does not
• Hive uses MapReduce to process queries, while Impala uses its own
processing engine
• Hive can be extended using User Defined Functions (UDFs) or by writing a
custom Serializer/Deserializer (SerDe); Impala does not, for now, support
extensibility the way Hive does
• Impala depends on Hive (specifically, the Hive metastore) to function, while
Hive does not depend on any other application and needs only the core Hadoop
platform (HDFS and MapReduce)
• Impala's query language is a subset of HiveQL, which means that almost every
Impala query (with a few limitations) can run in Hive. The reverse is not true,
because some HiveQL features supported in Hive are not supported in Impala