Note
Requirements might change with newer versions of Impala, so please visit the
Cloudera Impala website for the most up-to-date information.
Dependency on Hive for Impala
Even though the common perception is that Impala needs Hive to function, this is
not entirely true. In fact, only the Hive metastore is required for Impala to
function, and Hive itself can be installed on some other client machine. Hive does
not need to be installed on the same DataNode as Impala; as long as Impala can
access the Hive metastore, it will work as expected. In brief, the Hive metastore
stores table- and partition-specific information, which is also called metadata.
Because Hive uses PostgreSQL or MySQL for the Hive metastore, we can also consider
either PostgreSQL or MySQL a requirement for Impala.
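To make the metastore's role more concrete, the following is a minimal Python sketch that inspects a MySQL-backed Hive metastore directly. It assumes the metastore database is named metastore and that the mysql-connector-python package is installed; the host, user, and password are placeholders, not values from the text. It simply lists the table and partition entries that Impala would read as metadata.

# Hedged sketch: peek inside a MySQL-backed Hive metastore to see the
# table/partition metadata that Impala depends on. Connection details
# and the database name "metastore" are assumptions for illustration.
import mysql.connector

conn = mysql.connector.connect(
    host="metastore-host",      # placeholder hostname
    user="hive",                # placeholder metastore DB user
    password="hive-password",   # placeholder password
    database="metastore",       # assumed metastore database name
)
cursor = conn.cursor()

# TBLS and PARTITIONS are tables in the standard Hive metastore schema.
cursor.execute("SELECT TBL_NAME, TBL_TYPE FROM TBLS")
for name, tbl_type in cursor.fetchall():
    print(f"table: {name} ({tbl_type})")

cursor.execute("SELECT PART_NAME FROM PARTITIONS")
for (part_name,) in cursor.fetchall():
    print(f"partition: {part_name}")

cursor.close()
conn.close()

As long as this metadata is reachable over the network, it does not matter which machine runs the Hive service itself.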
Dependency on Java for Impala
For those who don't know, Impala is written in C++; however, it uses Java to
communicate with various Hadoop components. The impala-dependencies.jar file
located at /usr/lib/impala/lib includes all the required Java dependencies. The
Oracle JVM is the officially supported JVM for Impala, and other JVMs might cause
problems while running Impala.
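As a quick sanity check on a node, the sketch below (a Python illustration, not part of Impala) prints the JVM identification string and lists a few entries from the impala-dependencies.jar file mentioned above. The jar path comes from the text; the assumption that Oracle JVMs report "Java HotSpot(TM)" while other builds report "OpenJDK" is a common convention, not something Impala checks for you.

# Hedged sketch: check which JVM is installed and peek at Impala's bundled
# Java dependencies. The version-string heuristic is an assumption.
import subprocess
import zipfile

# "java -version" writes its output to stderr.
version = subprocess.run(
    ["java", "-version"], capture_output=True, text=True
).stderr
print(version.strip())
if "OpenJDK" in version:
    print("Warning: this is not an Oracle JVM; other JVMs may cause problems.")

# A .jar file is a zip archive, so the standard zipfile module can list it.
jar_path = "/usr/lib/impala/lib/impala-dependencies.jar"
with zipfile.ZipFile(jar_path) as jar:
    for name in jar.namelist()[:10]:   # show the first few bundled entries
        print(name)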
Hardware dependency
The source datasets processed by Impala, along with the join operations on them,
can be very large, and because processing is done in memory, as an Impala user you
must make sure that you have sufficient memory to process those joins. The memory
requirement depends on the source dataset that you are going to process through
Impala. Also keep in mind that Impala cannot run queries that have a working set
greater than the maximum available RAM; in a case when a query needs more memory
than is available, it will simply fail.
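As a rough illustration of this sizing exercise, the Python sketch below compares the total RAM on a Linux node against a back-of-the-envelope working-set estimate for a join; the table size and the overhead factor are made-up example inputs, not figures produced by Impala.

# Hedged sketch: back-of-the-envelope check that a join's working set
# fits in RAM on a Linux node. All sizes below are illustrative inputs.
import os

def total_ram_bytes():
    # Total physical memory via POSIX sysconf values (available on Linux).
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

build_side_bytes = 40 * 1024**3   # assume the smaller join input is ~40 GB
overhead_factor = 1.5             # rough allowance for in-memory hash tables

working_set = build_side_bytes * overhead_factor
ram = total_ram_bytes()

print(f"Estimated working set: {working_set / 1024**3:.1f} GB")
print(f"Available RAM:         {ram / 1024**3:.1f} GB")
if working_set > ram:
    print("Working set exceeds RAM; this query would not run on this node.")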