Note
Requirements might change with newer versions of Impala, so please visit the
Cloudera Impala website for the most up-to-date information.
Dependency on Hive for Impala
Even though the common perception is that Impala needs Hive to function, this is
not entirely true. In fact, only the Hive metastore is required for Impala to
function, and Hive itself can be installed on some other client machine. Hive does
not need to be installed on the same DataNode as Impala; as long as Impala can
access the Hive metastore, it will work as expected. In brief, the Hive metastore
stores table- and partition-specific information, which is also called metadata.
Because Hive uses PostgreSQL or MySQL for the Hive metastore, we can also consider
either PostgreSQL or MySQL a requirement for Impala.
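To make the metastore's role more concrete, the following is a minimal Python sketch that inspects a MySQL-backed Hive metastore directly. It assumes the metastore database is named metastore and that the mysql-connector-python package is installed; the host, user, and password are placeholders, not values from the text. It simply lists the table and partition entries that Impala would read as metadata.

# Hedged sketch: peek inside a MySQL-backed Hive metastore to see the
# table/partition metadata that Impala depends on. Connection details
# and the database name "metastore" are assumptions for illustration.
import mysql.connector

conn = mysql.connector.connect(
    host="metastore-host",      # placeholder hostname
    user="hive",                # placeholder metastore DB user
    password="hive-password",   # placeholder password
    database="metastore",       # assumed metastore database name
)
cursor = conn.cursor()

# TBLS and PARTITIONS are tables in the standard Hive metastore schema.
cursor.execute("SELECT TBL_NAME, TBL_TYPE FROM TBLS")
for name, tbl_type in cursor.fetchall():
    print(f"table: {name} ({tbl_type})")

cursor.execute("SELECT PART_NAME FROM PARTITIONS")
for (part_name,) in cursor.fetchall():
    print(f"partition: {part_name}")

cursor.close()
conn.close()

As long as this metadata is reachable over the network, it does not matter which machine runs the Hive service itself.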
Dependency on Java for Impala
For those who don't know, Impala is written in C++; however, it uses Java to
communicate with various Hadoop components. The impala-dependencies.jar file
located at /usr/lib/impala/lib includes all the required Java dependencies. The
Oracle JVM is the officially supported JVM for Impala, and other JVMs might cause
problems while running Impala.
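As a quick sanity check on a node, the sketch below (a Python illustration, not part of Impala) prints the JVM identification string and lists a few entries from the impala-dependencies.jar file mentioned above. The jar path comes from the text; the assumption that Oracle JVMs report "Java HotSpot(TM)" while other builds report "OpenJDK" is a common convention, not something Impala checks for you.

# Hedged sketch: check which JVM is installed and peek at Impala's bundled
# Java dependencies. The version-string heuristic is an assumption.
import subprocess
import zipfile

# "java -version" writes its output to stderr.
version = subprocess.run(
    ["java", "-version"], capture_output=True, text=True
).stderr
print(version.strip())
if "OpenJDK" in version:
    print("Warning: this is not an Oracle JVM; other JVMs may cause problems.")

# A .jar file is a zip archive, so the standard zipfile module can list it.
jar_path = "/usr/lib/impala/lib/impala-dependencies.jar"
with zipfile.ZipFile(jar_path) as jar:
    for name in jar.namelist()[:10]:   # show the first few bundled entries
        print(name)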
Hardware dependency
The source datasets processed by Impala, along with the join operations on them,
can be very large, and because processing is done in memory, as an Impala user you
must make sure that you have sufficient memory to process those joins. The memory
requirement depends on the source dataset that you are going to process through
Impala. Also keep in mind that Impala cannot run queries that have a working set
greater than the maximum available RAM; in a case when a query needs more memory
than is available, it will simply fail.
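As a rough illustration of this sizing exercise, the Python sketch below compares the total RAM on a Linux node against a back-of-the-envelope working-set estimate for a join; the table size and the overhead factor are made-up example inputs, not figures produced by Impala.

# Hedged sketch: back-of-the-envelope check that a join's working set
# fits in RAM on a Linux node. All sizes below are illustrative inputs.
import os

def total_ram_bytes():
    # Total physical memory via POSIX sysconf values (available on Linux).
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

build_side_bytes = 40 * 1024**3   # assume the smaller join input is ~40 GB
overhead_factor = 1.5             # rough allowance for in-memory hash tables

working_set = build_side_bytes * overhead_factor
ram = total_ram_bytes()

print(f"Estimated working set: {working_set / 1024**3:.1f} GB")
print(f"Available RAM:         {ram / 1024**3:.1f} GB")
if working_set > ram:
    print("Working set exceeds RAM; this query would not run on this node.")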