Impala Administration and Performance Improvements - Learning Cloudera Impala

Single point of failure in Impala

Impala has no single point of failure for query execution: every Impala daemon is
capable of accepting and executing incoming queries. Because a single query is
distributed across multiple nodes, the failure of a specific node impacts only those
query segments that were distributed to the affected machine. In this situation,
re-executing the same query allows the system to recover from the problem.

For Hadoop cluster stability, it is suggested to run the Impala components on
DataNodes. Running Impala on the NameNode is not suggested because, in an
unfortunate event, Impala could cause the NameNode itself to fail, which could
ultimately impact the stability of the whole Hadoop cluster. Running Impala on
DataNodes means that as long as the Hadoop cluster is up and running smoothly, the
Impala cluster will function well, even if one or a few DataNodes fail. Also, if the
NameNode is highly available, the Impala cluster will be highly available as well.
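The recovery step described above, re-executing a failed query, can be sketched on the client side. The following is a minimal illustration rather than part of Impala itself: the function name run_with_failover, the run_query callable, and the host names are all hypothetical, and a real client would plug in its own connection code and catch narrower exception types.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_with_failover(
    sql: str,
    hosts: Sequence[str],
    run_query: Callable[[str, str], T],
    max_rounds: int = 2,
) -> T:
    """Re-execute `sql` against each daemon in turn until one succeeds.

    `run_query(host, sql)` is a hypothetical callable supplied by the caller.
    It is assumed to raise an exception when the connection or query fails.
    """
    if not hosts:
        raise ValueError("at least one host is required")

    last_error: Optional[Exception] = None
    for _ in range(max_rounds):
        for host in hosts:
            try:
                # Any daemon can act as the coordinator for the query.
                return run_query(host, sql)
            except Exception as exc:  # assumption: any error means "try the next node"
                last_error = exc

    raise RuntimeError(f"query failed on all hosts: {list(hosts)}") from last_error
```

Because every daemon can coordinate a query, the sketch simply moves on to the next host in the list when one attempt fails, and it gives up only after max_rounds passes over all hosts.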

One thing to remember in this regard is that Impala depends on the statestore,
which runs on only a single machine. If the statestore becomes unavailable, Impala
does not shut down completely; however, its operation and query distribution are
affected.
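Since the statestore runs on a single machine, it can be useful to watch whether that machine is still accepting connections. The following is a minimal sketch of such a probe using a plain TCP connection attempt. The function name is_reachable is hypothetical, and the host name and port in the usage note are assumptions that should be replaced with the values configured for the cluster.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as is_reachable("statestore-host.example.com", 24000) would test the statestore service, assuming it listens on port 24000. A successful connection shows only that the port is open, not that the service is healthy.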
