Single point of failure in Impala
Impala has no single point of failure for query execution: every Impala daemon is
capable of executing incoming queries.
Because a single query is distributed across multiple nodes, the failure of a specific
node affects only the query segments that were assigned to that machine.
In this situation, re-executing the same query allows the system to recover from
the problem. For Hadoop cluster stability, it is suggested to run the Impala
components on DataNodes. Running Impala on the NameNode is not suggested because,
in an unfortunate event, Impala could cause an overall NameNode failure,
which could ultimately impact Hadoop cluster stability. Running Impala on DataNodes
means that as long as the Hadoop cluster is up and running smoothly, the Impala cluster
will function well, even if one or a few DataNodes fail.
Also, if the NameNode is highly available, the Impala cluster will be highly available
as well.
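The re-execution idea can be sketched on the client side. The following is a minimal
illustration only, not part of Impala itself: run_with_failover and the run_query
callable are hypothetical names introduced here, and run_query stands in for whatever
client call actually submits SQL to a given Impala daemon. The sketch assumes that a
node failure surfaces to the client as a ConnectionError or TimeoutError.

```python
from typing import Any, Callable, Sequence


def run_with_failover(
    sql: str,
    hosts: Sequence[str],
    run_query: Callable[[str, str], Any],
    retriable: tuple = (ConnectionError, TimeoutError),
) -> Any:
    """Re-execute the same query against the next daemon when one fails.

    run_query(host, sql) is a hypothetical callable supplied by the caller;
    it is assumed to raise one of `retriable` when the node is unavailable.
    """
    if not hosts:
        raise ValueError("at least one Impala daemon host is required")

    last_error = None
    for host in hosts:
        try:
            # Any daemon can coordinate the query, so each host is a valid target.
            return run_query(host, sql)
        except retriable as exc:
            # Remember the failure and re-execute on the next daemon.
            last_error = exc

    raise RuntimeError(f"query failed on all {len(hosts)} hosts") from last_error
```

Because every daemon can execute incoming queries, the host list can simply be the set
of DataNodes running an Impala daemon; the order in which they are tried is a design
choice left to the caller.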
One caveat is that Impala depends on the statestore, which runs on only a single
machine. If the statestore is unavailable, Impala does not shut down completely;
however, its operation and query distribution are affected.
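Since the statestore runs on one machine, it can be useful to watch whether that
machine is still reachable. The following is a rough sketch of a plain TCP probe, not
an Impala API: is_reachable is a hypothetical helper, the host name in the usage
comment is a placeholder, and the port number shown there is an assumption that should
be checked against the actual deployment. A successful connection only shows that
something is listening on the port, not that the statestore is healthy.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the host and connects, or raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (host and port are placeholders, not values from the text):
#   if not is_reachable("statestore-host", 24000):
#       print("statestore unreachable; Impala keeps running in a degraded state")
```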