4. Restart all DataNodes.
Adding more Impala nodes to achieve higher performance
Impala performance generally improves as more nodes are added to the cluster, in
the same way that Hadoop performance improves when more DataNodes and
TaskTrackers are added. With more nodes in the Hadoop cluster, the data is
distributed across more machines and each query is spread over more of them,
which ultimately yields higher performance.
Optimizing memory usage during query execution
You can improve query performance by restricting the amount of memory a query
consumes during its execution. To do that, set the -mem_limit flag when starting
the Impala daemon. This flag limits only the memory consumed directly by queries;
Impala still reserves additional memory at startup, for example to cache metadata
and perform other startup actions.
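As a rough illustration of the setting described above, the following Python sketch computes a query memory cap as a percentage of a node's RAM and formats it as a daemon argument. The helper name mem_limit_flag, the 64 GiB node, and the 70 percent figure are hypothetical, and the sketch assumes the flag is spelled -mem_limit and accepts a plain byte count; check the Impala documentation for your version before relying on either assumption.

```python
def mem_limit_flag(total_bytes: int, percent: int) -> str:
    """Format a query memory cap as a daemon argument (hypothetical helper).

    Assumes the flag is spelled -mem_limit and takes a value in bytes.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    limit_bytes = total_bytes * percent // 100
    return f"-mem_limit={limit_bytes}"


# Hypothetical node with 64 GiB of RAM; the remaining 30 percent is left
# for metadata caching and other startup needs.
flag = mem_limit_flag(64 * 1024**3, 70)
print(flag)
```

The resulting string would be appended to the arguments used to start the Impala daemon.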
Query execution dependency on memory
You might wonder how a memory limit affects query execution, because Impala
depends heavily on available memory. If the data a query must hold in memory
exceeds the memory available on a machine, the query fails. Memory usage in
Impala is not directly based on the size of the input dataset; instead, it varies
with the type of query. An aggregation requires memory proportional to the number
of rows after grouping, whereas a join requires memory equivalent to the combined
size of the joined tables, excluding the biggest table.
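The join rule of thumb above can be worked through with a small Python sketch. The function name join_memory_estimate and the table names and sizes are made up for illustration; the result is only the rough estimate the text describes, not a figure Impala reports.

```python
def join_memory_estimate(table_sizes_bytes):
    """Combined size of all joined tables except the largest one."""
    if not table_sizes_bytes:
        return 0
    return sum(table_sizes_bytes) - max(table_sizes_bytes)


GIB = 1024**3

# Hypothetical tables taking part in one join.
sizes = {"sales": 500 * GIB, "customers": 20 * GIB, "regions": 1 * GIB}

needed = join_memory_estimate(list(sizes.values()))
print(needed // GIB, "GiB")  # 20 GiB + 1 GiB; the 500 GiB table is excluded
```

Under this estimate, the join needs about 21 GiB even though the input totals 521 GiB, which is why the input dataset size alone does not determine memory usage.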
Using resource isolation
If you are using Cloudera Manager, you can implement resource isolation using the
cgroups mechanism, which is configured through Cloudera Manager itself. For more
information, please read the Cloudera Manager documentation on resource
isolation.