Impala Administration and Performance Improvements - Learning Cloudera Impala

Summary

In this chapter, we covered Impala administration and performance improvement
using various methods, including Cloudera Manager. We discussed Impala High
Availability, which mainly depends on Hadoop NameNode High Availability. We studied
methods such as enabling block location tracking, native checksumming, and
short-circuit reads, which help Impala read data quickly in the Hadoop cluster and
so improve performance. We also discussed how the choice of file and compression
formats can improve performance and how, if not chosen wisely, the file format or
compression can drag down data processing performance. We also discussed gaining
higher query execution performance by modifying a query in such a way that its
processing is expedited. As most of these topics require a great deal of background
information, having them collected here as a reference should help you understand
them and use them to improve your Impala cluster performance.

The next chapter is all about troubleshooting Impala when you experience problems.
We will extend our knowledge by learning how to find the root cause of various
problems in the Impala cluster and resolve them quickly.
