Summary
In this chapter, we covered Impala administration and performance improvement
using various methods, including Cloudera Manager. We discussed Impala High
Availability, which mainly depends on Hadoop NameNode High Availability. We studied
methods such as enabling block location tracking, native checksumming, and
short-circuit reads, which help us read data quickly in the Hadoop cluster and so
improve Impala performance. We also discussed how the choice of file and compression
formats affects performance: chosen well, they speed up data processing; chosen
poorly, they can drag it down. We also discussed gaining higher query execution
performance by modifying a query in such a way that its processing is expedited.
As most of these topics require a great deal of background information, having them
here in this chapter as a reference should help you understand them and use them to
improve your Impala cluster performance.
The next chapter is all about troubleshooting Impala when you experience problems. We
will extend our knowledge by learning how to find the root cause of various problems
in the Impala cluster and resolve them quickly.