Impala Administration and Performance Improvements - Learning Cloudera Impala

Improving performance

In this section, we will look at a few helpful pointers for improving performance by modifying how the Impala daemon executes and by tuning the underlying platform on which Impala performs user actions.

Enabling block location tracking

When queries are executed in Impala, data is read from HDFS, where it is distributed across multiple DataNodes in the form of data blocks. If Impala has more information about the location of these data blocks on HDFS, it can read the data faster and queries can execute faster. To enable block location tracking for Impala, perform the following steps:

1. Modify the HDFS configuration file hdfs-site.xml as follows:

<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>

2. Copy hdfs-site.xml and core-site.xml from the Hadoop cluster into the Impala configuration folder, /etc/impala/conf, on each Impala node.

3. Restart all DataNodes in your cluster.
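Before restarting the DataNodes, it can be worth confirming that the copied configuration really carries the setting. The following is a minimal sketch in Python, not part of Impala or Hadoop: the function name block_tracking_enabled and the sample text are hypothetical, and the sketch assumes the property is expected to hold the value true.

```python
import xml.etree.ElementTree as ET

# Property name taken from step 1 above.
PROPERTY_NAME = "dfs.datanode.hdfs-blocks-metadata.enabled"


def block_tracking_enabled(hdfs_site_xml: str) -> bool:
    """Return True if the hdfs-site.xml text sets the property to true.

    Hypothetical helper for illustration; it assumes the usual Hadoop
    layout of <configuration> containing <property> elements.
    """
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == PROPERTY_NAME:
            value = (prop.findtext("value") or "").strip().lower()
            return value == "true"
    # Property absent: treat block location tracking as not enabled.
    return False


# Sample configuration text used only to exercise the helper.
sample = """<configuration>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(block_tracking_enabled(sample))
```

To check a real node, we could read the text of /etc/impala/conf/hdfs-site.xml and pass it to the same function. A missing property is reported as not enabled, which is an assumption of this sketch rather than documented Hadoop behavior.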

Enabling native checksumming

Computing checksums over very large amounts of data can take a significant amount of time, so having a native library perform the checksumming helps improve performance. You can use the following information to enable native checksumming in Impala:

• If Impala is installed using Cloudera Manager, native checksumming is configured automatically and no action is needed.

• To enable native checksumming on your self-installed Impala, you must build and install the Hadoop native library, libhadoop.so. If this library is not
