Improving performance
This section offers a few pointers for improving performance by modifying how the
Impala daemon executes and by tuning the underlying platform on which Impala
runs.
Enabling block location tracking
When Impala executes a query, it reads data from HDFS, where the data is distributed
across multiple DataNodes in the form of data blocks. If Impala has more information
about the location of these data blocks on HDFS, it can read the data faster and
queries can finish sooner. To enable block location tracking for Impala, perform the
following steps:
1. Modify the HDFS configuration file, hdfs-site.xml, as follows:
   <property>
     <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
     <value>true</value>
   </property>
2. Copy hdfs-site.xml and core-site.xml from the Hadoop cluster into the
Impala configuration folder, /etc/impala/conf, on each Impala node.
3. Restart all DataNodes in your cluster.
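Before restarting the DataNodes, it can be useful to confirm that the copied configuration really contains the setting from step 1. The following is a minimal Python sketch of such a check, not part of Impala or Hadoop. The function name block_metadata_enabled and the SAMPLE text are hypothetical, and the sketch assumes that if the property appears more than once, the last occurrence is the one to trust.

```python
import xml.etree.ElementTree as ET

# Property name taken from step 1 above.
PROPERTY_NAME = "dfs.datanode.hdfs-blocks-metadata.enabled"


def block_metadata_enabled(hdfs_site_xml: str) -> bool:
    """Return True if the property is present and set to "true".

    Hypothetical helper: it inspects the text of an hdfs-site.xml file only
    and does not contact the cluster.
    """
    root = ET.fromstring(hdfs_site_xml)
    enabled = False
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        if name == PROPERTY_NAME:
            # Assumption of this sketch: the last occurrence wins.
            enabled = value == "true"
    return enabled


# Illustrative configuration text, not copied from a real cluster.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(block_metadata_enabled(SAMPLE))
```

To check a real node, read the text of /etc/impala/conf/hdfs-site.xml and pass it to the function. A False result means the property is missing or not set to true in that file.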
Enabling native checksumming
Computing checksums over very large amounts of data can add a significant amount
of time to query execution, so using a native library to compute them helps improve
performance. Use the following information to enable native checksumming in
Impala:
• If Impala is installed using Cloudera Manager, native checksumming is
configured automatically and no action is needed.
• To enable native checksumming on a self-installed Impala, you must build
and install the Hadoop native library, libhadoop.so. If this library is not