memory is not sufficient, Impala will not be able to process the query and the query
will be canceled.
For best performance with Impala, DataNodes should have multiple storage disks,
because disk I/O speed is often the bottleneck for Impala performance. The total
amount of physical storage required depends on the volume of source data that you
want to process with Impala.
Impala uses the SSE4.2 CPU instruction set, which is found mostly in recent
processors, so recent processors are recommended for better performance with
Impala.
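
One way to confirm SSE4.2 support on a Linux host is to look for the sse4_2 flag
in /proc/cpuinfo. The sketch below is only an illustration of that check, not part
of Impala; the function name has_sse42 is hypothetical, and the script assumes a
Linux system where /proc/cpuinfo is readable.

```python
def has_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only inspect the CPU feature list, not other fields such as "model name".
        if key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("SSE4.2 supported:", has_sse42(f.read()))
    except OSError:
        # /proc/cpuinfo is Linux-specific; other systems need a different check.
        print("Could not read /proc/cpuinfo")
```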
Networking requirements
Impala daemons running on DataNodes can process data stored on the local node as
well as on remote nodes. For the highest performance, Impala should complete data
processing on local data wherever possible instead of reading remote data over a
network connection. To achieve local data processing, Impala matches the
hostname provided to each Impala daemon with the IP address of each DataNode
by resolving the hostname flag to an IP address. For Impala to work with the local
data stored in a DataNode, you must use a single IP interface for the DataNode and
the Impala daemon on each machine. Because there is a single IP address, make sure
that the hostname flag of each Impala daemon resolves to the IP address of the
DataNode.
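
The resolution requirement above can be checked ahead of time with a small
script. This is a minimal sketch, assuming you already know the hostname passed
to the Impala daemon and the IP address the DataNode uses; the function name
hostname_matches_ip and the resolve parameter are hypothetical helpers for
illustration, and only the standard socket.gethostbyname call is relied on.

```python
import socket


def hostname_matches_ip(hostname: str, datanode_ip: str,
                        resolve=socket.gethostbyname) -> bool:
    """Return True if hostname resolves to the given DataNode IPv4 address."""
    try:
        return resolve(hostname) == datanode_ip
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when lookup fails.
        return False
```

For example, calling hostname_matches_ip with the daemon's hostname value and the
DataNode address on each machine should return True before Impala is started. The
resolve parameter exists only so that the check can be exercised without real DNS.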
User account requirements
When Impala is installed, a user named impala and a group named impala are
created, and Impala uses this user and group for as long as it remains installed.
You must ensure that no one changes the impala user and group settings, and that
no other application or system activity interferes with the impala user and
group. To achieve the highest performance, Impala uses direct reads and, because
the root user cannot perform direct reads, Impala is not executed as root. To get
full performance from Impala, make sure that it is not running as the root user.
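
The account requirements above can be verified with a short check on a Unix-like
host. This is a sketch under stated assumptions rather than an Impala tool: the
function names account_exists and running_as_root are hypothetical, and the
script only inspects the local account database and the effective user ID of the
process that runs it.

```python
import grp
import os
import pwd


def account_exists(user: str = "impala", group: str = "impala") -> bool:
    """Return True if both the user and the group exist on this host."""
    try:
        pwd.getpwnam(user)
        grp.getgrnam(group)
        return True
    except KeyError:
        # getpwnam/getgrnam raise KeyError when the entry is not found.
        return False


def running_as_root() -> bool:
    """Return True if the current process has an effective user ID of 0."""
    return os.geteuid() == 0


if __name__ == "__main__":
    print("impala user and group present:", account_exists())
    print("running as root:", running_as_root())
```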