Getting Started with Impala - Learning Cloudera Impala


Depending on your requirements, queries can be submitted to a dedicated impalad or, in a load-balanced manner, to another impalad in your cluster.
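The two submission styles can be illustrated with a short Python sketch. It is only a sketch under assumptions: the host names and the helper functions `dedicated` and `round_robin` are hypothetical, and round-robin rotation is just one simple way to balance load, not necessarily what a given deployment uses.

```python
from itertools import cycle

# Hypothetical host names; replace with the impalad nodes in your cluster.
IMPALAD_HOSTS = [
    "impalad-1.example.com",
    "impalad-2.example.com",
    "impalad-3.example.com",
]


def dedicated(host):
    """Yield the same impalad forever (dedicated submission)."""
    while True:
        yield host


def round_robin(hosts):
    """Rotate through the impalad hosts (a simple form of load balancing)."""
    return cycle(hosts)


# Pick the target impalad for four consecutive queries.
picker = round_robin(IMPALAD_HOSTS)
first_four = [next(picker) for _ in range(4)]

# A dedicated picker always returns the same host.
pinned = dedicated(IMPALAD_HOSTS[0])
pinned_two = [next(pinned) for _ in range(2)]
```

With three hosts, the fourth query wraps around to the first host again, while the dedicated picker never changes its target.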

Impala statestore

Impala has another important component called the Impala statestore, which is responsible for checking the health of each impalad and then frequently relaying the health of each Impala daemon to the other daemons. The Impala statestore is a single running process and can run on the same node as an Impala server or on any other node within the cluster. The name of the Impala statestore daemon process is statestored.

Every Impala daemon process interacts with the Impala statestore process, providing its latest health status, and this information is relayed within the cluster to each and every Impala daemon so that they can make correct decisions before distributing queries to a specific impalad. In the event of a node failure for any reason, statestored updates all other nodes about this failure, and once such a notification is available to the other impalad processes, no Impala daemon assigns any further queries to the affected node.
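The health-relay behavior described above can be modeled with a toy Python sketch. This is not Impala's actual implementation or protocol: the classes `Statestore` and `Daemon`, the heartbeat timeout, and the node names are all assumptions made for illustration, and time is passed in explicitly so the example stays deterministic.

```python
class Statestore:
    """Toy health registry; an illustration, not Impala's real statestored."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s  # assumed heartbeat timeout in seconds
        self.last_seen = {}         # host -> time of the last heartbeat
        self.subscribers = {}       # host -> callback taking a set of hosts

    def register(self, host, on_update, now):
        self.subscribers[host] = on_update
        self.last_seen[host] = now

    def heartbeat(self, host, now):
        self.last_seen[host] = now

    def healthy_hosts(self, now):
        # A host is healthy if it reported within the timeout window.
        return {h for h, t in self.last_seen.items()
                if now - t <= self.timeout_s}

    def broadcast(self, now):
        # Relay the current healthy set to every healthy daemon.
        healthy = self.healthy_hosts(now)
        for host in healthy:
            self.subscribers[host](set(healthy))
        return healthy


class Daemon:
    """Toy impalad that only assigns work to hosts it believes are healthy."""

    def __init__(self, host):
        self.host = host
        self.known_healthy = set()

    def on_update(self, healthy):
        self.known_healthy = healthy

    def candidates(self):
        return sorted(self.known_healthy)


store = Statestore(timeout_s=10.0)
daemons = {h: Daemon(h) for h in ("node-a", "node-b", "node-c")}
for host, daemon in daemons.items():
    store.register(host, daemon.on_update, now=0.0)
store.broadcast(now=0.0)
before_failure = daemons["node-a"].candidates()

# node-c stops sending heartbeats; the other two keep reporting.
store.heartbeat("node-a", now=12.0)
store.heartbeat("node-b", now=12.0)
store.broadcast(now=12.0)
after_failure = daemons["node-a"].candidates()
```

After the second broadcast, node-a no longer lists node-c as a candidate, which mirrors how no further queries are assigned to the affected node.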

One important thing to note here is that even though the Impala statestore component provides critical updates about nodes in trouble, the process itself is not critical to Impala execution. If the Impala statestore becomes unavailable, the rest of the nodes continue working as usual. While the statestore is offline, the cluster becomes less robust, because node failures are no longer relayed; when the statestore is back online, it resumes communicating with each node and returns to its normal operation.
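A minimal Python sketch of this degraded mode follows, assuming a daemon simply keeps its last-known view of the cluster when the statestore cannot be reached. The class `ImpaladView`, the `stale` flag, and the use of `ConnectionError` to signal an outage are hypothetical choices for illustration, not Impala internals.

```python
class ImpaladView:
    """Toy cluster view held by one daemon (illustrative only)."""

    def __init__(self):
        self.members = set()
        self.stale = False  # True while the statestore is unreachable

    def refresh(self, fetch_membership):
        try:
            self.members = set(fetch_membership())
            self.stale = False
        except ConnectionError:
            # Statestore is offline: keep working with the last-known view.
            self.stale = True
        return self.members


def statestore_online():
    return ["node-a", "node-b", "node-c"]


def statestore_offline():
    raise ConnectionError("statestore unreachable")


def statestore_back_online():
    # Hypothetical: node-c failed during the outage and is now reported gone.
    return ["node-a", "node-b"]


view = ImpaladView()
view.refresh(statestore_online)
during_outage = set(view.refresh(statestore_offline))
stale_during_outage = view.stale
after_recovery = set(view.refresh(statestore_back_online))
stale_after_recovery = view.stale
```

During the outage the daemon still has three members to work with, but it cannot learn that one of them has failed, which is the sense in which the cluster is less robust.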

Impala metadata and metastore

Another important component of Impala is its metadata and metastore. Impala uses a traditional MySQL or PostgreSQL database to store table definitions. While other databases can also be used to configure the Hive metastore, either MySQL or PostgreSQL is recommended. Important details, such as table and column information and table definitions, are stored in a centralized database known as the metastore. Apache Hive shares the same database for its metastore, which is why Impala can access tables created or loaded by Hive, provided that all the table columns use supported data types, data formats, and data compression types.
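The compatibility condition in the last sentence can be expressed as a small Python check. This is a sketch under assumptions: the `TableInfo` record and the three "supported" sets are illustrative placeholders, not Impala's actual support matrix, which depends on the Impala version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical summary of a table definition read from the metastore."""
    name: str
    column_types: tuple
    file_format: str
    compression: str


# Illustrative placeholder sets, NOT Impala's real list of supported features.
SUPPORTED_TYPES = {"int", "bigint", "double", "string", "timestamp"}
SUPPORTED_FORMATS = {"text", "parquet"}
SUPPORTED_COMPRESSION = {"none", "snappy"}


def readable(table):
    """Return True only if every column type, the file format, and the
    compression type all fall inside the supported sets."""
    return (
        all(t in SUPPORTED_TYPES for t in table.column_types)
        and table.file_format in SUPPORTED_FORMATS
        and table.compression in SUPPORTED_COMPRESSION
    )


sales = TableInfo("sales", ("int", "string", "double"), "parquet", "snappy")
events = TableInfo("events", ("int", "exotic_type"), "text", "none")

sales_ok = readable(sales)
events_ok = readable(events)
```

A single unsupported column type is enough to make a Hive-created table unreadable in this model, even when its file format and compression are fine.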

Besides that, Impala also maintains information about the data files stored on HDFS. Impala tracks information about file metadata, that is, the physical location of the
