Depending on your requirements, queries can be submitted to a dedicated impalad or,
in a load-balanced manner, to any other impalad in your cluster.
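The two submission styles can be sketched in a few lines of Python. This is only an illustration of the idea, not Impala client code: the host names and the helper functions dedicated and round_robin are hypothetical, and a real deployment would more likely place a proxy or load balancer in front of the impalad processes.

```python
from itertools import cycle

# Hypothetical host names; substitute the impalad hosts of your own cluster.
IMPALAD_HOSTS = ["impalad-1", "impalad-2", "impalad-3"]


def dedicated(host):
    """Dedicated submission: every query goes to the same impalad."""
    while True:
        yield host


def round_robin(hosts):
    """Load-balanced submission: rotate through the impalad hosts in order."""
    return cycle(hosts)


# Pick a target impalad for each of four queries, rotating through the hosts.
picker = round_robin(IMPALAD_HOSTS)
targets = [next(picker) for _ in range(4)]
```

With three hosts, the fourth query wraps around to the first host again, whereas dedicated("impalad-1") would return the same host for every query.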
Impala statestore
Impala has another important component called the Impala statestore, which is
responsible for checking the health of each impalad and then frequently relaying the
health of each Impala daemon to the other daemons. The Impala statestore is a single
running process and can run on the same node as an Impala server or on any other
node within the cluster. The name of the Impala statestore daemon process is statestored.
Every Impala daemon process interacts with the Impala statestore process, providing
its latest health status, and this information is relayed within the cluster to each
and every Impala daemon so that they can make correct decisions before distributing
queries to a specific impalad. In the event of a node failure for any reason,
statestored updates all other nodes about this failure, and once such a notification
is available to the other impalad processes, no Impala daemon assigns any further
queries to the affected node.
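The report-and-relay flow described above can be modeled with a small in-memory sketch. This is a toy model under stated assumptions, not the real statestored protocol: the Statestore and Daemon classes, their method names, and the node names are all hypothetical and exist only to show how a failure notice stops other daemons from choosing the affected node.

```python
class Daemon:
    """Toy stand-in for an impalad that remembers the last relayed health map."""

    def __init__(self, name):
        self.name = name
        self.known_health = {}  # node name -> True (healthy) / False (failed)

    def eligible_nodes(self):
        """Nodes this daemon would still consider when distributing queries."""
        return sorted(n for n, ok in self.known_health.items() if ok)


class Statestore:
    """Toy stand-in for statestored: collects health reports and relays them."""

    def __init__(self):
        self.health = {}       # latest health status per node
        self.subscribers = []  # daemons that receive relayed updates

    def register(self, daemon):
        """Add a daemon to the cluster view and mark it healthy."""
        self.subscribers.append(daemon)
        self.report(daemon.name, True)

    def report(self, name, healthy):
        """Record one node's status and relay the full map to every daemon."""
        self.health[name] = healthy
        for daemon in self.subscribers:
            daemon.known_health = dict(self.health)


store = Statestore()
daemons = [Daemon(f"node{i}") for i in (1, 2, 3)]
for d in daemons:
    store.register(d)

# node2 fails; the statestore relays the failure to all daemons.
store.report("node2", False)
remaining = daemons[0].eligible_nodes()
```

After the failure is relayed, every daemon's eligible_nodes() no longer contains node2, which mirrors the behavior described in the text.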
One important thing to note here is that even though the Impala statestore component
provides critical updates about nodes in trouble, the process itself is not critical to
Impala execution. In the event that the Impala statestore becomes unavailable, the
rest of the nodes continue working as usual. While the statestore is offline, the cluster
becomes less robust, and when the statestore is back online it resumes communicating
with each node and returns to its normal operation.
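One way to read "less robust" is that, while the statestore is offline, each daemon keeps working from the last membership it received and so cannot learn about new failures. The sketch below illustrates that reading only; it is an assumption about the behavior, and the Impalad class, its methods, and the node names are hypothetical.

```python
class Impalad:
    """Toy daemon that plans queries from its last known cluster membership."""

    def __init__(self, name, membership):
        self.name = name
        self.membership = set(membership)  # last membership received

    def on_statestore_update(self, membership):
        """Called only while the statestore is online and relaying updates."""
        self.membership = set(membership)

    def plan_targets(self):
        """Nodes this daemon believes it can send query work to."""
        return sorted(self.membership)


daemon = Impalad("node1", ["node1", "node2", "node3"])

# The statestore goes offline and node3 then fails: no update reaches the daemon.
actually_alive = {"node1", "node2"}
targets_while_offline = daemon.plan_targets()
stale_targets = set(targets_while_offline) - actually_alive

# The statestore comes back online and resumes relaying membership.
daemon.on_statestore_update(actually_alive)
targets_after_recovery = daemon.plan_targets()
```

In this model the daemon keeps running throughout, but it may still target node3 until the statestore returns and relays the current membership.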
Impala metadata and metastore
Another important component of Impala is its metadata and metastore. Impala uses a
traditional MySQL or PostgreSQL database to store table definitions. While other
databases can also be used to configure the Hive metastore, either MySQL or
PostgreSQL is recommended. The important details, such as table and column
information and table definitions, are stored in a centralized database known as a
metastore. Apache Hive also uses the same database for its metastore, which is why
Impala can access tables created or loaded by Hive, provided all the table columns
use supported data types, data formats, and data compression types.
Besides that, Impala also maintains information about the data files stored on HDFS.
Impala tracks information about file metadata, that is, the physical location of the