Getting Started with Impala - Learning Cloudera Impala


Impala core components

In this section, we will first look at the important components of Impala and then discuss the details of Impala's inner workings. We will cover the following components:

• Impala daemon

• Impala statestore

• Impala metadata and metastore

Putting the above components together with Hadoop and an application or command-line interface, we can conceptualize them as seen in the following figure:

Let's now discuss the core Impala components in detail.

Impala daemon

At the core of Impala is the Impala daemon, which runs on each DataNode where Impala is installed. The Impala daemon is an actual process named impalad. The impalad process is responsible for processing queries submitted through the Impala shell, the API, and third-party applications connected through ODBC/JDBC connectors or Hue.

A query can be submitted to any impalad running on any node, and that particular node serves as the "coordinator node" for that query. Other queries can be served by impalad instances running on other nodes as well. After accepting a query, impalad reads and writes data files and parallelizes the query by distributing the work to other Impala nodes in the cluster. As the impalad instances process their share of the query, they all return their results to the coordinator node.
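The coordinator pattern described above can be illustrated with a small simulation. This is only a sketch of the idea and not Impala code: the ImpaladNode class, its run_fragment and coordinate methods, and the sample rows are all hypothetical names invented for this example, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ImpaladNode:
    """Toy stand-in for one impalad instance (hypothetical, not Impala code)."""
    name: str
    local_rows: list  # rows assumed to live on this node's DataNode

    def run_fragment(self, predicate):
        # Each node scans only its local rows and returns a partial count.
        return sum(1 for row in self.local_rows if predicate(row))

    def coordinate(self, cluster, predicate):
        # The node that received the query acts as coordinator: it fans the
        # work out to every node (itself included) and merges the partials.
        with ThreadPoolExecutor(max_workers=len(cluster)) as pool:
            partials = list(pool.map(lambda n: n.run_fragment(predicate), cluster))
        return {"coordinator": self.name, "count": sum(partials)}


cluster = [
    ImpaladNode("node1", [3, 8, 15]),
    ImpaladNode("node2", [22, 4]),
    ImpaladNode("node3", [9, 30, 1, 12]),
]

# Any node can accept the query; here node2 becomes the coordinator.
result = cluster[1].coordinate(cluster, lambda value: value > 8)
print(result)
```

Submitting the same query to node1 or node3 instead would change only the coordinator name, not the count, which mirrors the point that any impalad can accept a query.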
