Chapter 1. Getting Started with Impala
This chapter covers Impala, its core components, and its inner workings in detail.
We will cover the Impala architecture, including the Impala daemon, the statestore,
and the execution model, and how these pieces interact with each other and with
other components. Impala metadata and the metastore are also discussed here, to
explain how Impala maintains its information. Finally, we will study the various
ways to interface with Impala.
The objective of this chapter is to provide enough information for you to kick-start
Impala on a single-node experimental cluster or a multinode production cluster. This
chapter covers the Impala essentials within the following broad categories:
• System requirements
• Installation
• Configuration
• Upgrading
• Security
• Impala architecture and execution
Impala is for a new breed of data wranglers who want to process data at
lightning-fast speed using their existing SQL knowledge. Impala gives data analysts
and data scientists a way to access data stored on Hadoop at lightning speed, either
directly with SQL or through other Business Intelligence tools. Impala reads and
processes data in place on HDFS, the Hadoop storage layer, so there is no need to
migrate data from Hadoop to any other middleware, specialized system, or data
warehouse.
Impala provides data wranglers a Massively Parallel Processing (MPP) query engine,
which runs natively on Hadoop.
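To make the MPP idea concrete, here is a minimal Python sketch that simulates it in a single process. It is not Impala's actual implementation: the node names, the block layout, and the `sales` table are hypothetical. Each simulated node aggregates only the blocks it stores locally, and a coordinator merges the small partial results, so raw rows never leave their node.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster layout (illustrative only): each DataNode stores some
# blocks of a "sales" table, and each block is a list of (region, amount) rows.
BLOCKS_BY_NODE = {
    "datanode-1": [[("east", 10), ("west", 5)], [("east", 7)]],
    "datanode-2": [[("west", 3), ("north", 8)]],
    "datanode-3": [[("east", 1), ("north", 2), ("west", 4)]],
}


def local_partial_sum(blocks):
    """Work done 'on' one node: aggregate only the locally stored blocks."""
    partial = Counter()
    for block in blocks:
        for region, amount in block:
            partial[region] += amount
    return partial


def coordinator_query(blocks_by_node):
    """Fan the work out to every node in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(blocks_by_node)) as pool:
        partials = list(pool.map(local_partial_sum, blocks_by_node.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


if __name__ == "__main__":
    # Similar in spirit to: SELECT region, SUM(amount) FROM sales GROUP BY region
    for region, amount in sorted(coordinator_query(BLOCKS_BY_NODE).items()):
        print(region, amount)
```

Only one small dictionary per node crosses the (simulated) network, which is the property that makes processing data where it resides attractive.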
Native on Hadoop means that the engine runs on Hadoop and uses the Hadoop core
component, HDFS, along with additional components such as Hive and HBase.
To process data, Impala has its own execution component, which runs on each
DataNode where the data is stored in blocks. A number of third-party applications
can process data stored on Hadoop directly through Impala. The biggest advantage
of Impala is that data stored on Hadoop requires no transformation or movement
before it is queried. No data movement means that all the processing happens where
the data resides in the cluster. In other distributed systems, data is transferred over
the network before it is processed; however, with Impala the processing happens at