Impala Walkthrough with an Example - Learning Cloudera Impala

Chapter 4. Impala Walkthrough with an Example

In this chapter, we will go over a use case to see Impala concepts in action. This way, you can experience a real-world scenario using Impala and understand how and where to use Impala statements in real-world applications. I will be using the scenario described in the following sections.

Creating an example scenario

We are going to deal with information related to automobiles. We have two separate text files: one contains information about automobiles and the other about motorcycles. The following conceptual image shows that within the Autos database, there are two tables named Motorcycles and Automobiles.

By now, you know that Impala runs on the DataNodes and that the files in our project are stored on HDFS. First, we will load these files from HDFS into Impala, and then we will use SQL statements to process this information through multiple queries.

Example dataset one - automobiles (automobiles.txt)

Let's take a look at this example dataset, which has a list of automobile names and their properties as defined in the schema. The following is the first text file, which has automobile-specific data:

• File: automobiles.txt

• Schema: make, model, year, fuel-type, numOfDoors, design, type, cylinders, horsepower, city_hwy_mpg, price
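
As a rough sketch of where this scenario is heading, the schema above could be declared in Impala along the following lines. Several details here are assumptions and do not come from the dataset itself: the column types, the comma as field delimiter, the lowercase database and table names, and the HDFS path /user/impala/autos/automobiles.txt. The field fuel-type is written as fuel_type because an unquoted identifier cannot contain a hyphen, and year and type are backtick-quoted purely as a precaution in case either is a reserved word in your Impala version.

```sql
-- Hypothetical sketch: names, types, delimiter, and path are assumptions.
CREATE DATABASE IF NOT EXISTS autos;
USE autos;

CREATE TABLE IF NOT EXISTS automobiles (
  make          STRING,
  model         STRING,
  `year`        INT,
  fuel_type     STRING,   -- "fuel-type" in the schema listing
  numOfDoors    INT,      -- Impala treats column names as case-insensitive
  design        STRING,
  `type`        STRING,
  cylinders     INT,
  horsepower    INT,
  city_hwy_mpg  STRING,   -- kept as text; it may combine two values
  price         DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','   -- assumed delimiter
STORED AS TEXTFILE;

-- Move the file from its (assumed) HDFS location into the table's directory.
LOAD DATA INPATH '/user/impala/autos/automobiles.txt' INTO TABLE automobiles;
```

Adjust the types and the delimiter once you have looked at the actual contents of the file. Note that LOAD DATA moves the file within HDFS; it does not copy it.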

Here is the data in the automobiles.txt file:
