Analytics with Hadoop - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Database Reference

In-Depth Information

confusion about which table a column belongs to. The SELECT statement will only output a data row if a matching row

is found on the “myear,” “manufacturer,” and “model” columns of both tables:

SELECT

rd1.*

FROM

fuel.consumption rd1,

fuel.consumption3 rd2

WHERE

rd1.myear = rd2.myear AND

rd1. manufacturer = rd2. manufacturer AND

rd1. model = rd2. model

The INSERT Statement

The INSERT statement allows you to insert a single row into an Impala table, or you can combine INSERT with a SELECT

statement to insert multiple rows. These commands are designed to move large data volumes.

In a first example of this command, the INSERT statement inserts a single row into the fuel database table called

consumption2. The number and type of column values specified in parentheses must match the table definition.

Because the data values are strings, they are shown within single quotes:

INSERT INTO fuel.consumption2 VALUES ('1995','ACURA','INTEGRA','SUBCOMPACT','1.8','4','A4',

'X','10.2','7','28','40','1760','202')

Although this single INSERT statement moves a single row of data into the table fuel.consumption2, a second

example offers better performance by using a SELECT statement to populate the table fuel.consumption2 via a bulk

insert:

INSERT INTO TABLE fuel.consumption2 SELECT * FROM fuel.consumption3

This second version is a simple example; no extra filters or table joins have been used. But the size of the subtable

could be large; with a simple statement, you could copy a large volume of data.

Note: When executing an INSERT statement, you may receive an error message similar to the following:

AnalysisException: Unable to INSERT into target table (fuel.consumption2) because Impala does not

have WRITE access to at least one HDFS path: hdfs://hc1nn/user/hive/warehouse/fuel.db/consumption2

This error occurs if the referenced database and/or table name on HDFS is not owned by the HDFS impala user.

To fix this, you change the ownership with a statement like the following:

[hadoop@hc1r1m1 ~]$ hdfs dfs -chown -R impala:supergroup /user/hive/warehouse/fuel.db/

When run as the hadoop user, this statement recursively changes ownership of the HDFS fuel.db directory to the

impala user and changes the group to a supergroup. Then, you drop the table and re-create it to update the Impala

metadata for the table. After these steps, your INSERT statement should work.

This has been a very short introduction to Cloudera Impala SQL. For full details, check the Impala web site at

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala . Additionally, try to create your

own SQL statements by joining these simple building blocks. For instance, you can create SELECT statements with

WHERE clauses. And you can add table joins to your SQL statements and consider adding subqueries.

Search WWH ::

Custom Search

Home