automobiles.txt /user/cloudera/automobiles/automobiles.txt
[cloudera@localhost ~]$ hdfs dfs -moveFromLocal motorcycles.txt /user/cloudera/motorcycles/motorcycles.txt
[cloudera@localhost ~]$ hdfs dfs -ls /user/cloudera/motorcycles/
Found 1 items
-rw-r--r-- 3 cloudera cloudera 932 2013-10-15 19:19 /user/cloudera/motorcycles/motorcycles.txt
[cloudera@localhost ~]$ hdfs dfs -ls /user/cloudera/automobiles/
Found 1 items
-rw-r--r-- 3 cloudera cloudera 985 2013-10-15 19:17 /user/cloudera/automobiles/automobiles.txt
Now we will load the preceding data into two separate tables, in two different steps, to learn different ways of loading data. The tables used here are external tables rather than internal ones. I will load the automobile data directly from a script into the automobiles table, and then load the motorcycle data into the motorcycles table from inside the Impala shell. In the script, I will also add another, empty table named automakers. Later, we will join a list of automakers from both tables. All of this processing will be done in a database named autos.
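To make the external-table idea concrete, here is a minimal sketch of what such a definition can look like in Impala. It is not the script used in this section: the table name automobiles_sketch and its three columns are hypothetical, and the comma delimiter is an assumption, since the layout of automobiles.txt is not shown here. Only the HDFS directory comes from the listing above.

```sql
-- Sketch only: table name, columns, and delimiter are assumptions.
-- The table is created in whichever database is currently in use.
CREATE EXTERNAL TABLE IF NOT EXISTS automobiles_sketch (
  make       STRING,  -- hypothetical column
  model      STRING,  -- hypothetical column
  model_year INT      -- hypothetical column
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- assumed field delimiter
STORED AS TEXTFILE
LOCATION '/user/cloudera/automobiles/';        -- an HDFS directory, not a single file

-- Dropping an external table removes only the table definition;
-- the files under LOCATION remain in HDFS.
DROP TABLE IF EXISTS automobiles_sketch;
```

Because the table is external, Impala reads whatever files sit in that directory and leaves them in place when the table is dropped, whereas dropping an internal table also deletes its data files.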
Loading data into the Impala table from HDFS
Here is the SQL script that first creates the autos database, then creates the automobiles table, and then loads the whole dataset from HDFS. I am also creating an empty automakers table in the same script, autos_script.sql, as follows:
USE default;
DROP DATABASE IF EXISTS autos;
CREATE DATABASE autos;
USE autos;