2. Next, using the Hive CLI, create a Hive table to load the compressed file. The
EXTERNAL keyword lets the user create a table that does not store its data in the
default HDFS location. It also ensures that when the table is dropped, the
underlying data remains intact and is not removed. All other operations, such as
reads and writes, work as usual. Moreover, note that the LOCATION clause takes as
its parameter the same path as created in step 1, but it has to be fully qualified
with the HDFS protocol scheme and the cluster name.
Hive> CREATE EXTERNAL TABLE calllist (SubscriberID STRING, StartingTime
STRING, EndingTime STRING, InitCell INT, InitSector INT, LastCell INT,
LastSector INT, CallDirection INT) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\;' LINES TERMINATED BY '\n' LOCATION 'hdfs://mycluster/user/
myuser/old_data/calllist';
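The fields of each semicolon-delimited record line up, in order, with the columns declared for the calllist table. As a rough illustration of the mapping Hive performs when it parses a row, here is a minimal Python sketch; the sample record and its values are hypothetical, not taken from the book's data set:

```python
# Column names and types mirror the calllist table schema from the CREATE TABLE above.
COLUMNS = [
    ("SubscriberID", str), ("StartingTime", str), ("EndingTime", str),
    ("InitCell", int), ("InitSector", int), ("LastCell", int),
    ("LastSector", int), ("CallDirection", int),
]

def parse_record(line):
    """Split one semicolon-delimited line into typed column values."""
    values = line.rstrip("\n").split(";")
    return {name: cast(v) for (name, cast), v in zip(COLUMNS, values)}

# Hypothetical sample record in the declared ROW FORMAT.
sample = "SUB001;2013-01-01 08:00:00;2013-01-01 08:05:12;101;2;108;3;1"
row = parse_record(sample)
print(row["InitCell"], row["CallDirection"])  # → 101 1
```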
3. From the Hive CLI, load the compressed data into the Hive table directly from
the source compressed file.
Hive> LOAD DATA LOCAL INPATH '/mnt/myuser/calllist.gz' INTO TABLE calllist;
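Hive recognizes the .gz extension and decompresses the file transparently during the load, so no manual decompression step is needed here. A source file like calllist.gz is plain gzip-compressed text; as a sketch (with hypothetical sample rows), such a file could be produced as follows:

```python
import gzip

# Hypothetical records in the semicolon-delimited ROW FORMAT declared for calllist.
rows = [
    "SUB001;2013-01-01 08:00:00;2013-01-01 08:05:12;101;2;108;3;1",
    "SUB002;2013-01-01 09:10:00;2013-01-01 09:11:45;205;1;205;1;0",
]

# Write newline-terminated records into a gzip file.
with gzip.open("calllist.gz", "wt", encoding="utf-8") as f:
    for r in rows:
        f.write(r + "\n")

# Reading back confirms the content survives compression intact.
with gzip.open("calllist.gz", "rt", encoding="utf-8") as f:
    assert f.read().splitlines() == rows
```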
4. Create an HDFS directory using the Hadoop command line. The new Hive table will
later point to this directory to store the decompressed data in SequenceFile format.
$ hadoop fs -mkdir /user/myuser/old_data/calllist_seq
5. Create a Hive table that will store files in SequenceFile format. Note the
STORED AS SEQUENCEFILE clause added to the Hive query.
Hive> CREATE EXTERNAL TABLE calllist_seq (SubscriberID STRING, StartingTime
STRING, EndingTime STRING, InitCell INT, InitSector INT, LastCell INT,
LastSector INT, CallDirection INT) STORED AS SEQUENCEFILE LOCATION 'hdfs://
mycluster/user/myuser/old_data/calllist_seq';
6. Load the data into the Hive table.
Hive> INSERT OVERWRITE TABLE calllist_seq SELECT * FROM calllist;
7. Clean up everything that is no longer needed. The Hadoop remove command with the
-skipTrash switch bypasses the Hadoop trash directory and irreversibly destroys
the data, so it should be executed with caution.
Hive> DROP TABLE calllist;
$ hadoop fs -rmr -skipTrash /user/myuser/old_data/calllist