Database and Data Management - Field Guide to Hadoop


▪ Hive does not support non-equality join conditions.

▪ Update and delete statements are not supported.

▪ Transactions are not supported.

You may not need these features, but third-party tools that generate SQL for you may emit
statements that Hive does not accept.
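To illustrate the first limitation, the sketch below shows a join that a Hive release with this restriction would reject, followed by a common rewrite that applies when the query also has an equality key: keep the equality in the ON clause and move the inequality into WHERE. The tables `reviews_a` and `reviews_b` and their columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical tables: reviews_a(title, rating) and reviews_b(title, rating).

-- Rejected where non-equality join conditions are unsupported:
--   SELECT a.title
--   FROM reviews_a a JOIN reviews_b b ON (a.rating < b.rating);

-- Rewrite: join on the equality key, then filter on the inequality.
SELECT a.title,
       a.rating AS rating_a,
       b.rating AS rating_b
FROM reviews_a a
JOIN reviews_b b ON (a.title = b.title)
WHERE a.rating < b.rating;
```

The two forms are only interchangeable when an equality key exists. Generated SQL that joins purely on an inequality needs a different restructuring.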

Hive does not require that the data it reads or writes be in a special “Hive format”; there is
no such thing. This means Hive can access your data directly, without the extract, transform,
and load (ETL) preprocessing typically required by traditional relational databases.
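One way to see this in practice is an external table, which lays a schema over files that already exist in HDFS. The following is a minimal sketch, not taken from the original text: the table name `raw_reviews` and the directory `/data/hypothetical/reviews` are assumptions, and the files there are assumed to be comma-delimited text.

```sql
-- Assumption: comma-delimited text files already sit under this
-- hypothetical HDFS directory.
CREATE EXTERNAL TABLE raw_reviews (
  reviewer STRING,
  title    STRING,
  rating   INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/hypothetical/reviews';

-- The files are parsed when the query runs; nothing is converted first.
SELECT title, AVG(rating) AS avg_rating
FROM raw_reviews
GROUP BY title;
```

Dropping an external table removes only the table definition. The underlying files stay where they are.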

Tutorial Links

A couple of great resources are the official Hive tutorial and this video published by the folks
at Hortonworks.

Example Code

Say we have a comma-separated values (CSV) file containing movie reviews, with information
about the reviewer, the movie, and the rating:

Kevin,Dune,10
Marshall,Dune,1
Kevin,Casablanca,5
Bob,Blazing Saddles,9

First, we need to define the schema for our data:

-- One row per review; fields are separated by commas in a plain text file.
CREATE TABLE movie_reviews
( reviewer STRING, title STRING, rating INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Next, we need to load the data by pointing the table at our movie reviews file. Because Hive

doesn't require that data be stored in any specific format, loading a table consists simply of

pointing Hive at a file in HDFS:
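A load of this kind might look like the following sketch. The HDFS path is a hypothetical placeholder, and the statement assumes the `movie_reviews` table defined above already exists.

```sql
-- Assumption: the reviews file was uploaded to this hypothetical HDFS path.
-- LOAD DATA INPATH moves the file into the table's storage directory;
-- the contents are not parsed or rewritten at load time.
LOAD DATA INPATH '/user/hypothetical/reviews.csv'
INTO TABLE movie_reviews;

-- Quick check that the rows are readable through the schema.
SELECT reviewer, title, rating
FROM movie_reviews;
```

If the file lives on the local filesystem rather than in HDFS, `LOAD DATA LOCAL INPATH` copies it into the table's directory instead.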
