administration and configuration of Hive is beyond the scope of this
chapter, the discussion here does include basic commands for creating
tables and working with data in Hive.
Understanding Hive's Purpose and Role
Hadoop was developed to handle big data. It does an admirable job of this;
but in creating a new platform to solve this problem, it introduced a new
challenge: people had to learn a new and different way to work with their
data. Instead of using Structured Query Language (SQL) to retrieve and
transform data, they had to use Java and MapReduce. Not only did this
mean that data professionals had to learn a new skillset, but also that the
SQL query tools that IT workers and business users traditionally used to
access data didn't work against Hadoop.
Hive was created to address these needs and make it easier for people and
tools to work with Hadoop data. It does that by acting as an interpreter for
Hadoop; you give Hive instructions in Hive Query Language (HQL), which
is a language that looks very much like SQL, and Hive translates that HQL
into MapReduce jobs. This opens up Hadoop data to tools and users that
understand SQL.
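As an illustration of how familiar HQL looks, consider the following query. The table and column names here are hypothetical, invented purely for this example; the point is that any SQL user could read it, while Hive translates it into MapReduce jobs behind the scenes:

```sql
-- Hypothetical table and columns, for illustration only.
-- Hive compiles this HQL into one or more MapReduce jobs.
SELECT country, COUNT(*) AS view_count
FROM page_views
WHERE view_date >= '2013-01-01'
GROUP BY country
ORDER BY view_count DESC;
```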
In addition to acting as a translator, Hive also answers another common
challenge with data in Hadoop. Files stored in Hadoop do not have to share
a common data format. They can be text files delimited by commas, control
characters, oranyofawidevariety ofcharacters. It'snotevennecessary that
they be delimited text files. They can be files that use binary format, XML,
or any of a combination of different formats. Hive enables you to deliver the
data to users in a way that adheres to a defined schema or format.
Hive addresses these issues by providing a layer on top of Hadoop data that
resembles a traditional relational database. In particular, Hive is designed
to support the common operations for data warehousing scenarios.
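Because Hive layers a schema over files already stored in Hadoop, a table definition can simply describe how the underlying data is laid out. The sketch below assumes comma-delimited text files and hypothetical table, column, and path names:

```sql
-- Hypothetical example: define a relational-style schema over
-- comma-delimited text files already in HDFS, without moving
-- or converting the underlying data.
CREATE EXTERNAL TABLE page_views (
  view_date  STRING,
  user_id    BIGINT,
  country    STRING,
  page_url   STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/page_views';
```

Declaring the table as EXTERNAL tells Hive the files are managed elsewhere, so dropping the table removes only the schema, not the data.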