Adding Structure with Hive - Microsoft Big Data Solutions

Chapter 6

Adding Structure with Hive

What You Will Learn in This Chapter

• Learning How Hive Provides Value in a Hadoop Environment
• Comparing Hive to a Relational Database
• Working with Data in Hive
• Understanding Advanced Options in Hive

This chapter discusses how you can use Hive with Hadoop to get more value out of your big data initiatives. Hive is a component of all major Hadoop distributions, and it is used extensively to provide SQL-like functionality on a Hadoop installation. For example, Hive is often used to enable common data warehouse scenarios on top of data stored in Hadoop. One such scenario is retrieving a summary of sales by store and by department. Using MapReduce to prepare and produce these results would take many lines of Java code. By using Hive, you can write a familiar SQL query to get the same results:

SELECT Store, Department, SUM(SalesAmount)
FROM StoreSales
GROUP BY Store, Department
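
To make the contrast concrete, here is a minimal sketch of the map, shuffle, and reduce steps that a query like the one above stands in for. It is plain Python that simulates the phases in memory; it does not use Hadoop's Java API, and the sample rows and function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in the shape (Store, Department, SalesAmount).
STORE_SALES = [
    ("Store1", "Grocery", 100.0),
    ("Store1", "Grocery", 50.0),
    ("Store1", "Toys", 25.0),
    ("Store2", "Grocery", 75.0),
]


def map_phase(rows):
    """Emit one ((store, department), amount) pair per input row."""
    for store, department, amount in rows:
        yield (store, department), amount


def shuffle_phase(pairs):
    """Collect emitted values under their key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the collected amounts for each (store, department) key."""
    return {key: sum(values) for key, values in grouped.items()}


def sales_by_store_and_department(rows):
    """Chain the three phases; equivalent in effect to the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


totals = sales_by_store_and_department(STORE_SALES)
for (store, department), amount in sorted(totals.items()):
    print(store, department, amount)
```

Even in this simplified form, the grouping key, the grouping step, and the aggregation each have to be written out by hand, whereas the query states all three declaratively in the SELECT and GROUP BY clauses.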

If you are familiar with SQL Server or other relational databases, portions of Hive will seem very familiar. Other aspects of Hive, however, may feel very different or restrictive compared to a relational database. It's important to remember that Hive attempts to bridge some of the gap between the Hadoop Distributed File System (HDFS) data store and the relational world, while providing some of the benefits of both technologies. Keeping that perspective makes it easier to understand how and why Hive functions as it does.

Hive is not a full relational database, and limitations apply to the relational database management system (RDBMS) functionality it supports. The differences that are most likely to impact someone coming from a relational perspective are covered in this chapter. Although complete coverage of the
