Chapter 6
Adding Structure with Hive
What You Will Learn in This Chapter
• Learning How Hive Provides Value in a Hadoop Environment
• Comparing Hive to a Relational Database
• Working With Data in Hive
• Understanding Advanced Options in Hive
This chapter discusses how you can use Hive with Hadoop to get more value
out of your big data initiatives. Hive is a component of all major Hadoop
distributions, and it is used extensively to provide SQL-like functionality
from a Hadoop installation. For example, Hive is often used to enable
common data warehouse scenarios on top of data stored in Hadoop. An
example of this would be retrieving a summary of sales by store, and by
department. Producing these results directly in MapReduce would take
many lines of Java code. With Hive, you can write a familiar SQL
query to get the same results:
SELECT Store, Department, SUM(SalesAmount)
FROM StoreSales
GROUP BY Store, Department;
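To make the comparison concrete, the following is a minimal Python sketch of the map, shuffle, and reduce steps that this kind of aggregation implies. It is not Hadoop or Hive code, and it does not show how Hive plans the query internally. The sample rows and the function names (map_phase, shuffle_phase, reduce_phase) are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the StoreSales data:
# (Store, Department, SalesAmount). The values are made up for illustration.
rows = [
    ("Store1", "Grocery", 100.0),
    ("Store1", "Grocery", 50.0),
    ("Store1", "Apparel", 75.0),
    ("Store2", "Grocery", 20.0),
]


def map_phase(records):
    """Emit a ((Store, Department), SalesAmount) pair for each input row."""
    for store, department, amount in records:
        yield (store, department), amount


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which corresponds to SUM(SalesAmount)."""
    return {key: sum(values) for key, values in grouped.items()}


totals = reduce_phase(shuffle_phase(map_phase(rows)))
for (store, department), total in sorted(totals.items()):
    print(store, department, total)
```

Even in this simplified form, the grouping key and the aggregate must be spelled out by hand in separate steps, whereas the query expresses both in the GROUP BY clause and the SUM call.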
If you are familiar with SQL Server, or other relational databases, portions of
Hive will seem very familiar. Other aspects of Hive, however, may feel very
different or restrictive compared to a relational database. It's important to
remember that Hive attempts to bridge some of the gap between the Hadoop
Distributed File System (HDFS) data store and the relational world, while
providing some of the benefits of both technologies. By keeping that
perspective, you'll find it easier to understand how and why Hive functions as
it does.
Hive is not a full relational database, and limitations apply to the relational
database management system (RDBMS) functionality it supports. The
differences that are most likely to impact someone coming from a relational
perspective are covered in this chapter. Although complete coverage of the