administration and configuration of Hive is beyond the scope of this
chapter, the discussion here does include basic commands for creating
tables and working with data in Hive.
Understanding Hive's Purpose and Role
Hadoop was developed to handle big data. It does an admirable job of this;
but in creating a new platform to solve this problem, it introduced a new
challenge: people had to learn a new and different way to work with their
data. Instead of using Structured Query Language (SQL) to retrieve and
transform data, they had to use Java and MapReduce. Not only did this
mean that data professionals had to learn a new skillset, but also that the
SQL query tools that IT workers and business users traditionally used to
access data didn't work against Hadoop.
Hive was created to address these needs and make it easier for people and
tools to work with Hadoop data. It does that by acting as an interpreter for
Hadoop; you give Hive instructions in Hive Query Language (HQL), which
is a language that looks very much like SQL, and Hive translates that HQL
into MapReduce jobs. This opens up Hadoop data to tools and users that
understand SQL.
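As an illustration of how familiar HQL looks, consider the following query. The table and column names here are hypothetical, invented purely for this example; the point is that any SQL user could read it, while Hive translates it into MapReduce jobs behind the scenes:

```sql
-- Hypothetical table and columns, for illustration only.
-- Hive compiles this HQL into one or more MapReduce jobs.
SELECT country, COUNT(*) AS view_count
FROM page_views
WHERE view_date >= '2013-01-01'
GROUP BY country
ORDER BY view_count DESC;
```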
In addition to acting as a translator, Hive also answers another common
challenge with data in Hadoop. Files stored in Hadoop do not have to share
a common data format. They can be text files delimited by commas, control
characters, oranyofawidevariety ofcharacters. It'snotevennecessary that
they be delimited text files. They can be files that use binary format, XML,
or any of a combination of different formats. Hive enables you to deliver the
data to users in a way that adheres to a defined schema or format.
Hive addresses these issues by providing a layer on top of Hadoop data that
resembles a traditional relational database. In particular, Hive is designed
to support the common operations for data warehousing scenarios.
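Because Hive layers a schema over files already stored in Hadoop, a table definition can simply describe how the underlying data is laid out. The sketch below assumes comma-delimited text files and hypothetical table, column, and path names:

```sql
-- Hypothetical example: define a relational-style schema over
-- comma-delimited text files already in HDFS, without moving
-- or converting the underlying data.
CREATE EXTERNAL TABLE page_views (
  view_date  STRING,
  user_id    BIGINT,
  country    STRING,
  page_url   STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/page_views';
```

Declaring the table as EXTERNAL tells Hive the files are managed elsewhere, so dropping the table removes only the schema, not the data.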