Hive provides several forms of connectivity to Hadoop data through Thrift.
Thrift is a software framework for network service communication, and
Hive's Thrift service is what makes JDBC and ODBC connectivity possible.
Because ODBC is broadly supported by query access tools, business users can
much more easily access the data in Hadoop with their favorite analysis
tools. Excel, one of the most common end-user tools for working with data,
supports ODBC. (Using Excel with Hadoop is
discussed further in Chapter 11, “Visualizing Big Data with Microsoft BI.”)
In addition to providing ODBC data access, Hive also acts as a SQL
translator. As mentioned previously, many users and developers are familiar
with writing SQL statements to query and transform data. Hive can take that
SQL and translate it into MapReduce jobs. So, rather than the business users
having to learn Java and MapReduce, or learn a new tool for querying data,
they can leverage their existing knowledge and skills.
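
To make the idea of translating SQL into MapReduce more concrete, the following is a minimal conceptual sketch in plain Python. It is not how Hive actually compiles queries; it only shows how a simple GROUP BY query could be expressed as map, shuffle, and reduce phases. The table name page_views, its columns, the sample rows, and all function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows for an assumed table page_views(user_id, page).
ROWS = [
    ("u1", "home"),
    ("u2", "home"),
    ("u1", "cart"),
    ("u3", "home"),
    ("u2", "cart"),
]

# The query being illustrated (shown as a comment only; it is not executed):
#   SELECT page, COUNT(*) FROM page_views GROUP BY page;


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user_id, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values for each key; here the aggregate is COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    """Chain the three phases to answer the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

The point of the sketch is the division of labor: the person writing the query states only what result is wanted, and the map and reduce steps are derived from it rather than written by hand.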
Hive manages this SQL translation by providing Hive Query Language
(HQL). HQL provides support for common SQL language operations like
SELECT for retrieving information and INSERT INTO to load data.
Although HQL is not ANSI SQL compliant, it implements enough of the
standard to be familiar to users who have experience working with RDBMS
systems.
Differentiating Hive from Traditional RDBMS Systems
This chapter has discussed several of the ways that Hive emulates a
relational database. It has also covered some of the ways in which Hive
differs, including the data types and the storage of the data. Those topics
are worth covering in a bit more depth because they have a significant
impact on how Hive functions and what you should expect from it.
In a relational database like SQL Server, the database engine manages the
data storage. That means when you insert data into a table in a relational
database, the server takes that data, converts it into whatever format it
chooses, and stores it in data structures that it manages and controls. At
that point, the server becomes the gatekeeper of the data. To access the data
again, you must request it from the relational database so that the server
can retrieve it from the internal storage and return it to you. Other systems
cannot access or change the data directly without going through the server.
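
The gatekeeper behavior described above can be sketched with a small Python example. It uses SQLite only because it ships with Python; SQLite is an embedded engine rather than a server like SQL Server, so treat it as a stand-in for the general idea that the engine, not the user, decides the on-disk format. The table name customers and the function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_engine_managed_storage():
    """Insert a row through the engine, then read it back two ways."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "example.db")

        # Write through the engine: we never choose the file layout.
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
        )
        conn.execute("INSERT INTO customers (name) VALUES (?)", ("Alice",))
        conn.commit()

        # Read through the engine: rows come back in a usable form.
        rows = conn.execute("SELECT id, name FROM customers").fetchall()
        conn.close()

        # Read the file directly: we see the engine's own binary format,
        # starting with its file header, not rows and columns.
        with open(path, "rb") as raw_file:
            header = raw_file.read(16)

    return rows, header


if __name__ == "__main__":
    rows, header = demo_engine_managed_storage()
    print(rows)
    print(header)
```

Querying through the engine returns the inserted row, while opening the file directly exposes only the engine's internal layout, which is why other systems are expected to go through the database to reach the data.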