contested or more important to the future of adoption of a distribution than
the next generation of SQL on Hadoop.
SQL on Hadoop Today
To recap what you learned in Chapter 1, “Industry Needs and Solutions”:
SQL on Hadoop came into being via the Hive project. Hive abstracts away
the complexity of MapReduce by providing a SQL-like language known as
Hive Query Language (HQL). Note that this does not mean Hadoop suddenly
observes all the ACID (atomicity, consistency, isolation, durability)
rules of a transaction. Rather, Hadoop offers through Hive a querying
syntax that is familiar to end users. Bear in mind, however, that Hive
works only on data that resides in Hadoop.
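To make the abstraction concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce steps that a single grouped count implies. It is an illustration only, not how Hive compiles queries: the table name page_views, its columns, the sample rows, and every function name are hypothetical, and the HQL string is never executed.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table page_views(user_id, url).
ROWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
    ("carol", "/home"),
]

# The query a Hive user would write. It is kept as a string for comparison
# only; nothing in this sketch parses or runs it.
HQL = "SELECT url, COUNT(*) FROM page_views GROUP BY url"


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user_id, url in rows:
        yield url, 1


def shuffle_phase(pairs):
    """Collect emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which is what COUNT(*) amounts to here."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    """Chain the three phases to answer the grouped count."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))  # {'/home': 3, '/cart': 1}
```

The one-line query replaces three hand-written phases, which is the complexity Hive hides from the end user.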
The challenge for Hive has always been its dependency on MapReduce.
Because the MapReduce execution engine was tightly coupled to
scheduling, there was no choice but to build on top of MapReduce (MR). However,
Hadoop 2.0 and project YARN changed all that. By separating scheduling
into its own project and decoupling it from execution, new possibilities have
surfaced for the evolution of Hive.
Hortonworks and Stinger
Hortonworks has focused all its energy on Stinger. Stinger is not a Hadoop
project as such; instead, it is an initiative to dramatically improve the
performance and completeness of Hive. The goal is to speed up Hive by
100x. No mean feat. What is interesting about Stinger is that all the coding
effort goes directly into the Hadoop projects. That way everyone benefits
from the changes made. This completely aligns with Hortonworks's
commitment and charter to Hadoop.
So what is Stinger? It consists of three phases. The first two phases have
already been delivered.
Stinger Phase 1
Phase 1 was primarily aimed at optimizing Hive within its current
architecture. It was delivered in Hive 0.11 in May 2013, forming part
of the Hortonworks Data Platform (HDP) 1.3 release. Phase 1 delivered three
changes of notable significance: