NOTE
HCatalog is defined in Chapter 1.
Stinger Phase 3
Stinger Phase 3 is underway and will see Hadoop introduce Apache Tez,
moving away from batch processing toward a more interactive query/response engine.
Vectorized queries (batch mode to SQL Server Query Processor aficionados)
and an in-memory cache are both in the pipeline. However, it is still early
days for this phase of the Stinger initiative.
Cloudera and Impala
Cloudera chose a different direction from Hortonworks when defining its
SQL-in-Hadoop strategy. Seeing the limitations of MapReduce, it chose to
implement its own engine: Impala.
In effect, Cloudera sidestepped the whole issue of Hadoop's legacy with
MapReduce and started over, creating three new daemons that
drive Impala:
• Impala Daemon
• Impala Statestore
• Impala Catalog Service
Impala Daemon
The Impala daemon is the core component, and it runs on every node of
the Hadoop cluster. The process is called impalad, and it operates in a
decentralized, multimaster pattern; that is, any node can be the controlling
“brain” for a given query. Because the coordinating node is decided for each
query, the single point of failure and bottleneck common to a number of
massively parallel processing (MPP) systems is removed from the
architecture. Note, however, that the Impala daemon you connect to when
submitting your query is the one that takes on the responsibility
of acting as the coordinator. Impala does not balance this load
automatically; the calling application can spread its connections
across the daemons itself.
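Because the daemon you connect to becomes the coordinator, one simple way for a calling application to spread that work is to rotate through the available daemons each time it opens a connection. The following is a minimal sketch of that idea in Python, not a recommended production design. The host names and the CoordinatorPicker class are hypothetical, and port 21050 is assumed to be the port your client library uses to reach impalad; adjust both to your cluster.

```python
import itertools
import threading


class CoordinatorPicker:
    """Round-robin choice of which impalad to connect to (hypothetical helper)."""

    def __init__(self, hosts, port=21050):
        # hosts: names of nodes running impalad (illustrative values only).
        # port: assumed client port; check your cluster's configuration.
        if not hosts:
            raise ValueError("at least one impalad host is required")
        self._port = port
        self._cycle = itertools.cycle(list(hosts))
        self._lock = threading.Lock()  # keep rotation safe across threads

    def next_endpoint(self):
        """Return the (host, port) pair to use for the next connection."""
        with self._lock:
            return next(self._cycle), self._port


picker = CoordinatorPicker(
    ["node1.example.com", "node2.example.com", "node3.example.com"]
)

# Each new connection would be opened against the next daemon in turn,
# so a different node acts as coordinator for successive queries.
endpoints = [picker.next_endpoint() for _ in range(4)]
```

The application would pass each returned host and port to whatever client library it uses to open the connection. This sketch does no health checking, so a failed daemon stays in the rotation until the host list is changed.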