Microsoft's Approach to Big Data - Microsoft Big Data Solutions

NOTE

HCatalog is defined in Chapter 1.

Stinger Phase 3

Stinger Phase 3 is underway and will see Hadoop introduce Apache Tez,

thus moving away from batch to a more interactive query/response engine.

Vectorized queries (batch mode to SQL Server Query Processor aficionados)

and an in-memory cache are also in the pipeline. However, it is still early

days for this phase of the Stinger initiative.

Cloudera and Impala

Cloudera chose a different direction when defining their SQL in Hadoop

strategy. Clearly, they saw the limitations of MapReduce and chose to

implement their own engine: Impala.

Cloudera took a different approach to Hortonworks when they built Impala.

In effect, they chose to sidestep the whole issue of Hadoop's legacy with

MapReduce and started over. Cloudera created three new daemons that

drive Impala:

• Impala Daemon

• Impala Statestore

• Impala Catalog Service

Impala Daemon

The Impala daemon is the core component, and it runs on every node of

the Hadoop cluster. The process is called impalad, and it operates in a

decentralized, multimaster pattern; that is, any node can be the controlling

“brain” for a given query. Because the coordinating node is decided per

query, the single point of failure and bottleneck common to many

massively parallel processing (MPP) systems is removed from the

architecture. Note, however, that the Impala daemon you connect to when

submitting your query is the one that acts as the coordinator for that

query. The calling application can spread this work by connecting to

different daemons, but Impala does not load balance it automatically.
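
Because the daemon you connect to becomes the coordinator, one way for a calling application to spread that work is to rotate its connections across the daemons. The following is a minimal sketch of that idea, not Impala's own mechanism. The host names, the port value, and the helper functions are all hypothetical, and no connection is opened; the sketch only decides which daemon each query would be sent to.

```python
import itertools
from typing import Iterator, List, Sequence, Tuple

Node = Tuple[str, int]

# Hypothetical impalad endpoints; replace with the nodes of your own cluster.
# The port is an assumption for illustration; use whatever your daemons listen on.
IMPALAD_NODES: List[Node] = [
    ("impala-node1.example.com", 21050),
    ("impala-node2.example.com", 21050),
    ("impala-node3.example.com", 21050),
]


def coordinator_cycle(nodes: Sequence[Node]) -> Iterator[Node]:
    """Return an endless round-robin iterator over the given daemons."""
    if not nodes:
        raise ValueError("at least one impalad node is required")
    return itertools.cycle(nodes)


def assign_coordinators(queries: Sequence[str],
                        nodes: Sequence[Node]) -> List[Tuple[str, Node]]:
    """Pair each query with the daemon that would coordinate it."""
    picker = coordinator_cycle(nodes)
    return [(query, next(picker)) for query in queries]


if __name__ == "__main__":
    sample_queries = [
        "SELECT COUNT(*) FROM sales",
        "SELECT region, SUM(amount) FROM sales GROUP BY region",
        "SELECT * FROM customers LIMIT 10",
        "SELECT MAX(amount) FROM sales",
    ]
    for query, (host, port) in assign_coordinators(sample_queries, IMPALAD_NODES):
        # A real application would open its connection to host:port here.
        print(f"{host}:{port} coordinates: {query}")
```

Round-robin is the simplest policy to reason about. An application could equally pick a daemon at random or place a network load balancer in front of the daemons; either way, the choice sits outside Impala itself.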
