Microsoft's Approach to Big Data - Microsoft Big Data Solutions

NOTE

Refer back to Chapter 1 for definitions of the Hadoop projects YARN and Tez.

Phase 2 included the following enhancements:

• Performance: Queries got faster with Stinger phase 2 thanks to a number of changes. A new logical optimizer, the Correlation Optimizer, was introduced; its job is to merge multiple correlated MapReduce jobs into a single job to reduce the movement of data. ORDER BY was made a parallel operation. Furthermore, predicate pushdown was implemented to allow ORCFile to skip over whole blocks of rows, much like segment elimination in SQL Server columnstore indexes. Optimizations were also added for COUNT(DISTINCT), with the hive.map.groupby.sorted configuration property.

• SQL compatibility: Two significant data types were introduced: VARCHAR and DATE. GROUP BY was extended to support struct and union types. Lateral views were also extended to support an “outer” join behavior, and TRUNCATE was extended to support truncating individual columns. New user-defined functions (UDFs) were added to work with the BINARY data type. Finally, partition switching entered the product courtesy of ALTER TABLE ... EXCHANGE PARTITION.

NOTE

SQL Server does not support lateral views because it has no array data type and no functions for working with such a type. To learn about lateral views, head over to LanguageManual+LateralView.

• End of HCatalog project: With Hive 0.12, HCatalog ceased to exist as its own project and was merged into Hive.
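The predicate pushdown described under Performance works because ORCFile keeps lightweight statistics, such as the minimum and maximum value of each column, for every group of rows (a row group) in its row index. The Python sketch below is a simplified model under that assumption, not Hive's or ORC's actual code: the RowGroup class, the build_row_groups and scan_with_pushdown functions, and the tiny group size are all hypothetical. It shows how a reader can skip an entire row group whose min/max range cannot satisfy a range predicate such as col BETWEEN 42 AND 47.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for an ORC row group.

    Real ORC files store min/max statistics in the row index; this sketch
    simply computes them from the values on demand.
    """
    values: List[int]

    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def build_row_groups(column: List[int], group_size: int) -> List[RowGroup]:
    """Split one column into fixed-size row groups.

    The group size here is tiny for readability; ORC's default row-index
    stride is 10,000 rows.
    """
    return [RowGroup(column[i:i + group_size])
            for i in range(0, len(column), group_size)]


def scan_with_pushdown(groups: List[RowGroup], low: int,
                       high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and how many row groups were read.

    A group is skipped when its [min, max] range cannot overlap
    [low, high], so none of its rows are examined individually.
    """
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        if group.max_value < low or group.min_value > high:
            continue  # statistics prove no row in this group can match
        groups_read += 1
        matches.extend(v for v in group.values if low <= v <= high)
    return matches, groups_read


if __name__ == "__main__":
    column = list(range(100))  # sorted data makes skipping very effective
    groups = build_row_groups(column, group_size=10)
    rows, read = scan_with_pushdown(groups, low=42, high=47)
    print(rows, f"read {read} of {len(groups)} row groups")
```

Because the sample column is sorted, only the group holding 40 through 49 overlaps the predicate, so one of ten groups is read. On unsorted data the min/max ranges overlap far more and fewer groups can be skipped, which is why sorting or clustering on commonly filtered columns tends to help.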
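The “outer” behavior added to lateral views under SQL compatibility can be modeled in a few lines of Python. In Hive, a query such as SELECT page, tag FROM pages LATERAL VIEW explode(tags) t AS tag (the pages table and its columns are hypothetical) emits one output row per array element, so a source row whose array is empty or NULL produces no output at all. Writing LATERAL VIEW OUTER instead keeps that row and fills the generated column with NULL, much like a left outer join. The lateral_view_explode function and sample rows below are illustrative assumptions, with None standing in for NULL.

```python
from typing import List, Optional, Tuple

# (key, array) pairs; None models a NULL array column.
Row = Tuple[str, Optional[List[str]]]


def lateral_view_explode(rows: List[Row],
                         outer: bool = False) -> List[Tuple[str, Optional[str]]]:
    """Pair each row's key with every element of its array.

    With outer=False, rows whose array is empty or None disappear,
    mirroring LATERAL VIEW explode(...). With outer=True they are kept
    with None (NULL) in the generated column, mirroring LATERAL VIEW OUTER.
    """
    result: List[Tuple[str, Optional[str]]] = []
    for key, items in rows:
        if items:
            result.extend((key, item) for item in items)
        elif outer:
            result.append((key, None))  # keep the row, generated column is NULL
    return result


if __name__ == "__main__":
    pages = [("home", ["news", "sports"]), ("about", []), ("contact", None)]
    print(lateral_view_explode(pages))
    print(lateral_view_explode(pages, outer=True))
```

The first call returns only the two rows for home; the second also keeps about and contact, each paired with None.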