Data Warehouses and Hadoop Integration - Microsoft Big Data Solutions

Server was designed to scale up. It was also not designed from the ground

up with data warehousing in mind. PDW, by contrast, was designed that way:

it is a workload-specific appliance focused on the data warehouse.

The fact that PDW is a scale-out technology is incredibly important. It

places Microsoft into the same bracket as other vendors of MPP distributed

databases, such as Teradata, Netezza, Oracle, SAP HANA, and Pivotal, with

technology that has the ability to scale to the demands of big data projects.

Scale-out MPP databases are the only relational database technology with

any presence in the world of big data. Every one of them

purports to integrate with Hadoop in some form or other. PDW is no

exception.

MPP databases offer some compelling benefits for the data warehouse and

for big data. The primary benefit is the ability to scale across servers,

enabling a divide-and-conquer approach to data processing. By leveraging

a number of servers, PDW can address many more CPU cores than would

ever be possible in an SMP configuration.
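
The divide-and-conquer idea can be illustrated with a small Python sketch. It is not PDW code: the function names, the CRC32 hash, and the use of worker threads to stand in for compute nodes are all assumptions made for illustration. Rows are assigned to a "node" by hashing a key, each node aggregates only its own rows, and the partial results are then combined.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def node_for(key: str, node_count: int) -> int:
    """Pick a node for a key with a deterministic hash (hypothetical scheme)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Split (key, amount) rows into one shard per simulated node."""
    shards = [[] for _ in range(node_count)]
    for key, amount in rows:
        shards[node_for(key, node_count)].append((key, amount))
    return shards


def local_aggregate(shard):
    """Work done independently on each node: sum amounts per key."""
    totals = defaultdict(int)
    for key, amount in shard:
        totals[key] += amount
    return dict(totals)


def parallel_sum_by_key(rows, node_count=4):
    """Distribute rows, aggregate each shard in parallel, merge the results."""
    shards = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_aggregate, shards))
    merged = {}
    for partial in partials:
        # A key always hashes to the same node, so partial results never overlap.
        merged.update(partial)
    return merged


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(sorted(parallel_sum_by_key(rows).items()))  # [('a', 4), ('b', 2), ('c', 4)]
```

Because every row for a given key lands on the same node, the final merge is a simple union. Adding nodes spreads the same work across more processors without changing the logic.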

In its biggest configuration, PDW supports 56 data processing servers

(known as compute nodes) comprising the following resources:

• 896 physical CPU cores

• 14TB of memory

• 6PB+ of storage capacity
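
As a quick sanity check on these totals, the sketch below derives the resources each compute node would contribute. It assumes the nodes are identical and that 1 TB means 1,024 GB; the text states neither, so read the per-node figures as an illustration only. Storage is left out because "6PB+" is not an exact figure.

```python
# Totals quoted in the text for the largest configuration.
COMPUTE_NODES = 56
TOTAL_CORES = 896
TOTAL_MEMORY_TB = 14

# Assumption: binary units (1 TB = 1,024 GB) and identical nodes.
GB_PER_TB = 1024

cores_per_node = TOTAL_CORES // COMPUTE_NODES
memory_per_node_gb = TOTAL_MEMORY_TB * GB_PER_TB // COMPUTE_NODES

print(f"cores per node:  {cores_per_node}")         # 16
print(f"memory per node: {memory_per_node_gb} GB")  # 256 GB
```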

What is even more impressive is that PDW forces you to use all these

resources. In other words, it forces parallelism into your queries. This is

fantastic for data warehousing.

Imagine an option in SQL Server that let you run

with a minimum degree of parallelism, say MINDOP(448). You can't? No, of

course you can't, because there is no such option in SQL Server. With PDW,

you have that by default.
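
The figure 448 is not explained in the text. One plausible reading, treated here purely as an assumption, is that it is the number of parallel units of work spread evenly over the 56 compute nodes. The arithmetic below shows what that reading would imply; MINDOP itself is the text's hypothetical option, not a real SQL Server setting.

```python
# Figures quoted in the text.
COMPUTE_NODES = 56
TOTAL_CORES = 896
HYPOTHETICAL_MINDOP = 448  # the text's imaginary setting

# Assumption: parallel work is spread evenly across identical nodes.
streams_per_node = HYPOTHETICAL_MINDOP // COMPUTE_NODES
cores_per_stream = TOTAL_CORES // HYPOTHETICAL_MINDOP

print(f"parallel streams per node: {streams_per_node}")  # 8
print(f"physical cores per stream: {cores_per_stream}")  # 2
```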
