Data Warehouses and Hadoop Integration - Microsoft Big Data Solutions

WARNING

Directly Accessing PDW's SQL Servers and the Shell Database

PDW does not let you directly access the SQL Servers on either the control

node or the compute nodes. This is to ensure that no inadvertent

changes are made that could damage PDW and potentially void the

warranty. Consequently, whilst it is an important component to

understand, the shell database is not directly accessible by end users.

It's created purely for PDW to use.

Each compute node also has a database created. These databases are where

all the user data is stored. These databases have a rather interesting

configuration. They consist of 10 separate filegroups that are key to PDW's

parallelism. Each database on each compute node has all 10 filegroups. I've

detailed them all in the matrix shown in Figure 10.3.

Figure 10.3 The bucket matrix created by PDW for holding user data

I have listed all the filegroups on the x-axis of the matrix and shown the

compute nodes on the y-axis. This layout shows something important: every

compute node has its own database, each with its own set of filegroups.

The matrix represents the total number of buckets available to PDW for

depositing data. This really is the key to how PDW works. The more compute

nodes you have, the more buckets.
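
To make the arithmetic behind the matrix concrete, here is a minimal Python sketch. It only models the counting: one row per compute node and one column per filegroup, using the figure of 10 filegroups per node given above. The node labels (node01, node02, and so on), the numbered filegroup columns, and the function name bucket_matrix are hypothetical and are not PDW names.

```python
# Sketch of the bucket matrix: one row per compute node, one column per filegroup.
# The count of 10 filegroups per node comes from the text; everything else
# (node labels, numbered columns, function name) is an illustrative assumption.
FILEGROUPS_PER_NODE = 10


def bucket_matrix(node_count):
    """Return one (node_label, filegroup_number) cell per bucket in the matrix."""
    if node_count < 1:
        raise ValueError("need at least one compute node")
    return [
        (f"node{n:02d}", fg)
        for n in range(1, node_count + 1)
        for fg in range(1, FILEGROUPS_PER_NODE + 1)
    ]


if __name__ == "__main__":
    # Adding compute nodes adds rows to the matrix, and therefore buckets.
    for nodes in (2, 4, 8):
        print(nodes, "compute nodes ->", len(bucket_matrix(nodes)), "cells")
```

Doubling the number of compute nodes doubles the number of cells, which is the sense in which more nodes mean more buckets.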

The filegroups on the x-axis start with some unusually named ones: DIST_A

through DIST_H. These are called distributions and are designed for a certain type

of table called a distributed table. The next filegroup is called replicated.

You will notice that there is only one of these. That is significant. This

filegroup holds replicated tables.
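
The difference between the two kinds of filegroup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PDW's implementation: the filegroup names DIST_A through DIST_H and replicated come from the text, but the node labels, the function names, and the use of zlib.crc32 as the hash are stand-ins (PDW's actual hashing is internal to the appliance). The idea shown is that a row of a distributed table lands in exactly one distribution on one node, while a replicated table is present in the replicated filegroup on every node.

```python
import zlib

# Filegroup names taken from the text.
DISTRIBUTIONS = [f"DIST_{c}" for c in "ABCDEFGH"]
REPLICATED = "replicated"


def place_distributed(key, nodes):
    """Pick exactly one (node, distribution) bucket for a row key.

    zlib.crc32 is a stand-in hash chosen for this sketch only;
    it is not the function PDW uses.
    """
    buckets = [(node, dist) for node in nodes for dist in DISTRIBUTIONS]
    digest = zlib.crc32(str(key).encode("utf-8"))
    return buckets[digest % len(buckets)]


def place_replicated(nodes):
    """A replicated table is held in the replicated filegroup on every node."""
    return [(node, REPLICATED) for node in nodes]


if __name__ == "__main__":
    nodes = ["node01", "node02"]  # hypothetical node labels
    print("distributed row 42 ->", place_distributed(42, nodes))
    print("replicated table   ->", place_replicated(nodes))
```

With two nodes there are 16 distribution buckets to choose from but only one replicated filegroup per node, which is why there is a single replicated column in the matrix.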
