Data Warehouses and Hadoop Integration - Microsoft Big Data Solutions

Database Reference

In-Depth Information

Don't worry if all this talk of different table types is a bit confusing; we are

going to talk about distributed and replicated tables later in this chapter.

Suffice it to say that PDW supports two types of tables: distributed and

replicated. When creating a table, we decide whether we want to distribute

the data across the appliance (in which case we use a distributed table) or

if we want to replicate the table. Typically, large facts are distributed, and

dimensions are replicated. This is not always true, but it is a reasonable

starting point.

For completeness, we need to recognize the primary filegroup. The primary

filegroup only holds data system table metadata for your database objects

just like a regular SQL Server database. The only difference is that we do not

let user data go into the primary filegroup. Most of the time, we ignore that

it is even there.

You should now see that the CREATE DATABASE statement is logical;

that is, it is used to create multiple physical databases to support the one

database you have specified. To help facilitate this shift from logical to

physical, PDW implements a layer of abstraction. An example of that can

be seen with the database name. Only the database built on the control

node would be given the name you have specified. The databases on the

compute nodes have a different name. It follows the convention of DB_

followed by 32 alphanumeric characters. This abstracted name is used on

every compute node. PDW exposes this mapping via a PDW-specific catalog

mapping view called sys.pdw_database_mappings. Here is some sample

code and the results (See Figure 10.4 ) :

SELECT d.name

, dm.physical_name

FROM sys.databases d

JOIN sys.pdw_database_mappings dm ON d.database_id =

dm.database_id

WHERE d.name = 'AdventureWorksPDW2012'

Figure 10.4 Logical and physical database names in PDW

Search WWH ::

Custom Search

Home