Designing a Data Warehouse - Pro SQL Server 2012 BI Solutions

Database Reference

In-Depth Information

The Fact Table

After you have defined your measures, you can begin designing your fact table. This fact table may look

remarkably like the OLTP table where the measures were found, or it may combine data from multiple tables,

which in turn denormalizes the data.

An example of this process would be collapsing the SalesLineItems table and the Sales table into one

fact table. The SalesLineItems table defines a many-to-many relationship between sales and titles, but that

relationship will still exist even if you collapse the SalesLineItems into the Sales table. When you do this, you end

up with a single table with redundant values, as displayed in Figure 4-4 . This denormalized design is typical in

data warehouses. The SalesId will be repeated multiple times for each line item, but it is considered a small price

to pay for simplification, and this pattern is representative of all fact tables.

Another common feature of the fact table is the simplification of datatypes. The actual sales price may have

originally been recorded as a SQL Server money datatype, yet the fact table would represent this data as decimal

(18, 4), as you see here (i.e., a total of 18 numbers with four of those numbers after the decimal point). The reason

for this change is that SQL Server's money type is a custom datatype associated with Microsoft's SQL Server.

Money is not an industry-standard datatype, but the decimal datatype is indeed an industry standard. The money

datatype can be accurately represented by this decimal datatype; thus, it is logical to do so. Applications that use

the data warehouse are more likely to work correctly using a standard datatype than a custom datatype.

In addition to the measures, the fact table contains IDs, or keys, that connect to dimension tables. These

columns are referred to as dimensional keys . The dimensional keys may be listed at the front of the column list or

placed at the end—it does not matter. We have chosen to represent them at the beginning of the column list, but

it is simply a matter of preference.

Another preference is how you name a fact table. Some developers might choose the name Sales. Others

would prefer the name SalesFacts. However, we chose the name FactSales for our fact table. Placing its

designation as a fact table at the beginning of the table name will help organize our tables alphabetically. Our

convention does not have much significance other than that it just makes it easier to group tables together in

some applications. For instance, SQL Server's Management Studio will sort tables alphabetically in its Object

Explorer window.

Dimensions

Once the fact table is defined, it is time to turn your attention to describing your measures with dimensions. At

a minimum, each dimension table should contain a dimensional key column and a dimensional name column.

The dimensional key column will typically be something such as ProductId or CustomerId. The name column

provides a human-friendly description of that particular ID. Along with the name, this ID column may need

additional descriptive data to allow for better organization and a clearer understanding, for example, the name of

its category.

In Figure 4-5 , we have created a Titles dimension table with four dimensional attributes (TitleId, TitleName,

TitleStandardRetailPrice and TitleCategory). The TitleId is the dimensional key, but all four are dimensional

attributes.

In Figure 4-5 , note that the money datatype is now translated into a decimal for the same reasons we

described with regard to the SalesActualPrice measure. We have also collapsed the data from two tables into

one table by taking the category data from the Category table and moving into the TitleCategory column into

the DimTitles table. In addition, the original Category table used a varchar datatype, but the new dimensional

table uses an nvarchar, or Unicode variable character datatype. Unicode has now become the standard for most

modern applications, and although it takes up twice the size in bytes to store this datatype, it is more consistent

with the datatypes in the other columns within the database. Simplicity is often more important to the data

warehouse design than absolute efficiency, and giving all columns consistent datatypes as well as consistent sizes

is, well, simpler.

Pro SQL Server 2012 BI Solutions

Search WWH ::

Custom Search

Home