Database Reference
In-Depth Information
The Fact Table
After you have defined your measures, you can begin designing your fact table. This fact table may look
remarkably like the OLTP table where the measures were found, or it may combine data from multiple tables,
which in turn denormalizes the data.
An example of this process would be collapsing the SalesLineItems table and the Sales table into one
fact table. The SalesLineItems table defines a many-to-many relationship between sales and titles, but that
relationship will still exist even if you collapse the SalesLineItems into the Sales table. When you do this, you end
up with a single table with redundant values, as displayed in Figure 4-4 . This denormalized design is typical in
data warehouses. The SalesId will be repeated multiple times for each line item, but it is considered a small price
to pay for simplification, and this pattern is representative of all fact tables.
Another common feature of the fact table is the simplification of datatypes. The actual sales price may have
originally been recorded as a SQL Server money datatype, yet the fact table would represent this data as decimal
(18, 4), as you see here (i.e., a total of 18 numbers with four of those numbers after the decimal point). The reason
for this change is that SQL Server's money type is a custom datatype associated with Microsoft's SQL Server.
Money is not an industry-standard datatype, but the decimal datatype is indeed an industry standard. The money
datatype can be accurately represented by this decimal datatype; thus, it is logical to do so. Applications that use
the data warehouse are more likely to work correctly using a standard datatype than a custom datatype.
In addition to the measures, the fact table contains IDs, or keys, that connect to dimension tables. These
columns are referred to as dimensional keys . The dimensional keys may be listed at the front of the column list or
placed at the end—it does not matter. We have chosen to represent them at the beginning of the column list, but
it is simply a matter of preference.
Another preference is how you name a fact table. Some developers might choose the name Sales. Others
would prefer the name SalesFacts. However, we chose the name FactSales for our fact table. Placing its
designation as a fact table at the beginning of the table name will help organize our tables alphabetically. Our
convention does not have much significance other than that it just makes it easier to group tables together in
some applications. For instance, SQL Server's Management Studio will sort tables alphabetically in its Object
Explorer window.
Dimensions
Once the fact table is defined, it is time to turn your attention to describing your measures with dimensions. At
a minimum, each dimension table should contain a dimensional key column and a dimensional name column.
The dimensional key column will typically be something such as ProductId or CustomerId. The name column
provides a human-friendly description of that particular ID. Along with the name, this ID column may need
additional descriptive data to allow for better organization and a clearer understanding, for example, the name of
its category.
In Figure 4-5 , we have created a Titles dimension table with four dimensional attributes (TitleId, TitleName,
TitleStandardRetailPrice and TitleCategory). The TitleId is the dimensional key, but all four are dimensional
attributes.
In Figure 4-5 , note that the money datatype is now translated into a decimal for the same reasons we
described with regard to the SalesActualPrice measure. We have also collapsed the data from two tables into
one table by taking the category data from the Category table and moving into the TitleCategory column into
the DimTitles table. In addition, the original Category table used a varchar datatype, but the new dimensional
table uses an nvarchar, or Unicode variable character datatype. Unicode has now become the standard for most
modern applications, and although it takes up twice the size in bytes to store this datatype, it is more consistent
with the datatypes in the other columns within the database. Simplicity is often more important to the data
warehouse design than absolute efficiency, and giving all columns consistent datatypes as well as consistent sizes
is, well, simpler.
 
Search WWH ::




Custom Search