Databases Reference
In-Depth Information
table, then they can be ignored for this step (they may be needed for other
fact tables).
Sales
Customer
Table
Marketing
Customer
Master
Accounting
Customer
Table
List of
Customers
with
Transactions
Do we
know this
customer
id?
Validate
Reference
Data
Build
Customer
Attributes
No
Check for
duplicates
Yes
Has this
customer
changed?
Yes
Build Changed
Customer Row
Changed
Customer
Dimension
Rows
New
Customer
Dimension
Rows
Customer
Errors
Figure 10-1 High-level Customer dimension data flow diagram
Check the existence of dimension reference data for each sales trans-
action: This means that the critical descriptive data such as the product,
customer, and sales date are valid. Transactions can flow through the
sales system with missing or invalid customer identifiers. There must be
a set of rules for how to handle each special case that is observed on the
transaction. It is also critical to ensure that there is a row in the dimension
table for each instance that is used for a transaction. This is known as
referential integrity .
Assign appropriate data warehouse identifiers: Once the critical ref-
erence data has been validated, the identifiers must be changed from
those used by the underlying source systems to those used in the data
warehouse environment, the surrogate keys.
Validate fact fields: Although the actual facts will be specific to each
transaction, there are often general guidelines that can be checked to
determine if the fact is accurate. For example, a sales transaction of zero
units may indicate that this is not an actual sale. A single sales transaction
for an amount that is greater than the average weekly sales for the entire
company is probably a data error.
 
Search WWH ::




Custom Search