Databases Reference
In-Depth Information
application. The junctures listed thus far, and their potential data quality
assessments, are the following:
• Extract—Ater the Extract function retrieves rows from the source
system, use an alternate data retrieval method to profile the data that
should have been extracted. Compare the data profile to a compa-
rable profile of the Extract file.
• Transform—Ater the Transform function writes the Update/Delete
and Insert files:
• Verify that the Entity Keys in the Update/Delete ile, but not in
the Insert file, are in the data warehouse but not in the Extract file.
• Verify that the Entity Keys in the Insert ile, but not in the Update/
Delete file, are in the Extract file but not in the data warehouse.
• Verify that the Instance Key of an Entity Key in the Insert ile is
greater than the maximum Instance Key for the same Entity Key
in the data warehouse.
• Load—Ater the Load function has updated the data warehouse
from the Update/Delete and Insert files:
• Verify that for every Entity Key, exactly one Instance Key applies
to each time frame, so that no time frame exists without an
Instance Key, which will cause confusion as to which Type 2 time
variant dimension row applies to a moment in the past. If the
answer is zero Type 2 t ime variant dimension rows, that's the
wrong answer.
• Verify that for every Entity Key, only one Instance Key applies
to each time frame, so that no Instance Keys overlap, which will
cause confusion as to which Type 2 time variant dimension row
applies to a moment in the past. If the answer is two Type 2 time
variant dimension rows, that's the wrong answer.
• Verify that for every Entity Key, the row in the Type 1 dimension
table is identical to the most recent row in the Type 2 dimension
table.
Data quality is a s eldom understood, and less frequently implemented,
function of an ETL application. The data quality assessments listed above
are among the most basic and rudimentary.
Additional data quality assessments are available in Chapter 8 o f
Building and Maintaining a Data Warehouse . If a time variant data ware-
house experiences data quality issues, the potential for data corruption is
Search WWH ::




Custom Search