Typically, most data flows successfully through the extract, transform, and
load (ETL) processes and all the preset checks and balances with no problems.
Often, this will represent 75% to 85% of the data. Basic validation rules should
be part of the ETL processes. For example, these rules might ensure that the
product number exists, that the zip code and state combination is valid, and
that an employee number is for a current employee. If the data does not meet
the criteria, then a warning or error is generated and processes are set in
motion to inform the staff and generate error reports. Many measurements
of data quality have a systems processing focus. Additionally, it is important
to include business criteria to measure quality. Examples of common system
measurements include the following:
Total number of rows processed
Number or percentage of rows with a warning condition
Number or percentage of rows with an error condition
Examples of common business measurements include the following:
Total number of units sold
Total dollar amount of premium collected
Change in market share compared to the previous period
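The validation rules and system measurements described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the lookup sets, the field names, the three-digit zip prefix table, and the choice of which failed rule counts as an error versus a warning are all assumptions made for the example.

```python
# Hypothetical reference data; a real ETL job would look these up in
# dimension tables or source systems.
VALID_PRODUCTS = {"P100", "P200"}
ZIP_PREFIX_TO_STATE = {"100": "NY", "606": "IL"}  # toy zip-prefix table
CURRENT_EMPLOYEES = {"E1", "E2"}


def validate_row(row):
    """Apply the basic validation rules to one row.

    Returns (errors, warnings). Treating the employee check as a warning
    and the other two checks as errors is an assumption for this sketch.
    """
    errors, warnings = [], []
    if row.get("product_number") not in VALID_PRODUCTS:
        errors.append("product number does not exist")
    expected_state = ZIP_PREFIX_TO_STATE.get(str(row.get("zip", ""))[:3])
    if expected_state is None or expected_state != row.get("state"):
        errors.append("zip code and state combination is invalid")
    if row.get("employee_number") not in CURRENT_EMPLOYEES:
        warnings.append("employee number is not for a current employee")
    return errors, warnings


def percentage(count, total):
    """Return count as a percentage of total, or 0.0 for an empty load."""
    return 100.0 * count / total if total else 0.0


def summarize(rows):
    """Compute the system measurements for one load.

    A row with both errors and warnings is counted once, as an error row.
    """
    error_rows = warning_rows = 0
    for row in rows:
        errors, warnings = validate_row(row)
        if errors:
            error_rows += 1
        elif warnings:
            warning_rows += 1
    total = len(rows)
    return {
        "total_rows": total,
        "error_rows": error_rows,
        "warning_rows": warning_rows,
        "error_pct": percentage(error_rows, total),
        "warning_pct": percentage(warning_rows, total),
    }
```

The dictionary returned by `summarize` could be written to an audit table after each load so that the counts are available for error reports and later comparison.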
The general system errors can be tracked over time to determine whether
there is an increase or decrease in the number of errors encountered. This
could indicate changes in the business processes or systems that the data
warehouse team has not been notified of. The business measurements should
be compared to reasonable variations. If a swing in market share of greater
than 0.5% is unheard of in your industry, this benchmark can be used to flag
potential data problems.
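As a rough sketch of these two checks, the functions below compare the latest error rate against earlier loads and flag a market share swing that exceeds a benchmark. The function names are hypothetical, the 0.5% figure is read here as 0.5 percentage points (the text could also mean a relative change), and comparing against the simple average of earlier loads is only one of several reasonable baselines.

```python
def error_rate_trend(error_rates):
    """Compare the latest load's error rate with the average of earlier loads.

    error_rates is a list of percentages ordered from oldest to newest.
    """
    if len(error_rates) < 2:
        return "insufficient history"
    *history, latest = error_rates
    baseline = sum(history) / len(history)
    if latest > baseline:
        return "increasing"
    if latest < baseline:
        return "decreasing"
    return "stable"


def market_share_swing_flagged(previous_share, current_share, max_swing=0.5):
    """Flag a period-over-period swing larger than max_swing.

    Shares and max_swing are expressed in percentage points (an assumption).
    """
    return abs(current_share - previous_share) > max_swing
```

For example, `market_share_swing_flagged(12.1, 12.9)` flags the period for review, while `market_share_swing_flagged(12.1, 12.3)` does not. A flag only marks the load as worth investigating; it does not prove that the data is wrong.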
Quality of Historical Data
The older the data, the more data quality problems you will find. The most
accurate data tends to be the current data. Over time, enhancements are made
to operational systems that tighten up controls for data entry and processing.
Business processes improve so that capturing critical data becomes a standard
part of each workflow. New systems are developed and implemented that
include data collection to support reporting and analysis. All of these improve
the quality of the data.
Historical data does not benefit from these improvements and therefore
often has the worst quality. It is important to evaluate the business benefit
of making bad historical data available. Keep in mind that the effort to work
through data problems is often grossly underestimated. These estimates are
usually based upon known problems. There are also many other problems