Databases Reference
In-Depth Information
that have yet to be discovered and may prove to be difficult to diagnose and
even more difficult to correct.
The business value of the oldest historical data must be scrutinized. For
retail organizations, such as those in the fashion industry, what was hot three
years ago is unlikely to indicate what will be hot this year. Historical data
may indicate the volume of the fashionable items that are sold, but may
yield little insight into which items are more likely to be trendy. Conversely,
actuaries want all available data on every hurricane since data collection began.
This long-term history is useful to feed statistical models.
A reasonable goal for data quality should be set; remember, the target is accuracy
appropriate for use. These goals can be raised as the organization,
both business and IT, learns more about the problems that exist with historical
data and the techniques to address them. Don't be afraid to back down from a
goal that is too lofty to start with. Focus on getting better data into the hands of
decision makers as soon as possible. This may even help to identify problems
and determine the possible value of addressing them. You learned
about data profiling in the section on data quality; historical data is an excellent
place to use these tools and techniques.
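As a rough illustration of profiling historical data before setting a quality goal, the following Python sketch counts, per year, how many transactions fail a few simple checks. The field names (year, amount, customer_id), the sample rows, and the rules themselves are assumptions made up for this example; a real profile would use the organization's own data and quality rules.

```python
from collections import defaultdict

# Hypothetical sample of historical transactions; field names are illustrative.
transactions = [
    {"id": 1, "year": 2019, "amount": 25.00, "customer_id": "C1"},
    {"id": 2, "year": 2019, "amount": None, "customer_id": "C2"},
    {"id": 3, "year": 2020, "amount": -5.00, "customer_id": "C3"},
    {"id": 4, "year": 2020, "amount": 40.00, "customer_id": "C1"},
    {"id": 5, "year": 2021, "amount": 12.50, "customer_id": None},
    {"id": 6, "year": 2021, "amount": 60.00, "customer_id": "C4"},
]


def find_problems(row):
    """Return the names of the (assumed) quality rules that the row violates."""
    problems = []
    if row["amount"] is None:
        problems.append("missing_amount")
    elif row["amount"] < 0:
        problems.append("negative_amount")
    if not row["customer_id"]:
        problems.append("missing_customer")
    return problems


def profile_by_year(rows):
    """Summarize, per year, the row count and the share of rows with problems."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        totals[row["year"]] += 1
        if find_problems(row):
            flagged[row["year"]] += 1
    return {
        year: {
            "rows": totals[year],
            "flagged": flagged[year],
            "flagged_share": flagged[year] / totals[year],
        }
        for year in sorted(totals)
    }


profile = profile_by_year(transactions)
for year, stats in profile.items():
    print(year, stats)
```

A summary like this shows which years carry the most problem rows, which can help decide how far back a given quality goal is worth pursuing.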
CASE STUDY: STARTING AT THE WRONG END OF THE SPECTRUM
This case study comes from a real-world scenario. Many other organizations
may have had a similar experience. This is not a recommended approach, but is
included to highlight the need to continuously review the business value of
major project choices.
The organization needed to have five years of historical transactions to
support its analysis. A new operational system was implemented two years
ago. Two different sets of work needed to be done: one to load the history from
the old system and one to develop a process to load historical and current data
from the new system. A decision was made to start with the oldest data first.
The strategy was to load the data in historical order, oldest to newest.
Once the ETL processes were built, the load of the first three years began.
The goal was to be able to include all transactions. Many problems were
discovered in the historical data, which did not meet the current data quality
objectives. While most of the data was fine, 5% of the transactions had
problems. Detailed research was required to discover how to handle these
outliers. The ETL developers dutifully adhered to their instructions to clean all
of the data. Many hours were spent researching the errors in a small number of
transactions, using the time of the most skilled and experienced resources.
As usual, this took far longer than anyone expected. It took two years to load
three years of the oldest data. Now the development to extract data from the
new operational system could begin. At this point, the data warehouse still did
not contain data that could be used to support any reporting or analysis.
(continued)