Databases Reference
In-Depth Information
Translate fact values for ease of use: The business measures needed for
reports may be different from how the data is stored in source systems.
It is helpful to convert data to a common unit of measure for reporting.
If a sales transaction records the sale of one case, this may be a case of
24 individual units or perhaps 36 individual units. Converting all of the
sales to the individual units ensures consistent and meaningful reporting.
Additionally, many facts work together. For example, the number of units
sold, price, and dollar amount sold are all useful facts. Only two of these
need to be stored physically; the third can be calculated on-the-fly. To
make this as fast as possible, it is helpful to store the two facts that are
used most often.
Unravel source system logic: In many instances, the source system has
core data stored in a manner to help that system run fast and enable each
module to be relatively self-contained. This may mean that the pieces that
are needed to get a complete picture of a sales transaction may be stored
in several places or tables. To make things more challenging, there may
be no direct links between these tables, although there may be a series
of translation and lookup tables that are needed to get to the bottom of
things. This may be why the data is so hard for you to use in the source
system. The details are all there, but you need to understand the ''secret
handshakes'' in order to get to the real numbers. The ETL system can
apply this logic to provide the real sale numbers to be loaded into a fact
table. This is done once and then everyone can access these facts without
having to learn all of these details.
Perform complex calculations: Many calculations can be done when
creating a report or accessing data in an ad hoc manner. However, some
calculations may be too complex and long-running to perform on-the-fly.
The ETL system can perform these calculations and store the results as
facts, which can then be directly accessed for reporting.
Figure 10-2 shows a general data flow diagram to build the sales fact table
just described.
In addition to the functions described here, a great deal of work is also
required to address anomalies and challenges in the data. Often these chal-
lenges surface when the ETL system is being developed. No matter how
carefully the data is profiled in advance or how much detail is included in
the design, new and unexpected things will show up in the data. The goal
is to design the ETL system to be able to gracefully identify and address the
unexpected.
Load
After the extracted data has been processed, the final step is to load the
data into a database. This may be loading directly into a star schema in a
presentation server or loading data into a third normal form data warehouse.
Search WWH ::




Custom Search