Databases Reference
In-Depth Information
as input to the rest of the ETL processing. This provides the raw input data
to be processed, but not the other data needed for ETL processing. Where the
extracted data is stored is referred to as the staging data store . The definition of
this staging data store in the top-down architecture follows.
1. What data will be stored here (reference and/or transaction data)? This
layer contains all of the raw input data, including both transaction and
reference data.
2. What is the primary purpose of keeping the data here? The data is stored
here as input for ETL processing. This may also be used as a source for the
raw input data if the source system does not retain history. This may also
serve the purpose of preserving data as provided by the source system.
This supports reprocessing of the data if errors are found in the business
rules or there are changes to how the data is to be processed. How will
the data be structured? The data is structuredasitisprovided.Thismay
be a copy of the source system structures or an extract file.
3. What is the persistence of the data or how much history will be stored?
The amount of data that must be retained in the data staging area depends
upon the characteristics of the underlying source systems. If the data can
be easily provided again, then there is limited value in keeping another
copy of the data here. However, if the data rolls off the source system
database and is no longer available, there may be good reason to keep
a copy of the data here. Even if the raw input data needs to be retained
for possible future use, it may be more cost effective to archive the data,
rather than keep it online. Data that has been archived can be retrieved
for reprocessing if necessary.
4. Who will be able to use the data here? The data staging area is to be used
by the data warehouse processing, either programs or utilities.
5. What type of data access will be available? The access will be in a batch
mode, when the data warehouse processes are run. This may be monthly,
daily, or perhaps more frequently if the business requires it.
There are no substantial differences between the two data architecture approaches for
data extraction. More definition is put around this area with the top-down approach .
Prepare the Data: The Data Warehouse
A lot of work is required to prepare the data for reporting and analysis. The
top-down approach recommends creating a collection of the historical data as
the basis for publishing and using all data. This collection of data is called the
data warehouse . Data is cleansed, validated, and integrated prior to being loaded
into this data warehouse. As noted in Figure 9-4, ETL data stores are needed
to support the processing of the raw data in the staging data store for loading
Search WWH ::




Custom Search