Figure 4.1 A typical data warehouse architecture.
4.2 Data Warehousing
Data warehouses are large data repositories that support the decision-making
process. Figure 4.1 shows a typical multi-tier data warehousing architecture. We
can see that data coming from heterogeneous data sources, after a staging process
that acts as a kind of buffer, pass through a process known as ETL, standing
for extraction, transformation, and loading. The extraction phase gathers data
from the data sources. These may be operational databases, but also files in
various formats, which may be internal or external to the organization. The
transformation phase modifies the data from the format of the data sources to
that of the warehouse. This includes several aspects: cleaning, which removes
errors in the data and converts them into a standardized format; integration,
which reconciles data from different data sources, both at the schema and at
the data level; and aggregation, which summarizes the data obtained from data
sources according to the level of detail (granularity) of the data warehouse.
Finally, the loading phase feeds the data warehouse with the transformed data.
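
The three ETL phases described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the text: the two in-memory sources (`SALES_DB`, `CSV_EXPORT`), their field names, the (customer, month) granularity, and the dictionary standing in for the warehouse.

```python
from collections import defaultdict

# Hypothetical heterogeneous sources with different schemas (illustrative data).
SALES_DB = [
    {"cust": " alice ", "amount": "10.50", "day": "2024-01-03"},
    {"cust": "BOB", "amount": "n/a", "day": "2024-01-03"},  # erroneous amount
    {"cust": "Alice", "amount": "4.50", "day": "2024-01-20"},
]
CSV_EXPORT = [
    {"customer_name": "bob", "total": "7.00", "date": "2024-02-01"},
]


def extract():
    """Extraction: gather raw records from each source, tagged by origin."""
    return [("sales_db", r) for r in SALES_DB] + [("csv", r) for r in CSV_EXPORT]


def transform(raw):
    """Transformation: integrate, clean, and aggregate the raw records."""
    unified = []
    for origin, rec in raw:
        # Integration: reconcile the two source schemas into a single one.
        if origin == "sales_db":
            name, amount, date = rec["cust"], rec["amount"], rec["day"]
        else:
            name, amount, date = rec["customer_name"], rec["total"], rec["date"]
        # Cleaning: drop records with invalid amounts, standardize names.
        try:
            value = float(amount)
        except ValueError:
            continue
        unified.append((name.strip().lower(), date[:7], value))
    # Aggregation: summarize to the assumed granularity (customer, month).
    totals = defaultdict(float)
    for name, month, value in unified:
        totals[(name, month)] += value
    return dict(totals)


def load(warehouse, facts):
    """Loading: feed the transformed rows into the (in-memory) warehouse."""
    warehouse.update(facts)
    return warehouse


warehouse = load({}, transform(extract()))
print(warehouse)
```

In a real system the sources would be operational databases or files, and the loading phase would write to a database rather than to a Python dictionary.
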
This also includes refreshing the data warehouse, that is, propagating updates
from the data sources to the data warehouse at a specified frequency in order to
provide up-to-date data for the decision-making process. We will see later that