Database Reference
In-Depth Information
EXTRACTION, TRANSFORMATION, AND LOAD
Transforming data is generally performed as part of the preparation before
data is loaded into the data warehouse and data marts. Understanding the
business usage of this information and the specific business questions to
be analyzed and answered are the keys to determining the transforma-
tions necessary to produce the target data mart.
ETL tools are used to extract data from operational and external source sys-
tems, transform the data, and load the transformed data in a data warehouse.
The same tool is used to extract and transform the data from the warehouse
and distribute it to the data marts. When a schedule is defined for refreshing
the data, the data-extraction and -transformation schedule must be carefully
implemented so that it both meets the needs of the data warehouse and does
not adversely impact the source systems that store the original data.
Extraction is a means of replicating data through a process of selection
from one or more source databases. Extraction may or may not employ
some form of transformation. Data extraction can be accomplished through
custom-developed programs. However, the preferred method uses vendor-
supported data-extraction and -transformation tools that can be custom-
ized to address particular extraction and transformation needs as well as
use an enterprise metadata repository that will document the business
rules used to determine what data was extracted from the source systems.
Data is transformed from transaction level data into information
through several techniques: filtering, summarizing, merging, transposing,
converting, and deriving new values through mathematical and logical
formulas. These all operate on one or more discrete data fields to produce
a target result having more meaning from a decision-support perspective
than the source data. This process requires understanding the business
focus, the information needs, and the currently available sources. Issues
of data standards, domains, and business terms arise when integrating
across operational databases.
Cleansing data is based on the guideline of populating the data ware-
house with quality data, i.e., data that is consistent; that is of a known,
recognized value; and that conforms with the business definition as
expressed by the user. The cleansing operation is focused on determining
those values that violate these rules and—through either a rejection or
transformation process—bring the data into conformance.
Search WWH ::




Custom Search