Database Reference
In-Depth Information
Data are shared between applications or systems, trying to integrate them, giving at least
two applications the same picture of the world. This data sharing was mostly addressed
by mechanisms similar to what we now call ETL. Data warehouse environments face
the same challenge with the additional burden that they not only have to exchange, but
to integrate, rearrange and consolidate data over many operational systems, thereby
providing a new unified information base for business intelligence. Additionally, the data
volume in data warehouse environments tends to be very large.
24.5.1 What Happens During the ETL Process
During extraction, the desired data is identified and extracted from many different
sources, including database systems and applications. Very often, it is not possible to
identify the specific subset of interest, therefore more data than necessary has to be
extracted, so the identification of the relevant data will be done at a later point in time.
Depending on the source system's capabilities (for example, operating system
resources), some transformations may take place during this extraction process. The size
of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the
source system and the business situation. The same is true for the time difference between
two (logically) identical extractions: the time span may vary between days/hours
and minutes to near real-time. Web server log files for example can easily become hundreds
of megabytes in a very short period of time. After extracting data, it has to be physically
transported to the target system or an intermediate system for further processing. Depending
on the chosen mode of transportation, some transformations can be done during this
process, too. For example, an SQL statement that directly accesses a remote target
through a gateway can concatenate two or more columns as part of the Select statement.
After transportation to the target system, the data may undergo transformation into
the desired formats for the target system. Once this process is completed, the data is
loaded into the data warehouse.
24.5.2 ETL Tools
Designing and maintaining the ETL process is often considered one of the most difficult
and resource-intensive portions of a data warehouse project. Many data warehousing
projects use ETL tools to manage this process.
Oracle Tools
Oracle Warehouse Builder (OWB) provides ETL capabilities and takes advantage of
inherent database abilities. Other data warehouse builders create their own ETL tools and
processes, either inside or outside the database.
Besides the support of extraction, transformation, and loading, there are some
other tasks that are important for a successful ETL implementation as part of the daily
operations of the data warehouse and its support for further enhancements. The OWB
is quite a sophisticated component that facilitates the construction simple as well
as complex data warehouses, the population of them via the ETL process, and the
management of them.
 
Search WWH ::




Custom Search