Databases Reference
In-Depth Information
Workarounds—current workarounds include:
Special aggregate structures at the data warehouse or datamart.
Special network connectors between the analytical database and the data warehouse to bypass
the enterprise network and avoid clogging it.
Adding infrastructure to process data and move data faster.
Goal—the goal for creating an effective workload strategy at this layer is to minimize the amount
of work to be done by this layer by shifting the data processing to the data warehouse or the
underlying database in a shared-server architecture with the data warehouse, and restricting the
data volume to be moved across, thereby improving the overall performance of the application.
To summarize, the outbound data processing from the data warehouse generates workloads that
will require processing of data several times to the same application and processing several such
applications at the same period of time. The focus areas for workload management that impact perfor-
mance include the database, storage servers, and network.
Data warehouse inbound
There are several sources of data that need to be processed into the data warehouse and stored for
further analysis and processing. The primary application that manages the inbound processing of data
into the data warehouse is the extract, transform, and load (ETL) application. There are different vari-
ations of ETL today with extract, load and transform (ELT) and change data capture (CDC) tech-
niques and database replication.
Data warehouse processing overheads
The processing of data into the data warehouse has several overheads associated with it, depending on
the technique.
ETL—the most common processing model, the ETL design includes movement of data to an
intermediate staging area, often shared in the same storage as the data warehouse and applying
transformation rules including lookup processing. The data is often extracted from both the
staging and data warehouse areas and compared in memory for processing. Depending on
the volume of data from both ends, the process may spiral into an uncontrolled timespan
causing a major impact on the data warehouse. A second continuum of overhead persists in the
transformation of data within the data warehouse. There are several business rules to be completed
for the processing of data and these rules can add workload overheads. Similarly, constraints on
the tables within the data warehouse add to processing overheads when inserting or updating data
within the data warehouse.
Dependencies—the processing capabilities of the ETL application apart from the business rules
and the data processing complexities depend on the amount of disk, memory, cache, and processor
available to the application server. Large volumes of data mean multiple cycles of exchange
between the application server and the database, adding network traffic and waits.
Issues—large volumes of data are moved back and forth between the ETL application server
and the data warehouse. This extremely large data extraction process adds a lot of burden to the
processing database on both sides.
Search WWH ::




Custom Search