Databases Reference
In-Depth Information
FIGURE 8.4
ETL/CDC workload categories.
processing of loading, updating, or deleting data to the tables can generate a lot of workload on the
application and the database server.
The typical traits of ETL workloads are:
●
Higher I/O for longer bursts of time
●
High memory consumption
●
Higher CPU utilization
●
Large volumes of data moved across the network
●
Large transaction cycles—leading to many flushes of log files and cache in the database, creating
additional operating system-level workload
●
Depending on the software and the programming model, additional file management can be
necessary
CDC workloads exhibit the following traits:
●
Higher I/O for shorter bursts of time
●
Smaller volumes of data moved across the network
●
Larger file-based operations
●
Smaller transaction cycles
●
More file management at the operating system level
Depending on the set of operations executed in the data warehouse at any given point in time, you
can have multiple workloads being processed on the entire system. This is the underlying design con-
cept that needs to be thought about when designing a data warehouse. Database vendors like Oracle,
Teradata, and IBM have created workload management algorithms and toolsets to help users manage
their database performance. These tools aid the database and system administrators to create a set of
optimization rules based on which the database and the associated data architecture can be utilized to
create an efficient execution map for querying and loading data.