Databases Reference
In-Depth Information
Transactional systems
Transactional systems are the source databases from any application—web or client server—and are
very small in size and have a lower degree of complexity of data processing. The data life cycle is not
more than a day or a week at a maximum. Data is copied from this system to downstream databases
including the ODS, staging area, and data warehouse. The databases here have very minimal latencies
and are always tuned to perform extremely fast writes.
Operational data store
The ODS is an optional architecture component and is used for operational reporting purposes. The
data collected in the ODS varies in weeks to months, sometimes even a year. The ODS is an aggrega-
tion of the data from the transactional systems. The data model undergoes minimal changes, but the
complexity of the ODS is inherited from the integration of multiple data sources into one database.
The ODS serves two operations: loading data from transactional systems, and running report queries
from users. The mixture of two different types of operations causes data to be moved from disk to
memory and back to disk in a repeated manner. This causes a lot of performance issues on the overall
ODS. There are several potential possibilities to reengineer the ODS, which we will discuss later in
this chapter. The ODS is one key area for the modernization exercise.
Staging area
The staging area is another database that is built and deployed in every data warehouse. The purpose
of the staging area is to gather data for data quality and data preparation exercises for loading data
into the data warehouse. While the data is scrubbed on a daily or every other day basis, maintain-
ing the volumes at lower levels compared to the ODS or data warehouse, there is a lot of activity
on the staging area, which often shares the same disk space on the storage layer with the rest of the
data warehouse tables. This activity includes data quality and data preparation for loading to the data
warehouse, apart from any specific data integration prior to data warehouse loading. These activities,
while not creating any overhead from data volume, cause a lot of overhead from the data complexity
perspective. The current dilemma for architects is whether to create a separate storage area for the
staging area or to completely remove the staging area from the data warehouse data infrastructure,
and relocate the same to the ODS or a separate preprocessing database. The disk activity generated
by the staging area activities in terms of overloading the data warehouse, competes with the same
memory, processor, disk, and network as the data warehouse. This causes several performance issues
that database and system administrators have tuned for years. The platform is irrelevant, as the prob-
lem manifests across the board.
Data warehouse
The biggest and most complex data structures that has been built on the RDBMS platform is the data
store for all the data deemed as an enterprise asset. The data warehouse contains not only current
data, but also past history of several years online and offline. The data warehouse is the source of data
for all downstream systems including reporting platforms, analytics, dashboards, and specialty appli-
cations like CRM, compliance reporting, and financial reporting.
Search WWH ::




Custom Search