Databases Reference
In-Depth Information
performance and scalability issues is the sharing of resources in the infrastructure layer and the data-
base layers. In Figure 6.1 , you can see the following layers of shared infrastructure:
Storage:
The source database is isolated in its own storage, though it is a partition of a larger storage
cluster.
The ODS (if deployed), staging, and EDW databases are all normally connected to one storage
architecture.
The physical servers for the staging and EDW are the same system and database servers.
This shared infrastructure creates scalability and performance issues:
-
The I/O is constrained in the same enterprise network pipe.
-
The I/O between the source, staging, and EDW databases needs to travel from source, to
staging, to ETL, to storage, to ETL, to EDW. A lot of system resources and the network are
dedicated to managing this dataflow and it introduces natural time latencies.
-
Timeouts on connections, slow-moving transactions, and outages on disk architecture are
other common side effects that occur in this environment.
Server hardware:
The server hardware can be a set of virtual partitions on a large physical server.
The staging and EDW databases are normally in one server.
Analytical databases are normally installed in their own server environment.
Performance
What is the limitation of sharing-everything architecture? The answer to this lies in understanding the
shared-everything architecture as shown in Figure 6.2 . In this architecture:
Memory, CPU, and system bus are shared by all programs running on the systems.
Storage is shared across the programs.
The network is a common enterprise network and not a direct tethered connection in most cases.
The architecture approach described in Figure 6.2 is not the optimal architecture to handle the
large volume of data, the processing complexities, and users defined for the data warehouse. This
architecture will scale well and perform with consistency on an OLTP or transaction processing plat-
form, since the transactions are discrete in size and occur in small bursts. The system can be tuned
and maintained for optimal performance and managed to accommodate the growth needs of a transac-
tional environment.
The data warehouse is a collection of transactional data over a period of time and by definition is a
larger data set, which will be used for a variety of query and analysis purposes by multiple users in an
enterprise.
In a shared-services environment, there are several issues that limit the performance and scalabil-
ity of the data warehouse. Let us examine the performance aspect of the data warehouse versus the
OLTP platforms in the shared environment.
In an OLTP query execution, classic transactional queries are discrete like:
Insert into Sales(date, gross_sales_amt, location_id, product_code, tax_amt)
Update customer set email='someone.somemail.com”
Delete from inventory where sku = 'q12345'
Search WWH ::




Custom Search