If we effectively measure and sum all the different components, we get a clearer picture that
helps us optimize the layers of the architecture and better informs the design of newer architec-
tures. Through this exercise, we can determine which component contributes the most latency and
how to address its workload.
Current system design limitations
There are several design limitations in current data warehouse architectures, which we discussed
in Chapters 6 and 7. To recap:
Sharding. The RDBMS cannot shard data efficiently due to ACID compliance rules. Applying
sharding concepts through data partitioning methods does not lend itself to scalability or
workload reduction; partitioning often tends to increase workloads.
Disk architecture. The SAN disk architecture underperforms and, due to data architecture
and data model issues, the data distribution is highly skewed, causing poor optimization
at the database layer.
Data layout architecture. The data layout on disk is a major performance inhibitor and
contributes to higher workloads. Often, large tables are colocated on the same disk
along with their indexes. Another issue is undersizing the disk or unit of storage, causing too
many fragments of the same table and leading to excessive chaining.
Underutilization of CPU. In many systems the CPU is chronically underutilized; this needs to be
corrected for efficient throughput and scalable thread management.
Underutilization of memory. Large-volume data integration often bypasses memory
and relies on disk. These situations typically involve one or more smaller tables joined with
two or more large tables. An efficiently designed query can hold the smaller tables in memory
while the larger tables are processed in a combination of memory and disk. This reduces
roundtrips between the database server and storage, lowering both disk and network
workloads.
Poor query design. A silent contributor to the workload situation is poorly designed queries.
Whether the queries are written by developers or generated by the semantic layer
integration in a reporting tool, when a query compiles with a large number of joins, the
workload typically increases across the system, and there are several opportunities to
reduce it. One example is generating star-schema-style queries against
a third normal form (3NF) database model, albeit with some semantic layer in the database,
including aggregate tables and summary views. This situation will always generate a high
volume of I/O and tends to cause poor network throughput; adding more operations to the
query greatly increases the disk I/O workload. Another example is executing a query with a
high number of aggregates across three or more tables.
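To make the sharding limitation above concrete, the sketch below shows the routing half of hash-based sharding (all names and the shard count are hypothetical, not from the text). Routing rows by a hashed key is straightforward; what the RDBMS cannot do efficiently, per the point above, is preserve ACID guarantees and joins across the resulting shards.

```python
# Minimal hash-sharding sketch (hypothetical example): rows are routed to a
# shard by hashing the shard key. The routing itself is cheap; cross-shard
# transactions and joins are what break ACID-compliant scaling.
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Map a shard key to a shard number via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route a few customer keys to their shards.
shards = {n: [] for n in range(NUM_SHARDS)}
for customer_id in ("C001", "C002", "C003", "C004", "C005"):
    shards[shard_for(customer_id)].append(customer_id)
```

Because the hash is stable, the same key always lands on the same shard, so single-key reads and writes stay local; any query touching multiple keys may fan out across all shards, which is the workload the text warns about.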
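The memory-underutilization point above can be sketched as a classic hash join: build an in-memory index over the small table, then stream the large table past it. The table and column names here are hypothetical illustrations, not from the text; the point is that only the small table must fit in memory, and the large table is read once, avoiding repeated roundtrips to storage.

```python
# Sketch of the join pattern described above (hypothetical tables): the small
# dimension table is held in memory as a hash index, and the large fact table
# is streamed past it row by row.
def hash_join(small_rows, large_rows, small_key, large_key):
    # Build phase: index the small table in memory.
    index = {}
    for row in small_rows:
        index.setdefault(row[small_key], []).append(row)
    # Probe phase: stream the large table once and emit matching rows.
    for row in large_rows:
        for match in index.get(row[large_key], []):
            yield {**match, **row}

# Usage: a tiny dimension table joined against a streamed (generator) fact table.
dims = [{"region_id": 1, "region": "EMEA"}, {"region_id": 2, "region": "APAC"}]
facts = ({"region_id": i % 2 + 1, "amount": i * 10} for i in range(4))
joined = list(hash_join(dims, facts, "region_id", "region_id"))
```

Because the probe phase consumes the large input as an iterator, it never needs the fact table fully materialized, which is exactly the memory-plus-disk processing the bullet describes.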
Though many architects and system designers try to create workarounds for the current state,
since the system is already in place from a foundational perspective, the workarounds do not provide
clear workload optimization. It does not matter whether you have a world-class infrastructure or
have implemented your design on commodity infrastructure; the current limitations add more