If we effectively measure and sum all the different components, we get a clearer picture that
helps us optimize the layers of the architecture and better informs the design of newer architec-
tures. Through this exercise, we can determine which component contributes the most latency and
how to address its workload.
Current system design limitations
There are several design limitations in current data warehouse architectures, which we discussed
in Chapters 6 and 7. To recap:
Sharding. The RDBMS cannot shard data efficiently due to ACID compliance rules. Applying
sharding concepts through data partitioning methods does not lend itself to scalability or
workload reduction; partitioning often tends to increase workloads.
Disk architecture. The SAN disk architecture underperforms and, due to data architecture
and data model issues, the data distribution is highly skewed, causing poor optimization
at the database layer.
Data layout architecture. The data layout on disk is a major performance inhibitor and
contributes to higher workloads. Often, large tables are colocated on the same disk
along with their indexes. Another issue is undersizing the disk or unit of storage, causing too
many fragments of the same table and leading to excessive chaining.
Underutilization of CPU. In many systems the CPU is chronically underutilized; this needs to be
corrected for efficient throughput and scalable thread management.
Underutilization of memory. Large-volume data integration often bypasses memory
and relies on disk. These situations typically involve one or more smaller tables joined with
two or more large tables. An efficiently designed query can hold the smaller tables in memory
while the larger tables are processed in a combination of memory and disk. This reduces
roundtrips between the database server and storage, lowering both disk and network
workloads.
Poor query design. A silent contributor to the workload situation is poorly designed queries.
Whether the queries are written by developers or generated by the semantic layer
integration in a reporting tool, when a query compiles with a large number of joins, the
workload typically increases across the system, and there are several opportunities to
reduce it. One example is generating star-schema-style queries against
a third normal form (3NF) database model, albeit with some semantic layer in the database,
including aggregate tables and summary views. This situation will always generate a high
volume of I/O and tends to cause poor network throughput; adding more operations to the
query greatly increases the disk I/O workload. Another example is executing a query with a
high number of aggregates across three or more tables.
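To make the sharding limitation above concrete, the sketch below shows the routing half of hash-based sharding (all names and the shard count are hypothetical, not from the text). Routing rows by a hashed key is straightforward; what the RDBMS cannot do efficiently, per the point above, is preserve ACID guarantees and joins across the resulting shards.

```python
# Minimal hash-sharding sketch (hypothetical example): rows are routed to a
# shard by hashing the shard key. The routing itself is cheap; cross-shard
# transactions and joins are what break ACID-compliant scaling.
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Map a shard key to a shard number via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route a few customer keys to their shards.
shards = {n: [] for n in range(NUM_SHARDS)}
for customer_id in ("C001", "C002", "C003", "C004", "C005"):
    shards[shard_for(customer_id)].append(customer_id)
```

Because the hash is stable, the same key always lands on the same shard, so single-key reads and writes stay local; any query touching multiple keys may fan out across all shards, which is the workload the text warns about.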
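The memory-underutilization point above can be sketched as a classic hash join: build an in-memory index over the small table, then stream the large table past it. The table and column names here are hypothetical illustrations, not from the text; the point is that only the small table must fit in memory, and the large table is read once, avoiding repeated roundtrips to storage.

```python
# Sketch of the join pattern described above (hypothetical tables): the small
# dimension table is held in memory as a hash index, and the large fact table
# is streamed past it row by row.
def hash_join(small_rows, large_rows, small_key, large_key):
    # Build phase: index the small table in memory.
    index = {}
    for row in small_rows:
        index.setdefault(row[small_key], []).append(row)
    # Probe phase: stream the large table once and emit matching rows.
    for row in large_rows:
        for match in index.get(row[large_key], []):
            yield {**match, **row}

# Usage: a tiny dimension table joined against a streamed (generator) fact table.
dims = [{"region_id": 1, "region": "EMEA"}, {"region_id": 2, "region": "APAC"}]
facts = ({"region_id": i % 2 + 1, "amount": i * 10} for i in range(4))
joined = list(hash_join(dims, facts, "region_id", "region_id"))
```

Because the probe phase consumes the large input as an iterator, it never needs the fact table fully materialized, which is exactly the memory-plus-disk processing the bullet describes.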
Though many architects and system designers try to create workarounds for the current state,
since the system is already in place from a foundational perspective, the workarounds do not provide
clear workload optimization. It does not matter whether you have a world-class infrastructure or
have implemented your design on commodity infrastructure; the current limitations add more