processing, other characteristics such as streaming data and different data types also add
to the complexity of database processing.
Table 4-2. Big Data Scale: Volume, Velocity, and Variety Impact

Higher Impact                          Lower Impact
BI Workloads                           OLTP Workloads
Complex Data Structures (Variety)      Simple Data Structures
Many Table Join Operations             Fewer Table Join Operations
High Data Growth Rate                  Slow Data Growth Rate
Streaming Data                         Mostly Batch-Oriented Data
Data volume has the biggest impact on BI and analytic workloads because they read large
portions of the data at one time and join multiple large tables together. In contrast, OLTP
workloads are less affected by data volume: their transactions and queries are predictable
and touch small amounts of data, whether recording a transaction or fetching a few
records at a time.
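To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3
module; the schema, table names, and row counts are illustrative assumptions, not part of
the original text.

    # Sketch: an OLTP-style point lookup touches one row through an index,
    # while a BI-style query scans and joins large portions of both tables.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                                customer_id INTEGER REFERENCES customers(id),
                                amount REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, "EMEA" if i % 2 else "APAC") for i in range(10_000)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i % 10_000, i * 0.01) for i in range(100_000)])

    # OLTP: predictable, fetches one record through the primary key.
    print(conn.execute(
        "SELECT amount FROM orders WHERE id = ?", (4242,)).fetchone())

    # BI: reads large portions of both tables and joins them.
    print(conn.execute("""
        SELECT c.region, SUM(o.amount)
        FROM orders o JOIN customers c ON o.customer_id = c.id
        GROUP BY c.region
    """).fetchall())

As the data volume grows, the point lookup stays cheap while the join and aggregation
read proportionally more data, which is exactly the asymmetry the table above summarizes.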
There is also growing demand from businesses to run queries faster, to run more queries
simultaneously, and to run queries against larger data sets.
To address volume-related issues, data management practitioners rely extensively on
compression and indexing. Both have limits, however. Compression is constrained by the
data itself: different attributes hold different values, so beyond a certain point the
compression cannot be improved. Indexes improve query performance, but they introduce
overhead of their own: they typically take up as much space as the data itself, doubling
the footprint or more. Once you add indexes along with other constructs such as
materialized views, the data store can grow to as much as eight times the size of the raw
data. On the other hand, removing indexes degrades query performance. In effect, when
designing the data store, you need to balance the number of indexes against their storage
and maintenance cost.
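As a rough illustration of the compression limit described above, the following sketch
(Python standard library only; the column contents are made-up examples) compresses a
low-cardinality column and a high-cardinality column and compares the ratios.

    # Sketch: compression ratios depend on the values an attribute holds.
    # A column with few distinct values compresses far better than one with
    # mostly unique values, so the data itself caps what compression can do.
    import random
    import zlib

    n = 100_000
    # Few distinct values (e.g., a status flag).
    low_card = ",".join(random.choice(["NEW", "PAID", "SHIPPED"])
                        for _ in range(n)).encode()
    # Mostly unique values (e.g., a transaction identifier).
    high_card = ",".join(str(random.getrandbits(64))
                         for _ in range(n)).encode()

    for name, col in [("low cardinality", low_card),
                      ("high cardinality", high_card)]:
        ratio = len(col) / len(zlib.compress(col))
        print(f"{name}: {len(col):>9} bytes raw, ~{ratio:.1f}x compression")

Running this typically shows the low-cardinality column compressing several times better
than the high-cardinality one, regardless of how the compressor is tuned.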
The structure and complexity of the data can be as important as the raw data volume.
Narrow, deep structures, such as simple tables with few columns but many rows, are easier
to manage than many tables of varying attributes and row counts. The number of tables and
relationships matters as much as the amount of data stored: large numbers of schema
objects imply more complex joins and more difficulty distributing the data so that it can
be joined efficiently. Variety imposes additional constraints on the data store. If your
processing logic needs a combination of structured and unstructured data, you must design
queries that cater to different types of output, and the resulting data sets must be
merged to produce the expected result. Unstructured data requires parsing, tagging, and
filtering of the raw input, and you have to write programs for these jobs (a sketch
follows below). The data store itself must be prepared to accept both structured and
unstructured data. These aspects of big data scale drive query complexity, which can
result in poor optimizations and a great deal of data movement.
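As a minimal sketch of that parse-tag-filter-merge pipeline, the following Python example
parses unstructured log lines and merges them with a structured lookup table; the log
format, field names, error codes, and user table are illustrative assumptions.

    # Sketch: combine unstructured and structured data in one result set.
    import re

    raw_logs = [
        "2024-05-01 10:02:11 ERROR code=E42 user=101 disk full",
        "2024-05-01 10:02:15 INFO  code=I01 user=102 login ok",
        "2024-05-01 10:03:40 ERROR code=E17 user=101 timeout",
    ]
    users = {101: "alice", 102: "bob"}  # structured side of the merge

    pattern = re.compile(
        r"(?P<ts>\S+ \S+) (?P<level>\w+)\s+code=(?P<code>\S+) user=(?P<user>\d+)")

    merged = []
    for line in raw_logs:
        m = pattern.match(line)
        if m and m.group("level") == "ERROR":            # filter step
            record = m.groupdict()                       # parse/tag step
            record["name"] = users[int(record["user"])]  # merge step
            merged.append(record)

    for r in merged:
        print(r)

Every step here is custom code: the regular expression stands in for the parser, the
level check for the filter, and the dictionary lookup for the join against the structured
side, which is why variety adds so much development and query complexity.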