processing, other characteristics such as streaming data and different data types also add
to the complexity of database processing.
Table 4-2. Big Data Scale: Volume, Velocity, and Variety Impact

Higher Impact                          Lower Impact
BI Workloads                           OLTP Workloads
Complex Data Structures (Variety)      Simple Data Structures
Many Table Join Operations             Fewer Table Join Operations
High Data Growth Rate                  Slow Data Growth Rate
Streaming Data                         Mostly Batch-Oriented Data
Data volume has the biggest impact on BI and analytic workloads because they read large
portions of the data at one time and join multiple large tables together. In contrast, OLTP
workloads are less affected by data volume: their transactions and queries are predictable
and touch small amounts of data, whether recording a transaction or fetching a few
records at a time.
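To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3
module; the schema, table names, and row counts are illustrative assumptions, not part of
the original text.

    # Sketch: an OLTP-style point lookup touches one row through an index,
    # while a BI-style query scans and joins large portions of both tables.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                                customer_id INTEGER REFERENCES customers(id),
                                amount REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, "EMEA" if i % 2 else "APAC") for i in range(10_000)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i % 10_000, i * 0.01) for i in range(100_000)])

    # OLTP: predictable, fetches one record through the primary key.
    print(conn.execute(
        "SELECT amount FROM orders WHERE id = ?", (4242,)).fetchone())

    # BI: reads large portions of both tables and joins them.
    print(conn.execute("""
        SELECT c.region, SUM(o.amount)
        FROM orders o JOIN customers c ON o.customer_id = c.id
        GROUP BY c.region
    """).fetchall())

As the data volume grows, the point lookup stays cheap while the join and aggregation
read proportionally more data, which is exactly the asymmetry the table above summarizes.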
There is also growing demand from businesses to run queries faster, to run more queries
simultaneously, and to run queries against larger data sets.
To address volume-related issues, data management practitioners rely extensively on
compression and indexing. Both have limits, however. Compression is constrained by the
data itself: different attributes hold different values, so beyond a certain point the
compression cannot be improved. Indexes improve query performance, but they introduce
overhead of their own: they typically take up as much space as the data itself, doubling
the footprint or more. Once you add indexes along with other constructs such as
materialized views, the data store can grow to as much as eight times the size of the raw
data. On the other hand, removing indexes degrades query performance. In effect, when
designing the data store, you need to balance the number of indexes against their storage
and maintenance cost.
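As a rough illustration of the compression limit described above, the following sketch
(Python standard library only; the column contents are made-up examples) compresses a
low-cardinality column and a high-cardinality column and compares the ratios.

    # Sketch: compression ratios depend on the values an attribute holds.
    # A column with few distinct values compresses far better than one with
    # mostly unique values, so the data itself caps what compression can do.
    import random
    import zlib

    n = 100_000
    # Few distinct values (e.g., a status flag).
    low_card = ",".join(random.choice(["NEW", "PAID", "SHIPPED"])
                        for _ in range(n)).encode()
    # Mostly unique values (e.g., a transaction identifier).
    high_card = ",".join(str(random.getrandbits(64))
                         for _ in range(n)).encode()

    for name, col in [("low cardinality", low_card),
                      ("high cardinality", high_card)]:
        ratio = len(col) / len(zlib.compress(col))
        print(f"{name}: {len(col):>9} bytes raw, ~{ratio:.1f}x compression")

Running this typically shows the low-cardinality column compressing several times better
than the high-cardinality one, regardless of how the compressor is tuned.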
The structure and complexity of the data can be as important as the raw data volume.
Narrow, deep structures, such as simple tables with few columns but many rows, are easier
to manage than many tables of varying attributes and row counts. The number of tables and
relationships matters as much as the amount of data stored: large numbers of schema
objects imply more complex joins and more difficulty distributing the data so that it can
be joined efficiently. Variety imposes additional constraints on the data store. If your
processing logic needs a combination of structured and unstructured data, you must design
queries that cater to different types of output, and the resulting data sets must be
merged to produce the expected result. Unstructured data requires parsing, tagging, and
filtering of the raw input, and you have to write programs for these jobs (a sketch
follows below). The data store itself must be prepared to accept both structured and
unstructured data. These aspects of big data scale drive query complexity, which can
result in poor optimizations and a great deal of data movement.
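As a minimal sketch of that parse-tag-filter-merge pipeline, the following Python example
parses unstructured log lines and merges them with a structured lookup table; the log
format, field names, error codes, and user table are illustrative assumptions.

    # Sketch: combine unstructured and structured data in one result set.
    import re

    raw_logs = [
        "2024-05-01 10:02:11 ERROR code=E42 user=101 disk full",
        "2024-05-01 10:02:15 INFO  code=I01 user=102 login ok",
        "2024-05-01 10:03:40 ERROR code=E17 user=101 timeout",
    ]
    users = {101: "alice", 102: "bob"}  # structured side of the merge

    pattern = re.compile(
        r"(?P<ts>\S+ \S+) (?P<level>\w+)\s+code=(?P<code>\S+) user=(?P<user>\d+)")

    merged = []
    for line in raw_logs:
        m = pattern.match(line)
        if m and m.group("level") == "ERROR":            # filter step
            record = m.groupdict()                       # parse/tag step
            record["name"] = users[int(record["user"])]  # merge step
            merged.append(record)

    for r in merged:
        print(r)

Every step here is custom code: the regular expression stands in for the parser, the
level check for the filter, and the dictionary lookup for the join against the structured
side, which is why variety adds so much development and query complexity.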