Consistency: Consistency is a critical design consideration. Immediate consistency
means that as soon as data has been updated, any other query will see the updated value.
Eventual consistency means that changes to data will not be uniformly visible to all
queries for some period of time. Some queries may see the earlier value while others see
the updated value.
Consistency is important to most OLTP systems because inconsistent query results
could lead to serious problems. For example, if a bank account is emptied by one
withdrawal, it shouldn't be possible to withdraw more funds. If the banking withdrawal
application were designed for eventual consistency, you can imagine the consequences:
two simultaneous withdrawals might each take the full balance out of the account, which
is not a desirable state for the bank.
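To make that failure mode concrete, here is a minimal sketch (the Replica class and withdraw function are hypothetical, not taken from any particular database): a read-check-write sequence against stale replicas lets two withdrawals each take the full balance, whereas an immediately consistent system would serialize the two operations and reject the second.

```python
# Minimal sketch (hypothetical names): two clients withdraw against different
# replicas of an eventually consistent store. Each replica still holds the
# stale balance, so both withdrawals of the full amount succeed.

class Replica:
    def __init__(self, balance):
        self.balance = balance

    def read(self):
        return self.balance

    def write(self, balance):
        self.balance = balance   # propagation to other replicas happens "later"

replica_a = Replica(100)
replica_b = Replica(100)         # not yet aware of any change made on replica_a

def withdraw(replica, amount):
    balance = replica.read()     # check funds on the local replica
    if balance >= amount:
        replica.write(balance - amount)
        return True
    return False

# Two "simultaneous" withdrawals, each routed to a different replica:
print(withdraw(replica_a, 100))  # True -- account emptied on replica_a
print(withdraw(replica_b, 100))  # True -- replica_b has not seen the update yet

# When the replicas reconcile, the account has paid out 200 against a balance
# of 100. An immediately consistent system would serialize the operations and
# the second withdrawal would fail.
```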
There are cases where immediate consistency is not critical and eventual consistency
is actually a desirable state, as it offers better performance and scalability characteristics,
particularly for large scale systems running in a distributed hardware environment like
the cloud. For example, in many consumer-facing web applications such as e-commerce
sites, the product listing does not need to reflect the actual inventory at every instant:
the transaction can go ahead, and the listing can be reconciled with product availability
later.
Updatability: Data may be changeable or it may be permanent. If an application
never updates or deletes data then it is possible to optimize the database design and
improve both performance and scalability.
Event streams, such as log data or web tracking activity, are examples of data that by
its nature is never updated. Events generate data, systems capture that data and analyze
its implications, and the data itself does not change at all. Outside
of event streams, the most common scenarios for write-once data are in BI and analytics
workloads, where data is usually loaded once and queried many times thereafter.
A number of BI and analytic databases assume that updates and deletes are rare and
use very simple mechanisms to control them. Putting a workload with a constant stream of
updates and deletes onto one of these databases will lead to query performance problems
because that workload is not part of their primary design. The same applies to some NoSQL
data stores that have been designed as append-only data stores to handle extremely high
rates of data loading. They can write large volumes of data quickly, but once written the
data can't be changed. Instead, it must be copied, modified, and written a second time.
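As a sketch of that copy-modify-rewrite pattern (the AppendOnlyStore class below is purely illustrative, not any product's API), an "update" in an append-only store means appending a new version of the record and letting readers pick the most recent one:

```python
# Minimal sketch (hypothetical): an append-only store. Nothing is modified in
# place; an "update" appends a new version of the record and a reader takes
# the most recent version for each key.

import itertools

class AppendOnlyStore:
    def __init__(self):
        self._log = []                       # ever-growing, never rewritten
        self._seq = itertools.count()

    def append(self, key, value):
        self._log.append((next(self._seq), key, value))

    def latest(self, key):
        # Scan backwards for the newest version of the key.
        for _, k, value in reversed(self._log):
            if k == key:
                return value
        raise KeyError(key)

store = AppendOnlyStore()
store.append("user:42", {"name": "Ada", "plan": "free"})

# "Updating" really means: read the old record, copy and modify it, append again.
current = dict(store.latest("user:42"))
current["plan"] = "pro"
store.append("user:42", current)

print(store.latest("user:42"))   # {'name': 'Ada', 'plan': 'pro'}
print(len(store._log))           # 2 -- the original record is still in the log
```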
Data Types: Relational databases operate on tables of data, but not all data is
tabular. Data structures can be hierarchies, networks, documents, or even nested inside
one another. If the data is hierarchical then it must be flattened into different tables before
it can be stored in a relational database. This isn't difficult, but it creates a challenge when
mapping between the database and a program that needs to retrieve the data.
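A small sketch of that flattening, using Python's built-in sqlite3 module with illustrative table names: a nested order document is split into a parent table and a child table, and the application then has to join and regroup the rows to get the document back.

```python
# Minimal sketch (illustrative schema): a hierarchical order document is
# flattened into a parent table and a child table, then reassembled with a join.

import sqlite3

order = {
    "order_id": 1,
    "customer": "Ada",
    "items": [
        {"sku": "A100", "qty": 2},
        {"sku": "B200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

# Flatten: the hierarchy becomes rows in two tables linked by order_id.
conn.execute("INSERT INTO orders VALUES (?, ?)",
             (order["order_id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["order_id"], i["sku"], i["qty"]) for i in order["items"]],
)

# Reassemble: the application must join and regroup to rebuild the document.
rows = conn.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN order_items i ON o.order_id = i.order_id WHERE o.order_id = ?",
    (1,),
).fetchall()
print(rows)   # [('Ada', 'A100', 2), ('Ada', 'B200', 1)]
```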
Response Time: Response time is the time between executing a query or
transaction and receiving the result of the operation. The challenge with
fast response time for queries is the volume of data that must be read, which is also
a function of the complexity of the query. Many solutions, like OLAP databases, focus on
pre-staging data so the query can simply read summarized or pre-calculated results.
If a query requires no joins it can be very fast, which is how some NoSQL databases satisfy
extremely low latency queries.
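The sketch below (illustrative schema, again using Python's sqlite3) contrasts the two approaches: an aggregate computed at query time over the detail rows versus a single-row read from a pre-calculated summary table.

```python
# Minimal sketch (illustrative schema): computing a total at query time versus
# reading a pre-calculated summary row, the pattern OLAP-style systems rely on.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("east", 25.0), ("west", 40.0)])

# Query-time aggregation: cost grows with the volume of detail rows scanned.
total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("east",)).fetchone()[0]

# Pre-staged alternative: the summary is calculated once, ahead of time, and
# the query becomes a single-row lookup with no aggregation or joins.
conn.execute("CREATE TABLE sales_by_region AS "
             "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
staged = conn.execute(
    "SELECT total FROM sales_by_region WHERE region = ?", ("east",)).fetchone()[0]

print(total, staged)   # 35.0 35.0 -- same answer, very different read cost
```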
Response time for writes is similar, with the added mechanism of eventual
consistency. If a database is eventually consistent, it's possible to provide a higher degree
of write performance, because a write can be acknowledged before it has propagated to
every replica.
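One way to picture this, as a sketch with hypothetical class names rather than any specific database's behavior: the write is acknowledged as soon as the local replica accepts it, and replication to the remaining replicas is deferred.

```python
# Minimal sketch (hypothetical): a write is acknowledged as soon as the local
# replica accepts it; propagation to the other replicas happens later, so the
# caller sees a fast response while other replicas are only eventually updated.

class EventuallyConsistentWriter:
    def __init__(self, replicas):
        self.replicas = replicas           # list of dicts standing in for nodes
        self.pending = []                  # deferred replication work

    def write(self, key, value):
        self.replicas[0][key] = value      # fast local write
        self.pending.append((key, value))  # replicate to the others later
        return "acknowledged"              # respond before propagation finishes

    def replicate(self):
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

nodes = [{}, {}, {}]
writer = EventuallyConsistentWriter(nodes)
print(writer.write("balance", 100))   # acknowledged immediately
print(nodes[1])                       # {} -- stale until replicate() runs
writer.replicate()
print(nodes[1])                       # {'balance': 100}
```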
 