The New Information Management Paradigm - Big Data Imperatives

Databases Reference

In-Depth Information

Table 2-1. ( continued )

Traditional IT

Web-Scale

Applications

Big Data Analytics

Initiatives

Solution Stack

Traditional

IT enterprise

platforms, mostly

ACID compliant.

Less proprietary,

open source centric,

commodity hardware

centric (LAMP

Stack - Linux,

Apache, MySQL,

PHP)

Highly open source

centric, scalability,

performance is the key,

consistency can be

compromised (SMAQ

Stack - Storage, Map-

Reduce, Query)

Database

RDBMS platforms

MySQL

NoSQL

Compute

Proprietary

Distributed

processing, Linux

on large number of

commodity servers

Distributed processing,

data is node aware,

Linux on large number

of commodity servers

running map-reduce

jobs

Storage

Expensive SANs

Scale out commodity

NAS

Scale out commodity

NAS, Hadoop

compatible file systems

The traditional IT stack (let's define it as “database, storage, and computing”) that

worked quite well for a relatively small amount of highly valuable and highly structured data

began to show serious limitations when faced with a number of challenges. For example, the

emergence of web-enabled applications (and millions of user bases) needed cost-effective

and innovative approaches to enable distribution of computing and processing of data

across large numbers of commodity servers. Let's define this as the LAMP stack (Linux,

Apache, MySQL, and PHP). Almost every business is now a “digital business,” thus the

explosive growth in unstructured data (e.g., text, video, audio, medical images, etc.) all around

us. This is why a new stack for IT called SMAQ (storage, map-reduce and query) has emerged.

Let's discuss a few examples to fully understand the implications of the SMAQ

stack. Imagine, for example, that you are not only trying to store billions of interactions

happening in social chatter boxes but also trying to perform sophisticated analytics on

those interactions: such as sentiments expressed by people about a particular brand of

product, correlations analysis, and taking these sentiments and linking them to your

product sales across stores. Conventional databases can neither handle this kind of

scale nor do they have the ability to quickly provide answers to these kinds of questions.

Relational databases were designed to maintain transaction history (the ACID principles)

in a highly consistent manner and thus inherently they have limitations on scalability

and performance. The scale in our example necessitates that we follow a distributed data,

storage, and processing approach. The conventional storage and computing approaches

can't handle this kind of scale and complexity.

Big Data Imperatives

Search WWH ::

Custom Search

Home