Using NoSQL to manage big data - Making Sense of NoSQL

Twenty years ago, companies managed datasets that contained approximately a million internal sales transactions, stored on a single processor in a relational database. As organizations generated more data from internal and external sources, datasets expanded to billions and trillions of items. The amount of data made it difficult for organizations to continue to use a single system to process this data. They had to learn how to distribute the tasks among many processors. This is what is known as a big data problem.

Today, using a NoSQL solution to solve your big data problems gives you some unique ways to handle and manage your big data. By moving queries to the data, using hash rings to distribute the load, using replication to scale your reads, and allowing the database to distribute queries evenly to your data nodes, you can manage your data and keep your systems running fast.
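
To make the idea of moving queries to the data concrete, here is a minimal Python sketch. It assumes a hypothetical cluster in which each node holds its own shard of sales records; the node names, the records, and the functions local_query and run_query_on_cluster are illustrative only and are not part of any NoSQL product. The query runs next to each shard, and only the small per-product totals travel back to be merged, instead of every raw record being shipped to a single machine.

```python
from collections import Counter

# Hypothetical cluster: each "node" owns its own shard of (product, quantity) records.
# In a real NoSQL system these shards would live on separate machines.
SHARDS = {
    "node-a": [("widget", 3), ("gadget", 1), ("widget", 2)],
    "node-b": [("gadget", 4), ("widget", 1)],
    "node-c": [("gizmo", 5), ("gadget", 2)],
}


def local_query(records):
    """Runs on the node that owns the records: totals quantity per product."""
    totals = Counter()
    for product, qty in records:
        totals[product] += qty
    return totals


def run_query_on_cluster(shards):
    """Send the query to every shard, then merge the small partial results."""
    merged = Counter()
    for node, records in shards.items():
        partial = local_query(records)  # only this summary would cross the network
        merged.update(partial)          # Counter.update adds counts together
    return merged


if __name__ == "__main__":
    print(dict(run_query_on_cluster(SHARDS)))
    # {'widget': 6, 'gadget': 7, 'gizmo': 5}
```

In a real system the per-shard step would run in parallel on separate machines; the loop here runs sequentially only to keep the sketch self-contained.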

What's driving the focus on solving big data problems? First, the amount of publicly available information on the web has grown exponentially since the late 1990s and is expected to continue to increase. In addition, the availability of low-cost sensors lets organizations collect data from almost anything; for instance, from farms, wind turbines, manufacturing plants, vehicles, and meters monitoring home energy consumption. These trends make it strategically important for organizations to efficiently and rapidly process and analyze large datasets.

Now let's look at how NoSQL systems, with their inherently horizontal scale-out architectures, are ideal for tackling big data problems. We'll look at several strategies that NoSQL systems use to scale horizontally on commodity hardware. We'll see how NoSQL systems move queries to the data, not data to the queries. We'll see how they use hash rings to evenly distribute the data on a cluster and use replication to scale reads. All these strategies allow NoSQL systems to distribute the workload evenly and eliminate performance bottlenecks.
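
The hash-ring and replication strategies can be sketched in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by any particular NoSQL database: the HashRing class, MD5 as the hash function, 100 virtual nodes per server, three replicas, and round-robin selection of the replica that serves a read are all hypothetical choices made for the example. Each key is hashed onto the ring and stored on the first three distinct nodes found walking clockwise from that point; reads for the key can then be served by any of those copies.

```python
import bisect
import hashlib
import itertools
from collections import Counter


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption; any uniform hash works)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes and N-way replication."""

    def __init__(self, nodes, vnodes=100, replicas=3):
        self.replicas = replicas
        # Each physical node is placed at `vnodes` positions to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._rr = itertools.count()  # round-robin counter used to spread reads

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct replica nodes."""
        start = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners  # fewer than `replicas` entries if the cluster is smaller

    def read_node(self, key):
        """Scale reads by rotating through every replica that holds the key."""
        owners = self.nodes_for(key)
        return owners[next(self._rr) % len(owners)]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.nodes_for("customer:42"))  # three distinct nodes holding copies
    primaries = Counter(ring.nodes_for(f"key{i}")[0] for i in range(10_000))
    print(primaries)  # each node should own roughly a quarter of the keys
```

Virtual nodes are what make the distribution even: with only one position per server, a few unlucky hash values could leave one node owning a much larger arc of the ring than the others.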

6.1 What is a big data NoSQL solution?

So what exactly is a big data problem? A big data class problem is any business problem that's so large that it can't be easily managed using a single processor. Big data problems force you to move away from a single-processor environment toward the more complex world of distributed computing. Though great for solving big data problems, distributed computing environments come with their own set of challenges (see figure 6.1).

We want to stress that big data isn't the same as NoSQL. As we've defined NoSQL in this book, it's more than dealing with large datasets. NoSQL includes concepts and use cases that can be managed by a single processor and that have a positive impact on agility and data quality. But we consider big data problems a primary use case for NoSQL.

Before you assume you have a big data problem, you should consider whether you need all of your data or a subset of your data to solve your problem. Using a statistical sample allows you to use a subset of your data and look for patterns in the subset. The
