Twenty years ago, companies managed datasets that contained approximately a
million internal sales transactions, stored on a single processor in a relational data-
base. As organizations generated more data from internal and external sources, data-
sets expanded to billions and trillions of items. The amount of data made it difficult
for organizations to continue to use a single system to process this data. They had to
learn how to distribute the tasks among many processors. This is what is known as a
big data problem.
Today, using a NoSQL solution to solve your big data problems gives you some
unique ways to handle and manage your big data. By moving queries to the data, using
hash rings to distribute the load, using replication to scale your reads, and allowing
the database to distribute queries evenly to your data nodes, you can manage your
data and keep your systems running fast.
What's driving the focus on solving big data problems? First, the amount of pub-
licly available information on the web has grown exponentially since the late 1990s
and is expected to continue to increase. In addition, the availability of low-cost sensors
lets organizations collect data from everything; for instance, from farms, wind tur-
bines, manufacturing plants, vehicles, and meters monitoring home energy consump-
tion. These trends make it strategically important for organizations to efficiently and
rapidly process and analyze large datasets.
Now let's look at how NoSQL systems, with their inherently horizontal scale-out
architectures, are ideal for tackling big data problems. We'll look at several strategies
that NoSQL systems use to scale horizontally on commodity hardware. We'll see how
NoSQL systems move queries to the data, not data to the queries. We'll see how they
use hash rings to evenly distribute the data on a cluster and use replication to scale
reads. All these strategies allow NoSQL systems to distribute the workload evenly and
eliminate performance bottlenecks.
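
To make the hash ring idea concrete, here is a minimal sketch of a consistent-hash ring in Python. It is only an illustration under stated assumptions, not the implementation of any particular NoSQL product: the class name HashRing, the method names, the node names, the choice of MD5 as the hash function, and the use of 100 virtual points per node are all hypothetical choices made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Illustrative consistent-hash ring (hypothetical names, not a real product API)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes   # virtual points per node (assumed value)
        self._ring = []        # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key):
        # Map any string to a large integer position on the ring.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add_node(self, node):
        # Place several virtual points per node so the load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring has no nodes")
        # Find the first point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._position(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


# Usage: assign keys to three nodes, then add a fourth and count what moved.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"order-{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved to the new node")
```

The point of the design is that adding a node reassigns only the keys that now fall just before one of the new node's points; every other key stays where it was, so the cluster can grow without reshuffling all of the data.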
6.1 What is a big data NoSQL solution?
So what exactly is a big data problem? A big data class problem is any business problem
that's so large that it can't be easily managed using a single processor. Big data prob-
lems force you to move away from a single-processor environment toward the more
complex world of distributed computing. Though great for solving big data problems,
distributed computing environments come with their own set of challenges (see fig-
ure 6.1).
We want to stress that big data isn't the same as NoSQL. As we've defined NoSQL
in this topic, it's more than dealing with large datasets. NoSQL includes concepts and
use cases that can be managed by a single processor and have a positive impact on
agility and data quality. But we consider big data problems a primary use case for
NoSQL.
Before you assume you have a big data problem, you should consider whether you
need all of your data or a subset of your data to solve your problem. Using a statistical
sample allows you to use a subset of your data and look for patterns in the subset. The