Using NoSQL to manage big data - Making Sense of NoSQL


read and write to key-value stores or distributed filesystems like Amazon's Simple Storage Service (S3) or Hadoop Distributed File System (HDFS) and may not need the advanced features of a document store or an RDBMS.

Other use cases are more demanding and need more features. Big data problems like event log data and game data do need to store their data directly in structures that can be queried and analyzed, so they will need different NoSQL solutions.

To be a good candidate for a general class of big data problems, NoSQL solutions should:

• Be efficient with input and output and scale linearly with growing data size.

• Be operationally efficient. Organizations can't afford to hire many people to run the servers.

• Allow reports and analyses to be performed by nonprogrammers using simple tools; not every business can afford a full-time Java programmer to write on-demand queries.

• Meet the challenges of distributed computing, including consideration of latency between systems and eventual node failures.

• Meet the needs of both economy-of-scale overnight batch processing and time-critical event processing.
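To make "scale linearly with growing data size" concrete: one common way distributed stores spread data is to hash each key to a node, so adding nodes shrinks each node's share. The sketch below is only an illustration under that assumption; the functions `node_for_key` and `keys_per_node` and the `event-N` keys are hypothetical and do not describe any particular product.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, num_nodes: int) -> int:
    """Return the index of the node that owns `key` (hypothetical placement rule).

    hashlib is used instead of the built-in hash(), whose string hashing is
    randomized per process and so would not give stable placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def keys_per_node(keys, num_nodes: int) -> list:
    """Count how many of `keys` each node would store."""
    counts = Counter(node_for_key(k, num_nodes) for k in keys)
    return [counts.get(i, 0) for i in range(num_nodes)]


# Hypothetical workload: 100,000 event-log keys spread over 2, 4, and 8 nodes.
keys = [f"event-{i}" for i in range(100_000)]
for n in (2, 4, 8):
    per_node = keys_per_node(keys, n)
    print(f"{n} nodes: busiest node holds {max(per_node)} keys (ideal {len(keys) // n})")
```

Plain modulo placement like this remaps most keys whenever the node count changes, which is why real systems usually refine it with consistent hashing or similar schemes.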

An RDBMS can, with enough time and effort, be customized to solve some big data problems. Applications can be rewritten to distribute SQL queries to many processors and merge the results of the queries. Databases can be redesigned to remove joins between tables that are physically located on different nodes. SQL systems can be configured to use replication and other data synchronization processes. Yet all of these steps take considerable time and money. In the long run, it might make sense to move to a framework that has already solved many of these problems.
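Rewriting an application to "distribute SQL queries to many processors and merge the results" usually means a scatter-gather pattern. The sketch below assumes three in-memory SQLite databases standing in for three RDBMS nodes, each holding a slice of a hypothetical `events` table; the same partial query runs on every shard in parallel, and the application merges the partial results itself.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    """Create one hypothetical shard holding part of the events table."""
    # check_same_thread=False lets a worker thread use a connection created here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE events (user_id TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


# The same partial aggregate runs on every shard ("scatter").
PARTIAL_SQL = "SELECT user_id, COUNT(*), SUM(bytes) FROM events GROUP BY user_id"


def run_partial(conn):
    return conn.execute(PARTIAL_SQL).fetchall()


def scatter_gather(shards):
    """Query all shards in parallel, then merge per-user partial results ("gather")."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(run_partial, shards))
    merged = {}
    for rows in partials:
        for user_id, count, total in rows:
            old_count, old_total = merged.get(user_id, (0, 0))
            merged[user_id] = (old_count + count, old_total + total)
    return merged


shards = [
    make_shard([("alice", 100), ("bob", 50)]),
    make_shard([("alice", 25)]),
    make_shard([("bob", 75), ("carol", 10)]),
]
print(scatter_gather(shards))
```

COUNT and SUM merge by simple addition, but many aggregates do not: an average, for example, has to be rebuilt from the merged sum and count rather than by averaging per-shard averages. Handling such cases by hand for every query is part of the time and money the paragraph above refers to.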

The original SQL systems were revolutionary because of their standardized declarative language. By declarative, we mean that developers can “declare” what data they want without being concerned with how or where the data is retrieved. SQL developers want and need to be isolated from the questions of how to optimize a query, how to fetch the data, and which server the data is on. Unless your database isolates you from these questions, you lose many of the benefits of declarative systems like SQL.
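The difference between declaring *what* you want and spelling out *how* to get it can be shown on a single machine. This sketch uses Python's built-in `sqlite3` module with a hypothetical `orders` table; SQLite is not distributed, so it only illustrates the declarative idea, not isolation from data placement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 120.0), ("globex", 40.0), ("acme", 30.0)],
)

# Declarative: state WHAT is wanted; the engine chooses scan order, indexes, etc.
declarative = conn.execute(
    "SELECT customer, SUM(total) FROM orders "
    "WHERE total > 35 GROUP BY customer ORDER BY customer"
).fetchall()

# Procedural: spell out HOW to fetch, filter, aggregate, and sort, step by step.
totals = {}
for customer, total in conn.execute("SELECT customer, total FROM orders"):
    if total > 35:
        totals[customer] = totals.get(customer, 0.0) + total
procedural = sorted(totals.items())

print(declarative)
print(procedural)
```

Both approaches produce the same rows, but only the procedural version hard-codes an access strategy; if the data later moved across several servers, that loop would have to be rewritten, while the SQL statement itself says nothing about where the data lives.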

NoSQL systems try to isolate developers from the complexities of distributed computing. They provide interfaces that allow users to tell a cluster how many nodes a record must be read from or written to before a valid response is returned. The goal is to keep the benefits of declarative systems while gaining horizontal scalability as you move to distributed computing platforms.
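The "how many nodes must be read from or written to" setting is often described with three numbers: N replicas, a write quorum W, and a read quorum R. When R + W > N, every set of nodes read from overlaps every set written to, so a read sees the latest acknowledged write. Dynamo-style systems such as Riak and Cassandra expose similar settings; the `QuorumStore` class below is a hypothetical, single-process sketch of the idea, assuming one coordinator assigns version numbers and that replicas not reached by a write simply stay stale.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's copy of the data: key -> (version, value)."""
    data: dict = field(default_factory=dict)
    up: bool = True


class QuorumStore:
    """Hypothetical store keeping N replicas with tunable read (R) and write (W) quorums."""

    def __init__(self, n: int, r: int, w: int):
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("R and W must each be between 1 and N")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w
        self.clock = 0  # version counter; assumes a single coordinator issues all writes

    def write(self, key, value) -> None:
        """Succeed as soon as W live replicas have stored the new version."""
        self.clock += 1
        acked = 0
        for rep in self.replicas:  # contact replicas from the front of the list
            if acked == self.w:
                break  # the remaining replicas stay stale in this sketch
            if rep.up:
                rep.data[key] = (self.clock, value)
                acked += 1
        if acked < self.w:
            raise RuntimeError(f"write quorum not met: {acked} of {self.w} acks")

    def read(self, key):
        """Ask R live replicas, starting from the far end, and return the newest value."""
        replies = []
        for rep in reversed(self.replicas):  # worst case: least overlap with the writers
            if len(replies) == self.r:
                break
            if rep.up:
                replies.append(rep.data.get(key, (0, None)))
        if len(replies) < self.r:
            raise RuntimeError(f"read quorum not met: {len(replies)} of {self.r} replies")
        return max(replies, key=lambda reply: reply[0])[1]  # highest version wins


strict = QuorumStore(n=3, r=2, w=2)  # R + W > N: read and write sets must overlap
strict.write("player:7", "level 12")
print(strict.read("player:7"))  # level 12

loose = QuorumStore(n=3, r=1, w=1)   # R + W <= N: a read can miss the latest write
loose.write("player:7", "level 12")
print(loose.read("player:7"))   # None
```

Raising R or W trades latency and availability for consistency: the strict store still answers correctly with one node down, but it refuses writes once fewer than W replicas are reachable.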

If NoSQL systems really do have better horizontal scaling characteristics, you need to be able to measure these characteristics. So let's take a look at how the horizontal scalability of NoSQL systems might be measured.
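One simple way to put a number on horizontal scaling is to compare measured throughput at each cluster size with what perfectly linear scaling would predict from a baseline run. The sketch below assumes that metric; the function name `scaling_efficiency` and all benchmark figures are invented for illustration, not real measurements.

```python
def scaling_efficiency(base_nodes: int, base_tput: float, nodes: int, tput: float) -> float:
    """Measured throughput divided by the throughput perfect linear scaling would predict."""
    ideal = base_tput * nodes / base_nodes
    return tput / ideal


# Hypothetical benchmark results: cluster size -> operations per second.
measurements = {1: 1000.0, 2: 1950.0, 4: 3800.0, 8: 7000.0}

base_nodes = min(measurements)
base_tput = measurements[base_nodes]
for nodes, tput in sorted(measurements.items()):
    eff = scaling_efficiency(base_nodes, base_tput, nodes, tput)
    print(f"{nodes} nodes: {tput:.0f} ops/s, {eff:.1%} of linear")
```

A value near 1.0 at every cluster size indicates near-linear scaling; a value that falls steadily as nodes are added points to coordination overhead such as cross-node joins or synchronous replication.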
