Using NoSQL to manage big data - Making Sense of NoSQL

Databases Reference

In-Depth Information

Figure 6.13 NoSQL systems move

the query to a data node, but don't

move data to a query node. In this

example, all incoming queries arrive at

query analyzer nodes. These nodes

then forward the queries to each data

node. If they have matches, the

documents are returned to the query

node. The query won't return until all

data nodes (or a response from a

replica) have responded to the original

query request. If the data node is

down, a query can be redirected to a

replica of the data node.

Incoming

queries

Primary

data nodes

Replica

data nodes

Query

analyzer

Query

Data

node

Replica

node

Copy

data

Query

analyzer

Query

Data

node

Replica

node

Query

analyzer

Query

Data

node

Replica

node

occurs from a replica node before the update happens. This is an example of an

inconsistent read.

The best way to avoid this type of problem is to only allow reads to the same write

node after a write has been done. This logic can be added to a session or state man-

agement system at the application layer. Almost all distributed databases relax data-

base consistency rules when a large number of nodes permit writes. If your application

needs fast read/write consistency, you must deal with it at the application layer.

6.8.4

Letting the database distribute queries evenly to data nodes

In order to get high performance from queries that span multiple nodes, it's impor-

tant to separate the concerns of query evaluation from query execution. Figure 6.13

shows this structure.

The approach shown in figure 6.13 is one of moving the query to the data rather

than moving the data to the query. This is an important part of NoSQL big data strate-

gies. In this instance, moving the query is handled by the database server, and distribu-

tion of the query and waiting for all nodes to respond is the sole responsibility of the

database, not the application layer.

This approach is somewhat similar to the concept of federated search . Federated

search takes a single query and distributes it to distinct servers and then combines the

results together to give the user the impression they're searching a single system. In

some cases, these servers may be in different geographic regions. In this case, you're

sending your query to a single cluster that's not only performing search queries on a

single local cluster but also performing update and delete operations.

6.9

Case study: event log processing with Apache Flume

In this case study, you'll see how organizations use NoSQL systems to gather and

report on distributed event logs . Many organizations use NoSQL systems to process

their event log data because the datasets can be large, especially in distributed envi-

ronments. As you can imagine, each server generates hundreds of thousands of event

Making Sense of NoSQL

Search WWH ::

Custom Search

Home