Databases Reference
In-Depth Information
Figure 6.13 NoSQL systems move
the query to a data node, but don't
move data to a query node. In this
example, all incoming queries arrive at
query analyzer nodes. These nodes
then forward the queries to each data
node. If they have matches, the
documents are returned to the query
node. The query won't return until all
data nodes (or a response from a
replica) have responded to the original
query request. If the data node is
down, a query can be redirected to a
replica of the data node.
Incoming
queries
Primary
data nodes
Replica
data nodes
Query
analyzer
Query
Data
node
Replica
node
Copy
data
Query
analyzer
Query
Data
node
Replica
node
Query
analyzer
Query
Data
node
Replica
node
occurs from a replica node before the update happens. This is an example of an
inconsistent read.
The best way to avoid this type of problem is to only allow reads to the same write
node after a write has been done. This logic can be added to a session or state man-
agement system at the application layer. Almost all distributed databases relax data-
base consistency rules when a large number of nodes permit writes. If your application
needs fast read/write consistency, you must deal with it at the application layer.
6.8.4
Letting the database distribute queries evenly to data nodes
In order to get high performance from queries that span multiple nodes, it's impor-
tant to separate the concerns of query evaluation from query execution. Figure 6.13
shows this structure.
The approach shown in figure 6.13 is one of moving the query to the data rather
than moving the data to the query. This is an important part of NoSQL big data strate-
gies. In this instance, moving the query is handled by the database server, and distribu-
tion of the query and waiting for all nodes to respond is the sole responsibility of the
database, not the application layer.
This approach is somewhat similar to the concept of federated search . Federated
search takes a single query and distributes it to distinct servers and then combines the
results together to give the user the impression they're searching a single system. In
some cases, these servers may be in different geographic regions. In this case, you're
sending your query to a single cluster that's not only performing search queries on a
single local cluster but also performing update and delete operations.
6.9
Case study: event log processing with Apache Flume
In this case study, you'll see how organizations use NoSQL systems to gather and
report on distributed event logs . Many organizations use NoSQL systems to process
their event log data because the datasets can be large, especially in distributed envi-
ronments. As you can imagine, each server generates hundreds of thousands of event
Search WWH ::




Custom Search