[Figure 6.7 image: the left panel shows a master-slave cluster in which all
requests go to a single master node, with a standby master used only if the
primary master fails; the right panel shows a peer-to-peer cluster of
interconnected nodes, any of which can accept requests.]
Figure 6.7 Master-slave versus peer-to-peer—the panel on the left
illustrates a master-slave configuration where all incoming database
requests (reads or writes) are sent to a single master node and
redistributed from there. The master node is called the NameNode in
Hadoop. This node keeps a database of all the other nodes in the
cluster and the rules for distributing requests to each node. The panel
on the right shows how the peer-to-peer model stores all the information
about the cluster on each node in the cluster. If any node crashes, the
other nodes can take over and processing can continue.
Let's look at the trade-offs. With a master-slave distribution model, the job
of managing the cluster falls to a single master node. This node can run on
specialized hardware, such as RAID drives, to lower the probability that it
crashes. The cluster can also be configured with a standby master that's
continually updated from the master node. The challenge with this option is
that it's difficult to test the standby master without jeopardizing the health
of the cluster. Failure of the standby master to take over from the master
node is a real concern for high-availability operations.
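To make the single point of control concrete, here's a minimal sketch of
master-slave routing. The Node and Master classes are hypothetical names
invented for this example, not the API of Hadoop or any other product; the
point is that all cluster state and routing logic live in one place.

    class Node:
        """A worker node that stores and serves part of the data."""
        def __init__(self, name):
            self.name = name

        def handle(self, request):
            return f"{self.name} handled {request!r}"

    class Master:
        """Holds the only authoritative node list and routes every request."""
        def __init__(self, nodes):
            self.nodes = nodes  # cluster state lives solely on the master

        def route(self, request):
            # All routing decisions happen here, which is why the master
            # is a single point of failure.
            node = self.nodes[hash(request) % len(self.nodes)]
            return node.handle(request)

    nodes = [Node(f"node-{i}") for i in range(3)]
    master = Master(nodes)
    standby = Master(nodes)  # continually updated in a real deployment

    print(master.route("get user:42"))

    # If the primary master fails, the standby must be promoted; this is
    # the failover path that's hard to test without risking the cluster.
    master = standby
    print(master.route("get user:42"))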
Peer-to-peer systems distribute the master's responsibilities across every
node in the cluster. In this situation, testing is much easier: you can remove
any node from the cluster and the remaining nodes will continue to function.
The disadvantage of peer-to-peer networks is the added complexity and
communication overhead required to keep every node up to date with the
cluster status.
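By contrast, here's a minimal sketch of the peer-to-peer model, again using a
hypothetical Peer class invented for illustration rather than any real
library. Because every peer carries a full copy of the membership list, a
request can be handed to any live node, and losing one node doesn't stop
routing.

    class Peer:
        """A node that keeps its own complete copy of the cluster state."""
        def __init__(self, name):
            self.name = name
            self.alive = True
            self.membership = []  # every peer stores the full member list

        def route(self, request):
            # Any live peer can route a request from its own cluster view.
            live = [p for p in self.membership if p.alive]
            target = live[hash(request) % len(live)]
            return f"{target.name} handled {request!r}"

    peers = [Peer(f"peer-{i}") for i in range(3)]
    for p in peers:
        p.membership = peers  # kept in sync by gossip in a real system

    peers[0].alive = False                # crash any one node...
    print(peers[1].route("get user:42"))  # ...and the others keep working

The price of this resilience is exactly the overhead mentioned above: the
constant chatter needed to keep every peer's membership list consistent as
nodes join, leave, and fail.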
The initial versions of Hadoop (frequently referred to as the 1.x versions) were
designed to use a master-slave architecture with the NameNode of a cluster being
responsible for managing the status of the cluster. NameNodes usually don't deal with
any MapReduce data themselves. Their job is to manage and distribute queries to the
correct nodes on the cluster. Hadoop 2.x versions are designed to remove single
points of failure from a Hadoop cluster.
Choosing the right distribution model depends on your business requirements:
if high availability is a concern, a peer-to-peer network might be the best
solution. If you can manage your big data with batch jobs that run in off
hours, the simpler master-slave model might be best. As we move to the next
section, you'll see how MapReduce systems can be used in multiprocessor
configurations to process your big data.