Foundational data architecture patterns - Making Sense of NoSQL


transactions that occurred while it was down? Who should store these transactions and where should they be stored? These questions led to a new class of products that specialize in database replication and synchronization.

Replication is different from sharding, which we discussed in chapter 2. Sharding spreads records across different processors, storing each record in only one place, so it doesn't duplicate the data. In addition, sharding allows reads and writes to be distributed to multiple systems but doesn't increase system availability. Replication, on the other hand, can increase availability and read access speeds by allowing read requests to be performed by slave systems. In general, replication doesn't increase the performance of write operations to a database. Since data has to be copied to multiple systems, it sometimes slows down total write throughput rates. In the end, replication and sharding are independent processes and in appropriate situations can be used together.
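The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each "system" is an in-memory dictionary, and the names ShardedStore, ReplicatedStore, shard_for, and copies_of are hypothetical, not taken from any product discussed in this book.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to exactly one shard using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Each record lives on one shard only; no data is duplicated."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def copies_of(self, key) -> int:
        return sum(1 for shard in self.shards if key in shard)


class ReplicatedStore:
    """Every write is copied to all slaves; reads are spread across slaves."""

    def __init__(self, num_slaves: int):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self._next_read = 0

    def put(self, key, value):
        self.master[key] = value
        for slave in self.slaves:  # extra copy work on every write
            slave[key] = value

    def get(self, key):
        # Round-robin read requests over the slave systems.
        slave = self.slaves[self._next_read % len(self.slaves)]
        self._next_read += 1
        return slave.get(key)

    def copies_of(self, key) -> int:
        return sum(1 for node in [self.master] + self.slaves if key in node)
```

In this sketch, losing one shard loses the records stored there, while losing one slave leaves every record available elsewhere. The loop in ReplicatedStore.put also shows why replication adds work to each write rather than speeding it up.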

So what should happen if the slave systems crash? It doesn't make sense to have the master reject all transactions, since it would render the system unavailable for writes if any slave system crashed. If you allow the master to continue accepting updates, you'll need a process to resync the slave system when it comes back online.

One common solution to the slave resync problem is to use a completely separate piece of software called a reliable messaging system or message store, as shown in figure 3.9.

Reliable messaging systems accept messages even if a remote system isn't responding. When used in a master/slave configuration, these systems queue all update messages while one or more slave systems are down and send them on when the slave system is back online. Because every message is eventually delivered, the master and slave remain in sync.
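The queue-and-forward behavior described above can be sketched as follows. This is a minimal in-memory illustration, not the design of any particular messaging product. The class names MessageStore, Master, and Slave, and the online flag used to simulate an outage, are assumptions made for the example.

```python
from collections import deque


class MessageStore:
    """Keeps one queue of undelivered update messages per subscriber."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        # A message stays queued until each subscriber has received it.
        for queue in self.queues.values():
            queue.append(message)

    def drain(self, name):
        """Return and remove all pending messages for one subscriber."""
        pending = list(self.queues[name])
        self.queues[name].clear()
        return pending


class Slave:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True  # set to False to simulate a crash

    def apply(self, message):
        key, value = message
        self.data[key] = value


class Master:
    def __init__(self, store, slaves):
        self.data = {}
        self.store = store
        self.slaves = slaves
        for slave in slaves:
            store.subscribe(slave.name)

    def write(self, key, value):
        # The master keeps accepting updates even if a slave is down.
        self.data[key] = value
        self.store.publish((key, value))
        self.deliver()

    def deliver(self):
        # Forward queued updates, in order, to every slave that is online.
        for slave in self.slaves:
            if slave.online:
                for message in self.store.drain(slave.name):
                    slave.apply(message)
```

A slave that is offline simply accumulates messages in its queue. Calling deliver() after it restarts replays the missed updates in their original order.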

Replication is a complex problem when one or more systems go offline, even if only for a short period of time. Knowing exactly what information has changed and resyncing the changed data is critical for reliability. Without some way of breaking large databases into smaller subsets for comparison, replication becomes impractical. This is why using NoSQL databases that rely on consistent hashing (discussed in chapter 2) may be a better solution.
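One way to picture "breaking large databases into smaller subsets for comparison" is to hash keys into a fixed number of buckets and compare one digest per bucket, so that only the buckets whose digests differ need to be resynced. The sketch below is an assumption-laden illustration of that idea, not the method of any specific database. The function names and the choice of eight buckets are hypothetical.

```python
import hashlib


def bucket_of(key: str, num_buckets: int) -> int:
    """Assign a key to a bucket using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def bucket_digests(data: dict, num_buckets: int) -> list:
    """Compute one digest per bucket over its sorted key/value pairs."""
    buckets = [[] for _ in range(num_buckets)]
    for key in sorted(data):
        buckets[bucket_of(key, num_buckets)].append(f"{key}={data[key]!r}")
    return [
        hashlib.sha256("\n".join(items).encode("utf-8")).hexdigest()
        for items in buckets
    ]


def stale_buckets(master: dict, slave: dict, num_buckets: int = 8) -> list:
    """Return the bucket numbers whose contents differ between the copies."""
    m = bucket_digests(master, num_buckets)
    s = bucket_digests(slave, num_buckets)
    return [i for i in range(num_buckets) if m[i] != s[i]]


def resync(master: dict, slave: dict, num_buckets: int = 8) -> None:
    """Recopy only the buckets that differ, leaving the rest untouched."""
    for bucket in stale_buckets(master, slave, num_buckets):
        for key in [k for k in slave if bucket_of(k, num_buckets) == bucket]:
            del slave[key]
        for key, value in master.items():
            if bucket_of(key, num_buckets) == bucket:
                slave[key] = value
```

Comparing eight digests is far cheaper than comparing every record, and a single changed record marks only its own bucket as stale.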

NoSQL systems also need to solve the database replication problem, but unlike relational databases, NoSQL systems need to synchronize not only tables, but other structures as well, like graphs and documents. The technologies used to replicate

Figure 3.9 Using message stores for reliable data replication: how message stores can be used to increase the reliability of the data on each slave database, even if the slave systems are unavailable for a period of time. When slave systems restart, they can access an external message store to retrieve the transactions they missed when they were unavailable.

Figure components: Master database, Message store, Slave database. The master writes all update transactions to a message store. Update messages stay in the message store till all subscribers get a copy of the message.
