idle? What happens if your site doubles in size again? Do you have to continue to
rewrite your code each time this happens? Do you have to shut the system down for a
weekend while you upgrade your software?
As the number of servers grows, the chance of any single server being down stays the same, so each server you add increases the odds that some part of the system isn't working. You realize that the same process you used to split the database between two systems can also be used to duplicate data to a backup or mirrored system in case the first one fails. But that creates another problem: when there are changes to a master copy, you must also keep the backup copies in sync. You must have a method of data replication. The time it takes to keep these databases in sync can degrade system performance. Now you need even more servers to keep up!
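To make that synchronization cost concrete, here is a minimal Python sketch, not the mechanism of any real database, in which every write to the master copy must be pushed to each replica before it is acknowledged (the class names `Primary` and `Replica` and the simulated delay are invented for this illustration):

```python
import time

class Replica:
    """An illustrative backup node that holds a copy of the data."""
    def __init__(self, sync_delay=0.001):
        self.data = {}
        self.sync_delay = sync_delay  # simulated network/replication cost

    def apply(self, key, value):
        time.sleep(self.sync_delay)  # keeping copies in sync is not free
        self.data[key] = value

class Primary:
    """The master copy; every write is replicated before it completes."""
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:  # each replica adds latency to the write
            replica.apply(key, value)

replicas = [Replica() for _ in range(3)]
primary = Primary(replicas)
primary.write("user:42", "Ada")
# all copies now agree, at the cost of three synchronous replica writes
assert all(r.data["user:42"] == "Ada" for r in replicas)
```

The more replicas you add, the longer each synchronous write takes, which is exactly why replication can eat into the capacity you added the servers to gain.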
Welcome to the world of database sharding, replication, and distributed computing. You can see that there are many questions and trade-offs to consider as your database grows. NoSQL systems are noted for offering many ways to grow your database without ever having to shut down your servers. Keeping your database running when there are node or network failures is called partition tolerance: a new concept in the NoSQL community, and one that traditional database managers struggle with.
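As a rough sketch of what partition tolerance means in practice (the node and function names here are invented for illustration), a cluster can keep answering reads by failing over to another replica when one node is unreachable:

```python
class NodeDown(Exception):
    """Raised when a node cannot be reached."""

class Node:
    def __init__(self, data, up=True):
        self.data = data
        self.up = up  # simulates whether the node is reachable

    def read(self, key):
        if not self.up:
            raise NodeDown()
        return self.data[key]

def fault_tolerant_read(nodes, key):
    """Try each replica in turn; the cluster answers as long as one node is up."""
    for node in nodes:
        try:
            return node.read(key)
        except NodeDown:
            continue  # this replica failed; try the next one
    raise NodeDown("all replicas are down")

nodes = [Node({"order:7": "shipped"}, up=False),  # first replica has failed
         Node({"order:7": "shipped"})]            # second replica is healthy
assert fault_tolerant_read(nodes, "order:7") == "shipped"
```

The read succeeds despite a failed node, which is the behavior the term partition tolerance describes.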
Understanding transaction integrity and autosharding shapes how you think about the trade-offs you face when building distributed systems. Database performance, transaction integrity, memory use, and autosharding all matter, but there are times when you must identify the system aspects that are most important, focus on them, and leave the others flexible. Using a formal process to understand the trade-offs in your selection will help drive your focus toward the things most important to your organization, which we turn to next.
2.7 Understanding trade-offs with Brewer's CAP theorem
To make the best decision about what to do when systems fail, you need to consider the properties of consistency and availability when working with distributed systems over unreliable networks.
Eric Brewer first introduced the CAP theorem in 2000. The CAP theorem states that
any distributed database system can have at most two of the following three desirable
properties:
- Consistency: having a single, up-to-date, readable version of your data available to all clients. This isn't the same as the consistency we talked about in ACID. Consistency here is concerned with multiple clients reading the same items from replicated partitions and getting consistent results.
- High availability: knowing that the distributed database will always allow database clients to update items without delay. Internal communication failures between replicated data shouldn't prevent updates.