idle? What happens if your site doubles in size again? Do you have to continue to
rewrite your code each time this happens? Do you have to shut the system down for a
weekend while you upgrade your software?
As the number of servers grows, the chance of any single server being down stays the same, so each server you add increases the odds that some part of the system isn't working. You realize that the same process you used to split the database between two systems can also be used to duplicate data to a backup or mirrored system in case the first one fails. But that creates another problem: when there are changes to a master copy, you must also keep the backup copies in sync. You must have a method of data replication. The time it takes to keep these databases in sync can degrade system performance. Now you need even more servers to keep up!
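To make that synchronization cost concrete, here is a minimal Python sketch, not the mechanism of any real database, in which every write to the master copy must be pushed to each replica before it is acknowledged (the class names `Primary` and `Replica` and the simulated delay are invented for this illustration):

```python
import time

class Replica:
    """An illustrative backup node that holds a copy of the data."""
    def __init__(self, sync_delay=0.001):
        self.data = {}
        self.sync_delay = sync_delay  # simulated network/replication cost

    def apply(self, key, value):
        time.sleep(self.sync_delay)  # keeping copies in sync is not free
        self.data[key] = value

class Primary:
    """The master copy; every write is replicated before it completes."""
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:  # each replica adds latency to the write
            replica.apply(key, value)

replicas = [Replica() for _ in range(3)]
primary = Primary(replicas)
primary.write("user:42", "Ada")
# all copies now agree, at the cost of three synchronous replica writes
assert all(r.data["user:42"] == "Ada" for r in replicas)
```

The more replicas you add, the longer each synchronous write takes, which is exactly why replication can eat into the capacity you added the servers to gain.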
Welcome to the world of database sharding, replication, and distributed computing. You can see that there are many questions and trade-offs to consider as your database grows. NoSQL systems are noted for offering many ways to grow your database without ever having to shut down your servers. Keeping your database running when there are node or network failures is called partition tolerance: a new concept in the NoSQL community, and one that traditional database managers struggle with.
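As a rough sketch of what partition tolerance means in practice (the node and function names here are invented for illustration), a cluster can keep answering reads by failing over to another replica when one node is unreachable:

```python
class NodeDown(Exception):
    """Raised when a node cannot be reached."""

class Node:
    def __init__(self, data, up=True):
        self.data = data
        self.up = up  # simulates whether the node is reachable

    def read(self, key):
        if not self.up:
            raise NodeDown()
        return self.data[key]

def fault_tolerant_read(nodes, key):
    """Try each replica in turn; the cluster answers as long as one node is up."""
    for node in nodes:
        try:
            return node.read(key)
        except NodeDown:
            continue  # this replica failed; try the next one
    raise NodeDown("all replicas are down")

nodes = [Node({"order:7": "shipped"}, up=False),  # first replica has failed
         Node({"order:7": "shipped"})]            # second replica is healthy
assert fault_tolerant_read(nodes, "order:7") == "shipped"
```

The read succeeds despite a failed node, which is the behavior the term partition tolerance describes.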
Understanding transaction integrity and autosharding shapes how you think about the trade-offs you face when building distributed systems. Database performance, transaction integrity, memory use, and autosharding all matter, but there are times when you must identify the system aspects that are most important, focus on them, and leave the others flexible. Using a formal process to understand the trade-offs in your selection will help drive your focus toward the things most important to your organization, which we turn to next.
2.7 Understanding trade-offs with Brewer's CAP theorem
To make the best decision about what to do when systems fail, you need to consider the properties of consistency and availability when working with distributed systems over unreliable networks.
Eric Brewer first introduced the CAP theorem in 2000. The CAP theorem states that
any distributed database system can have at most two of the following three desirable
properties:
- Consistency: having a single, up-to-date, readable version of your data available to all clients. This isn't the same as the consistency we talked about in ACID. Consistency here is concerned with multiple clients reading the same items from replicated partitions and getting consistent results.
- High availability: knowing that the distributed database will always allow database clients to update items without delay. Internal communication failures between replicated data shouldn't prevent updates.