Building high-availability solutions with NoSQL - Making Sense of NoSQL


8.2.1 Case study: Amazon's S3 SLA

Now let's look at Amazon's SLA for their S3 key-value store service. Amazon's S3 is widely regarded as the most reliable cloud-based key-value store service available. S3 consistently performs well, even when the number of reads or writes on a bucket spikes. The system is rumored to be the largest of its kind, containing more than 1 trillion stored objects as of the summer of 2012. That's about 150 objects for every person on the planet.

Amazon discusses several availability numbers on their website:

- Annual durability design: This is the designed probability that a single key-value item will survive a one-year period without being lost. Amazon claims their design durability is 99.999999999%, or 11 nines. This number is based on the probability that your data object, which is typically stored on three hard drives, has all three drives fail before the data can be backed up. This means that if you store 10,000 items in S3, you can expect, on average, to lose a single item once every 10 million years. Not something that you should lose much sleep over. Note that a design target is different from a service guarantee.

- Annual availability design: This is a worst-case measure of how much time, over a one-year period, you'll be unable to write new data or read your data back. Amazon claims a worst-case availability of 99.99%, or four-nines availability, for S3. In other words, in the worst case, Amazon estimates that your key-value data store may be unavailable for about 53 minutes per year. In reality, most users get much better results.

- Monthly SLA commitment: Under the S3 SLA, Amazon will give you a 10% service credit if the service is not up 99.9% of the time in any given month. If your data is unavailable for more than 1% of the time in a month (uptime below 99%), you'll get a 25% service credit. In practice, we haven't heard of any Amazon customer getting SLA credits.
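The availability and durability figures in the list above reduce to simple arithmetic. The Python sketch below converts an availability percentage into worst-case minutes of downtime per year, and a durability expressed as a number of nines into the average time between single-object losses. It assumes a 365-day year and that each object is lost independently with probability 10^-nines per year; the function names are hypothetical, and this is not how Amazon itself derives its figures.

```python
# Hypothetical helpers for availability/durability arithmetic (illustrative only).

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def annual_downtime_minutes(availability_pct: float) -> float:
    """Worst-case minutes per year a service may be unavailable at this availability."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def years_per_expected_loss(num_objects: int, nines: int) -> float:
    """Average number of years between single-object losses, assuming each
    object independently has a 10**-nines chance of being lost per year."""
    annual_loss_prob = 10 ** -nines  # 11 nines -> 1e-11 per object per year
    expected_losses_per_year = num_objects * annual_loss_prob
    return 1 / expected_losses_per_year


print(round(annual_downtime_minutes(99.99), 1))        # about 52.6 minutes per year
print(round(annual_downtime_minutes(99.9), 1))         # about 525.6 minutes (roughly 8.8 hours)
print(f"{years_per_expected_loss(10_000, 11):,.0f}")   # about 10,000,000 years
```

Each extra nine cuts the downtime budget by a factor of ten: three nines allows close to 9 hours a year, while four nines allows under an hour.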

It's also useful to read the wording of the Amazon SLA carefully. For example, it defines an error rate in terms of S3 requests that return an internal server error status code. There's nothing in the SLA about slow response times.
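The credit tiers in the monthly SLA commitment, and the fact that only error responses count against uptime, can be illustrated with a short Python sketch. It is a hypothetical model, not Amazon's billing logic: it assumes the error rate is measured over fixed five-minute intervals as failed requests divided by total requests, that intervals with no traffic count as error-free, and that monthly uptime is 100% minus the average interval error rate. The function names are invented for illustration; check the current SLA text for the authoritative definitions.

```python
from typing import Sequence, Tuple

# Hypothetical model of an error-rate-based SLA (assumed definitions, see lead-in).


def interval_error_rate(failed: int, total: int) -> float:
    """Fraction of requests in one interval that returned an internal error.
    Intervals with no requests are treated as error-free (an assumption)."""
    return failed / total if total else 0.0


def monthly_uptime_pct(intervals: Sequence[Tuple[int, int]]) -> float:
    """100% minus the average per-interval error rate (assumed definition).
    Slow but successful requests never count against uptime."""
    if not intervals:
        return 100.0
    rates = [interval_error_rate(failed, total) for failed, total in intervals]
    return 100.0 * (1 - sum(rates) / len(rates))


def service_credit_pct(uptime_pct: float) -> int:
    """Credit tiers as described in the text: 10% below 99.9%, 25% below 99%."""
    if uptime_pct < 99.0:
        return 25
    if uptime_pct < 99.9:
        return 10
    return 0


# One 30-day month of five-minute intervals: 30 * 24 * 12 = 8,640 intervals.
healthy = [(0, 1000)] * 8620
outage = [(1000, 1000)] * 20  # 20 intervals (100 minutes) where every request failed
uptime = monthly_uptime_pct(healthy + outage)
print(f"{uptime:.3f}% uptime -> {service_credit_pct(uptime)}% credit")
# prints: 99.769% uptime -> 10% credit
```

Because this model only looks at error responses, a month in which every request succeeded but took several seconds would still score 100% uptime, which is the gap in the SLA wording noted above.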

In practice, most users will get S3 availability that far exceeds the minimum numbers in the SLA. One independent testing service found essentially 100% availability for S3, even under high loads over extended periods of time.

8.2.2 Predicting system availability

If you're building a NoSQL database, you need to be able to predict how reliable your database will be. You need tools to analyze the response times of database services.

Availability prediction methods calculate the overall availability of a system by looking at the predicted availability of each of the dependent (single-point-of-failure) subcomponents. If each subsystem is expressed as a simple availability prediction such as 99.9%, then multiplying these numbers together (as fractions, such as 0.999) will give you an overall availability prediction. For example, if you have three single points of failure: 99.9% for network,
