8.2.1
Case study: Amazon's S3 SLA
Now let's look at Amazon's SLA for its S3 key-value store service. S3 is known as
the most reliable cloud-based key-value store service available. It consistently
performs well, even when the number of reads or writes on a bucket spikes. The
system is rumored to be the largest of its kind, containing more than 1 trillion
stored objects as of the summer of 2012. That's about 150 objects for every person
on the planet.
Amazon discusses several availability numbers on their website:
Annual durability design: This is the designed probability that a single key-value
item will survive a one-year period without being lost. Amazon claims their design
durability is 99.999999999%, or 11 nines. This number is based on the probability
that your data object, which is typically stored on three hard drives, has all three
drives fail before the data can be backed up. This means that if you keep 10,000
items in S3, you can expect to lose one of them about once every 10 million years.
Not something that you should lose much sleep over. Note that a design is different
from a service guarantee.
Annual availability design: This is a worst-case measure of the fraction of a
one-year period during which you'll be able to write new data and read your data
back. Amazon claims a worst-case availability of 99.99%, or four-nines availability,
for S3. In other words, in the worst case, Amazon thinks your key-value data store
may not work for about 53 minutes per year. In reality, most users get much better
results.
Monthly SLA commitment: In the S3 SLA, Amazon will give you a 10% service credit
if S3 isn't up at least 99.9% of the time in any given month. If your data is
unavailable for more than 1% of the time in a month, you'll get a 25% service
credit. In practice, we haven't heard of any Amazon customer getting SLA credits.
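The arithmetic behind these three numbers can be checked with a short Python
sketch. The function names are made up for illustration, losses are assumed to be
independent from one object to the next, and the credit tiers are only one reading
of the two thresholds quoted above, not Amazon's official formula.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Worst-case minutes of downtime per year for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def expected_losses_per_year(objects_stored: int, annual_loss_probability: float) -> float:
    """Expected number of objects lost in one year (assumes independent losses)."""
    return objects_stored * annual_loss_probability


def service_credit_percent(monthly_uptime: float) -> int:
    """Service credit tier, based on the two thresholds quoted in the text."""
    if monthly_uptime < 0.99:    # unavailable for more than 1% of the month
        return 25
    if monthly_uptime < 0.999:   # below the 99.9% commitment
        return 10
    return 0


if __name__ == "__main__":
    # Four nines of availability: roughly 53 minutes of downtime per year.
    print(downtime_minutes_per_year(0.9999))

    # Eleven nines of durability means an annual loss probability of 1e-11.
    losses = expected_losses_per_year(10_000, 1e-11)
    print(1 / losses)  # years per expected lost object, roughly 10 million

    print(service_credit_percent(0.995))  # falls in the 10% credit tier
```

The loss probability is passed in directly as 1e-11 rather than computed as
1 - 0.99999999999, because subtracting two nearly equal floating-point numbers
loses precision.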
It's also useful to read the wording of the Amazon SLA carefully. For example, it
defines an error rate as the number of S3 requests that return an internal status error
code. There's nothing in the SLA about slow response times.
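To see why the wording matters, here is a minimal Python sketch of an error-rate
calculation that counts only server-side error codes. It assumes that "internal
status error" means an HTTP 5xx status and that the rate is errors divided by total
requests; both are assumptions made for illustration, and the function name and
sample data are hypothetical.

```python
from typing import Iterable, Tuple


def error_rate(responses: Iterable[Tuple[int, float]]) -> float:
    """Fraction of requests whose HTTP status is a 5xx server error.

    Each response is a (status_code, latency_seconds) pair. Latency is
    deliberately ignored: a slow but successful response is not an error.
    """
    responses = list(responses)
    if not responses:
        return 0.0
    errors = sum(1 for status, _latency in responses if 500 <= status <= 599)
    return errors / len(responses)


# Hypothetical sample: one request took 30 seconds but still returned 200.
sample = [(200, 0.05), (200, 30.0), (500, 0.10), (503, 0.20)]
print(error_rate(sample))  # 0.5
```

Under this definition the 30-second response counts as a success, which is why a
service can feel slow and still meet its SLA.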
In practice, most users will get S3 availability that far exceeds the minimum
numbers in the SLA. One independent testing service found essentially 100%
availability for S3, even under high loads over extended periods of time.
8.2.2
Predicting system availability
If you're building a NoSQL database, you need to be able to predict how reliable
your database will be. You need tools to analyze the response times of database
services. Availability prediction methods calculate the overall availability of a
system by looking at the predicted availability of each of the dependent
(single-point-of-failure) subcomponents. If each subsystem is expressed as a simple
availability prediction such as 99.9%, then multiplying the numbers together will
give you an overall availability prediction. For example, if you have three single
points of failure: 99.9% for network,
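The multiplication rule can be sketched in a few lines of Python. Only the 99.9%
network figure comes from the text; the other two component names and values are
hypothetical placeholders and are not the figures from the example above. The
sketch also assumes the components fail independently.

```python
import math
from typing import Mapping


def serial_availability(components: Mapping[str, float]) -> float:
    """Overall availability when every component is a single point of failure.

    Assumes failures are independent, so the system is up only when all
    components are up at the same time.
    """
    return math.prod(components.values())


# Hypothetical system: only the network value is taken from the text.
system = {
    "network": 0.999,
    "component_b": 0.9995,   # placeholder value
    "component_c": 0.9999,   # placeholder value
}

overall = serial_availability(system)
print(f"{overall:.4%}")

# The combined figure is always at or below the weakest component.
print(overall <= min(system.values()))
```

Because every factor is at most 1, adding another single point of failure can only
lower the overall prediction.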