Building high-availability solutions with NoSQL - Making Sense of NoSQL


99% for master node, and 99.9% for power—then the total system availability is the

product of these three numbers: 98.8% (0.999 × 0.99 × 0.999 ≈ 0.988).
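The arithmetic above can be checked with a short sketch. It assumes the components fail independently and that the system is up only when all of them are up. The first component's name is cut off in this excerpt, so the label used for it below is a placeholder, not a term from the text.

```python
import math

# Availability figures quoted in the text, written as fractions.
# The first component's name is not visible in this excerpt, so its
# label here is a placeholder.
component_availability = {
    "first component (name not shown)": 0.999,
    "master node": 0.99,
    "power": 0.999,
}


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes the components fail independently of one another.
    """
    return math.prod(availabilities)


total = serial_availability(component_availability.values())
print(f"Total availability: {total:.1%}")  # Total availability: 98.8%
```

Because the figures multiply, the total is always lower than the weakest component, which is why the 99% master node dominates the result.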

If a system has a single point of failure, such as a master or name node, NoSQL systems can gracefully switch over to a backup node without a major service interruption. A system that can quickly recover from a failing component is said to have the property of automatic failover. Automatic failover is the general ability of a service to detect a failure and switch to a redundant component. Failback is the process of restoring a system component to its normal operation. Generally, this process requires some data synchronization. If your systems are configured with a single failover, your availability estimate must combine the probability that the failover process doesn't work with the odds that the failover system fails before failback.
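One way to combine those two probabilities is sketched below. This is an illustrative model, not a formula from the book: it assumes the two events are independent, and the function name and the sample probabilities are hypothetical.

```python
def outage_probability(p_failover_fails, p_backup_fails_before_failback):
    """Chance that a primary failure turns into an outage.

    An outage happens if the failover itself fails, or if the failover
    succeeds but the backup then fails before failback completes.
    Assumes the two events are independent (a simplification).
    """
    p_failover_works = 1.0 - p_failover_fails
    return p_failover_fails + p_failover_works * p_backup_fails_before_failback


# Hypothetical inputs: 1% of failovers fail, and the backup has a 2% chance
# of failing before the primary is restored.
p = outage_probability(0.01, 0.02)
print(f"Outage probability after a primary failure: {p:.4f}")  # 0.0298
```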

There are other metrics you can use besides the failure metric. If your system has a client request timeout of 30 seconds, you'll want to measure the total percentage of client requests that fail. In such a case, a better metric might be a factor called client yield, which is the probability of any request returning within a specified time interval. Other metrics, such as a harvest metric, apply when you want to include partial API results. Some services, such as federated search engines, may also return partial results. For example, if you search 10 separate remote systems and one of the sites is down for your call window of 30 seconds, you'd have a 90% harvest for that specific call. Harvest is the number of data sources that returned data divided by the total number of data sources queried.
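Both metrics reduce to simple ratios. The sketch below is a minimal illustration: the function names and the sample latencies are hypothetical, and a request that never returned is represented as `None`.

```python
def client_yield(latencies_s, timeout_s):
    """Fraction of requests that returned within the timeout.

    latencies_s holds one entry per request: the response time in seconds,
    or None if the request never returned.
    """
    completed = sum(1 for t in latencies_s if t is not None and t <= timeout_s)
    return completed / len(latencies_s)


def harvest(sources_responded, sources_queried):
    """Fraction of the queried data sources that contributed to a result."""
    return sources_responded / sources_queried


# Hypothetical sample: four requests against a 30-second timeout.
print(client_yield([0.2, 1.5, None, 31.0], timeout_s=30.0))  # 0.5

# The federated search example from the text: 10 systems, one of them down.
print(harvest(sources_responded=9, sources_queried=10))  # 0.9
```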

Finding the best NoSQL service for your application may require comparing the

architecture of two different systems. The actual architecture may be hidden from you

behind a web service interface. In these cases, it might make the most sense to set up a

small pilot project to test the services under a simulated load.

When you set up a pilot project that includes stress testing, a key measurement will be a frequency distribution chart of read and write response times. These distributions can give you hints about whether a database service will scale. A key point of this analysis is that instead of focusing on average or mean response times, you should look at how long the slowest 5% of your requests take to return. In general, a service with consistent response times will have higher availability than a service that sometimes has a high percentage of slow responses. Let's take a look at an example of this.
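To see why the mean can mislead, the sketch below compares the mean with the 95th percentile for two sets of response times. The latency figures are invented for illustration, and the nearest-rank percentile used here is only one of several common definitions.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented response times in milliseconds, 100 requests per service.
spiky = [20] * 90 + [400] * 10   # usually fast, with a slow tail
steady = [70] * 100              # slower on average, but consistent

for name, samples in (("spiky", spiky), ("steady", steady)):
    print(f"{name}: mean={mean(samples)} ms, p95={percentile(samples, 95)} ms")
```

With these numbers the spiky service has the lower mean (58 ms against 70 ms) but a far higher 95th percentile (400 ms against 70 ms), so a comparison based on the mean alone would pick the service with the worse tail.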

8.2.3 Apply your knowledge

Sally is evaluating two NoSQL options for a business unit that's concerned about web

page response times. Web pages are rendered with data from a key-value store. Sally

has narrowed down the field to two key-value store options; we'll call them Service A

and Service B. Sally uses JMeter, a popular performance testing tool, to create a

chart that has read service response distributions, as shown in figure 8.1.

When Sally looks at the data, she sees that Service A has faster mean response

times. But at the 95th percentile level, its response times are longer than Service B's. Service B may

have slower average response times, but they're still within the web page load time
