consistent, the servers should be as close to each other as possible. Users that
write their own data should always be able to read it back in a consistent state.
Scaling availability: Duplicate the writes onto multiple servers in data centers in
distinct geographic regions. If one data center experiences an outage, the other
data centers can supply the data. Scaling availability keeps replica copies in sync
and automates the switchover if one system fails.
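The availability pattern described above can be pictured with a minimal Python sketch. It is a toy in-memory model, not how any particular NoSQL product implements replication; the class names and region names are hypothetical and chosen only for illustration.

```python
class DataCenter:
    """Toy in-memory stand-in for one data center's storage (hypothetical)."""

    def __init__(self, region):
        self.region = region
        self.store = {}
        self.online = True


class ReplicatedStore:
    """Duplicates each write to every reachable data center; reads fail over."""

    def __init__(self, regions):
        self.centers = [DataCenter(r) for r in regions]

    def write(self, key, value):
        # Copy the write to every data center that is currently online.
        acked = []
        for dc in self.centers:
            if dc.online:
                dc.store[key] = value
                acked.append(dc.region)
        if not acked:
            raise RuntimeError("no data center available")
        return acked

    def read(self, key):
        # Automatic switchover: try each data center in order, skipping outages.
        for dc in self.centers:
            if dc.online and key in dc.store:
                return dc.store[key], dc.region
        raise KeyError(key)


# Region names are made up for the example.
store = ReplicatedStore(["us-east", "eu-west", "ap-south"])
store.write("user:42", {"name": "Ann"})
store.centers[0].online = False  # simulate an outage in the first region
value, served_by = store.read("user:42")
print(value, "served by", served_by)
```

The sketch leaves out the hard part: a data center that comes back online has missed the writes made during its outage, so a real system also needs some way to bring replicas back in sync.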
Figure 6.3 shows a linear write scalability analysis that Netflix ran on
Amazon Elastic Compute Cloud (EC2).
[Figure 6.3 chart: Client writes/s by node count (replication factor = 3).
x-axis: node count, 0 to 350. y-axis: client writes per second, 0 to 1,200,000.
Plotted values: 174,373; 366,828; 537,172; 1,099,837.
Chart note: Used 288 of m1.xlarge (4 CPU, 15 GB RAM, 8 ECU); Cassandra 0.86;
benchmark config only existed for about 1 hr.]
Figure 6.3 A Cassandra cluster used to simulate a large number of writes per
second across multiple nodes. At the start of the simulation, around 50 nodes
accept about 170,000 writes per second. As the cluster grows to over 300
nodes, the system can accept over a million writes per second. The simulation ran
on a rented cluster of Amazon Elastic Compute Cloud (EC2) instances. The ability
to “rent” CPUs on an hourly basis has made it easy to test a NoSQL system for
linear scalability. (Reference: Netflix)
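One rough way to check a benchmark like this for linear scaling is to divide throughput by node count and see whether the result stays flat. The sketch below uses the four throughput values printed in the figure. The node counts paired with them are assumptions made for illustration: the figure text confirms only that 288 m1.xlarge instances were used, and the caption gives approximate cluster sizes.

```python
# Throughput values are the ones printed in Figure 6.3.
# Node counts are ASSUMED for illustration; the figure text only confirms
# that 288 m1.xlarge instances were used.
measurements = [
    (48, 174_373),
    (96, 366_828),
    (144, 537_172),
    (288, 1_099_837),
]

# Writes per second handled by each node at every cluster size.
per_node = [writes / nodes for nodes, writes in measurements]
for (nodes, writes), rate in zip(measurements, per_node):
    print(f"{nodes:>4} nodes: {writes:>9,} writes/s = {rate:,.0f} writes/s per node")

# Under linear scaling, per-node throughput stays roughly constant,
# so the relative spread between the best and worst rate should be small.
spread = (max(per_node) - min(per_node)) / min(per_node)
print(f"per-node spread: {spread:.1%}")
```

With these assumed node counts, each node handles between roughly 3,600 and 3,900 writes per second at every cluster size, which is what a near-linear curve looks like in numbers.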
The ability to scale linearly is critical to cost-effective big data processing. But
reading and writing single records isn't the only concern in many business problems.
Systems must also be able to query your data effectively, as you'll see
next.
6.3 Understanding linear scalability and expressivity
What's the relationship between scalability and your ability to perform complex
queries on your data? As we mentioned earlier, linear scalability is the ability to
get a consistent amount of performance improvement as you add processors to your
cluster. Expressivity is the ability to perform fine-grained queries on individual
elements of your dataset.
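To make the idea of expressivity concrete, the following Python sketch contrasts a key-only lookup with a query on individual fields. It is an illustration only, not the interface of any specific NoSQL system; the records and the `get` and `query` helpers are hypothetical.

```python
# Hypothetical records, keyed the way a key-value store would hold them.
orders = {
    "order:1": {"customer": "ann", "total": 120.0, "items": ["book", "pen"]},
    "order:2": {"customer": "bob", "total": 35.5, "items": ["mug"]},
    "order:3": {"customer": "ann", "total": 18.0, "items": ["pen"]},
}


def get(key):
    """Low expressivity: return a whole record, addressed only by its key."""
    return orders[key]


def query(predicate):
    """Higher expressivity: select records by a condition on their fields."""
    return sorted(key for key, record in orders.items() if predicate(record))


print(get("order:2")["total"])
print(query(lambda r: r["customer"] == "ann" and "pen" in r["items"]))
```

The `query` helper here scans every record, which would not scale; the sketch shows only the difference in what you can ask for, not how a real system would answer it efficiently.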
Understanding how well each NoSQL technology performs in terms of scalability
and expressivity is necessary when you're selecting a NoSQL solution. To select the
right system, you'll need to identify the scalability and expressivity requirements of
your system and then make sure the system that you select meets both of these
criteria. Scalability and expressivity can be difficult to quantify, and vendor claims may not
 