Using NoSQL to manage big data - Making Sense of NoSQL

6.2 Getting linear scaling in your data center

One of the core concepts in big data is linear scaling. When a system has linear scaling, you automatically get a proportional performance gain each time you add a new processor to your cluster, as shown in figure 6.2.

[Figure 6.2: plot of performance (vertical axis) against number of processors (horizontal axis). Linearly scalable architectures provide a constant rate of additional performance as the number of processors increases. Nonscalable systems reach a plateau of performance where adding new processors does not add incremental performance.]

Figure 6.2 How some systems continue to add performance as more nodes are added to the system. Performance can be a measure of read operations, write operations, or transformations. Systems are considered linearly scalable if the performance curve doesn't flatten out at some threshold. Many components can cause bottlenecks in performance, so testing for linear scalability is critical in system design.
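One way to make the test for linear scalability concrete is to compare measured throughput against the ideal straight line. The following is a minimal sketch, not a method from the book: the function names `scaling_efficiency` and `first_plateau`, the 0.8 threshold, and the sample numbers are all assumptions chosen for illustration.

```python
from typing import Dict, Optional


def scaling_efficiency(throughput_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Return measured speedup divided by ideal linear speedup per cluster size.

    The smallest cluster measured is the baseline. A value of 1.0 means
    perfectly linear scaling; values well below 1.0 mean that added nodes
    contribute less and less.
    """
    if not throughput_by_nodes:
        raise ValueError("need at least one measurement")
    base_nodes = min(throughput_by_nodes)
    base_throughput = throughput_by_nodes[base_nodes]
    if base_nodes <= 0 or base_throughput <= 0:
        raise ValueError("node counts and throughput must be positive")
    return {
        nodes: (throughput / base_throughput) / (nodes / base_nodes)
        for nodes, throughput in sorted(throughput_by_nodes.items())
    }


def first_plateau(
    throughput_by_nodes: Dict[int, float], min_efficiency: float = 0.8
) -> Optional[int]:
    """Return the smallest cluster size whose efficiency falls below the
    threshold, or None if every measurement stays at or above it.

    The 0.8 default is an arbitrary assumption, not an industry standard.
    """
    for nodes, efficiency in scaling_efficiency(throughput_by_nodes).items():
        if efficiency < min_efficiency:
            return nodes
    return None


# Illustrative numbers only (operations per second), not real measurements.
linear = {1: 1000.0, 2: 1980.0, 4: 3900.0, 8: 7700.0}
plateau = {1: 1000.0, 2: 1900.0, 4: 2600.0, 8: 2700.0}

if __name__ == "__main__":
    print(first_plateau(linear))
    print(first_plateau(plateau))
```

In practice the measurements would come from load tests run at each cluster size, with performance defined as read operations, write operations, or transformations, whichever matters for the workload.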

There are additional types of scaling that might be important to you based on the type of problem you're trying to solve. For example:

• Scaling independent transformations: Many big data problems are driven by discrete transformations on individual items without interaction among the items. These types of problems tend to be the easiest to scale, because you can simply add a new node to your cluster. Image transformation is a good example of this.

• Scaling reads: In order to keep your read latency low, you must replicate your data on multiple servers and move the servers as close to the users as possible using tools like content distribution networks (CDNs). CDNs keep copies of data in each geographic region so that the distance that data moves over a network can be minimized. The challenge is that the more servers you have and the farther apart they are, the more difficult it is to keep them in sync.

• Scaling totals: Scaling totals involves how quickly you can perform simple math functions (count, sum, average) on large quantities of data. This type of scaling is most often addressed by OLAP systems, which precalculate subset totals in structures called aggregates so that most of the math is already done. For example, if you have the total daily hits for a website, the weekly total is the sum of the daily totals for that week.

• Scaling writes: In order to avoid blocking writes, it's best to have multiple servers that accept writes and never block each other. To make reads of these writes
