Databases Reference
In-Depth Information
6.2
Getting linear scaling in your data center
One of the core concepts in big data is linear scaling . When a system has linear scaling,
you automatically get a proportional performance gain each time you add a new pro-
cessor to your cluster, as shown in figure 6.2.
Linear scalable architectures
provide a constant rate of
additional performance as the
number of processors
increases.
Nonscalable systems
reach a plateau of
performance where adding
new processors does
not add incremental
performance.
Performance
Number of processors
Figure 6.2 How some systems continue to add performance as more nodes
are added to the system. Performance can be a measure of read operations,
write operations, or transformations. Systems are considered linearly
scalable if the performance curve doesn't flatten out at some threshold.
Many components can cause bottlenecks in performance, so testing for
linear scalability is critical in system design.
There are additional types of scaling that might be important to you based on the type
of problem you're trying to solve. For example:
Scaling independent transformations —Many big data problems are driven by dis-
crete transformations on individual items without interaction among the items.
These types of problems tend to be the easiest to solve: simply add a new node
to your cluster. Image transformation is a good example of this.
Scaling reads —In order to keep your read latency low, you must replicate your
data on multiple servers and move the servers as close to the users as possible
using tools like content distribution networks ( CDN s) . CDN s keep copies of data in
each geographic region so that the distance that data moves over a network can
be minimized. The challenge is that the more servers you have and the farther
apart they are, the more difficult it is to keep them in sync.
Scaling totals —Scaling totals involves how quickly you can perform simple math
functions (count, sum, average) on large quantities of data. This type of scaling
is most often addressed by OLAP systems by precalculating subset totals in struc-
tures called aggregates so that most of the math is already done. For example, if
you have the total daily hits for a website, the weekly total is the sum of each day
in a particular week.
Scaling writes —In order to avoid blocking writes, it's best to have multiple serv-
ers that accept writes and never block each other. To make reads of these writes
Search WWH ::




Custom Search