Most systems provide slightly less than linear scalability at small scaling factors, and
the deviation from linearity becomes more obvious at higher scaling factors. In fact,
most systems eventually reach a point of maximum throughput, beyond which additional
investment provides a negative return: add more workload and you'll actually
reduce the system's throughput! 3
How is this possible? Many models of scalability have been created over the years, with
varying degrees of success and realism. The model we refer to here is Dr. Neil J.
Gunther's Universal Scalability Law (USL), which is based on some of the underlying
mechanisms that influence systems as they scale. Dr. Gunther has written about it at
length in his books, including Guerrilla Capacity Planning (Springer). We will not go
deeply into the mathematics here, but if you are interested, his book and the training
courses offered by his company, Performance Dynamics, might be good resources for
you. 4
In brief, the USL holds that the deviation from linear scalability can be
modeled by two factors: a portion of the work cannot be done in parallel, and a portion
of the work requires crosstalk. Modeling the first factor results in the well-known
Amdahl's Law, which causes throughput to level off. When part of the task can't be
parallelized, no matter how much you divide and conquer, the task takes at least as
long as the serial portion.
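Amdahl's Law can be written compactly. The notation here is an assumption on our part (this section states the law only in words): N is the number of workers, σ is the fraction of the work that must run serially, and C(N) is throughput relative to a single worker.

```latex
C(N) = \frac{N}{1 + \sigma (N - 1)}, \qquad
\lim_{N \to \infty} C(N) = \frac{1}{\sigma}
```

For example, if 5% of the work is serial (σ = 0.05), no number of workers can deliver more than 1/0.05 = 20 times the throughput of one worker. That ceiling is the leveling off described above.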
Adding the second factor—intra-node or intra-process communication—to Amdahl's
Law results in the USL. The cost of this communication depends on the number of
communication channels, which grows quadratically with respect to the number of
workers in the system. Thus, the cost eventually grows faster than the benefit, and that's
what is responsible for retrograde scalability. Figure 11-4 illustrates the three concepts
we've talked about so far: linear scaling, Amdahl scaling, and USL scaling. Most real
systems look like the USL curve.
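The three curves can be sketched in a few lines of Python. The USL formula below, with σ for the serial fraction and κ for the crosstalk coefficient, follows the form commonly given in Gunther's writing; this section does not spell it out, so treat it as our assumption, and the coefficient values are hypothetical. The κN(N - 1) term reflects the quadratic growth in communication channels: N workers form N(N - 1)/2 pairs.

```python
import math


def linear(n: int) -> float:
    """Ideal scaling: n workers deliver n times the throughput of one."""
    return float(n)


def amdahl(n: int, sigma: float) -> float:
    """Amdahl's Law: a serial fraction sigma caps relative capacity at 1/sigma."""
    return n / (1 + sigma * (n - 1))


def usl(n: int, sigma: float, kappa: float) -> float:
    """USL (assumed form): Amdahl's Law plus a crosstalk term growing with n*(n-1)."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak(sigma: float, kappa: float) -> float:
    """Concurrency at which USL capacity is maximal; requires kappa > 0."""
    return math.sqrt((1 - sigma) / kappa)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to illustrate the curve shapes.
    sigma, kappa = 0.05, 0.001
    print("   N   linear   amdahl      usl")
    for n in (1, 8, 16, 32, 64, 128):
        print(f"{n:4d} {linear(n):8.1f} {amdahl(n, sigma):8.2f} {usl(n, sigma, kappa):8.2f}")
    print(f"USL peak near N = {usl_peak(sigma, kappa):.1f}")
```

Setting the derivative of C(N) to zero gives the peak at N = sqrt((1 - σ)/κ), about 31 for these coefficients. Beyond that point, adding workers lowers throughput, which is the retrograde region of the USL curve.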
The USL can be applied both to hardware and to software. In the hardware case, the
x-axis represents units of hardware, such as servers or CPUs; the workload, data size,
and query complexity per unit of hardware must be held constant. 5 In the software
case, the x-axis on the plot represents units of concurrency, such as users or threads;
the workload per unit of concurrency must be held constant.
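In either case, the USL coefficients are usually estimated from throughput measured at several values of N. The sketch below shows one common estimation approach; it is our illustration rather than a method given in this section, and the helper name fit_usl and the sample numbers are hypothetical. It assumes the USL form C(N) = N / (1 + σ(N - 1) + κN(N - 1)) and a measurement at N = 1. Measured throughput X(N) is normalized to C(N) = X(N)/X(1); then, with x = N - 1, the USL rearranges to N/C(N) - 1 = κx² + (σ + κ)x, a quadratic with no constant term, so ordinary least squares on two coefficients recovers κ and σ.

```python
def fit_usl(samples: dict[int, float]) -> tuple[float, float]:
    """Estimate (sigma, kappa) from {N: measured throughput X(N)}.

    Hypothetical helper. Requires a sample at N = 1 for normalization and
    at least two other distinct values of N.
    """
    x1 = samples[1]
    xs, ys = [], []
    for n, throughput in samples.items():
        if n == 1:
            continue
        capacity = throughput / x1      # C(N) = X(N) / X(1)
        xs.append(n - 1)                # x = N - 1
        ys.append(n / capacity - 1)     # y = N / C(N) - 1 = kappa*x^2 + (sigma+kappa)*x

    # Least squares for y = a*x^2 + b*x (no intercept) via the 2x2 normal equations:
    #   a*S4 + b*S3 = T2
    #   a*S3 + b*S2 = T1
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - s3 * t1) / det
    b = (s4 * t1 - s3 * t2) / det

    kappa = a
    sigma = b - a
    return sigma, kappa


if __name__ == "__main__":
    # Synthetic "measurements" generated from known coefficients to check the fit.
    true_sigma, true_kappa = 0.05, 0.001
    data = {
        n: 1000 * n / (1 + true_sigma * (n - 1) + true_kappa * n * (n - 1))
        for n in (1, 2, 4, 8, 16, 32)
    }
    sigma, kappa = fit_usl(data)
    print(f"sigma = {sigma:.4f}, kappa = {kappa:.5f}")
```

Because the synthetic data come from known coefficients, the fit should recover σ = 0.05 and κ = 0.001 up to rounding. Real measurements are noisy, so fitted values are only estimates, and they are meaningful only if the workload per unit was held constant while N varied.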
3. In fact, the term “return on investment” can also be considered in light of your financial investment.
Upgrading a component to double its capacity often costs more than twice as much as the initial
investment. Although we often consider this in the real world, we'll omit it from our discussion here to
avoid complicating an already confusing topic.
4. You can also read our white paper, Forecasting MySQL Scalability with the Universal Scalability Law,
which gives a condensed summary of the mathematics and principles at work in the USL. It is available
at http://www.percona.com.
5. In the real world, it is very difficult to define hardware scalability precisely, because it's hard to actually
hold all those variables constant as you vary the number of servers in the system.