the list of authorized users is tiny (say, it verifies the user is one of three people permitted to use the system) and done in RAM, it is inconsequential compared to the database lookup. Therefore this API call would be considered O(1), not O(n). Such a system may run at acceptable speed for years, but then the number of authorized users surges, possibly due to a management change that authorizes thousands of additional users. Suddenly this O(n) lookup starts to dominate the run-time of the API call. Surprises like this happen all the time in production systems.
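The trap is easy to reproduce. Here is a minimal Python sketch (the user names and roster sizes are invented for illustration): membership in a list is an O(n) scan, membership in a set is O(1) on average, yet both read identically at the call site, which is why the cost stays hidden until the roster grows.

```python
import time

def is_authorized_linear(user, authorized):
    # O(n): list membership scans the whole roster. Harmless when
    # n == 3, dominant when n == 30,000.
    return user in authorized

def is_authorized_hashed(user, authorized):
    # O(1) on average: set membership hashes the key directly.
    return user in authorized

small = ["alice", "bob", "carol"]            # the original three users
large = [f"user{i}" for i in range(30_000)]  # after the management change
large_set = set(large)

# Same-looking call, wildly different cost as n grows.
start = time.perf_counter()
for _ in range(1000):
    is_authorized_linear("user29999", large)      # worst case: last entry
linear_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(1000):
    is_authorized_hashed("user29999", large_set)
hashed_time = time.perf_counter() - start
```

Both functions give the same answers; only the data structure differs, and with it the growth rate of the lookup cost.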
While the “O” in Big O notation stands for “order,” conversationally you may hear the
word “order” used as shorthand for order of magnitude . Order of magnitude is related
but different. Order of magnitude means, essentially, how many digits are in the number that describes a size (the magnitude of a digit literally means whether it is in the 1s, 10s, 100s, or some other place). If you work for a company with 100 employees and your friend works at a company with 1000 employees, then your friend works at a company that is an order of magnitude larger. You may hear phrases like an algorithm "works for 1 million users but breaks down at a larger order of magnitude." You might hear references to a system processing "on the order of 2000 queries per second," which is a fancy way of saying "approximately 2000," with the implication that this is a very coarse estimate.
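In code form, one simple way to define order of magnitude for positive integers is by counting digits (a trivial sketch; counting string digits sidesteps floating-point log10 edge cases):

```python
def order_of_magnitude(n):
    # Number of digits minus one: 100 -> 2, 1000 -> 3.
    # Defined here for positive integers only.
    return len(str(abs(int(n)))) - 1

# The 1000-employee company is one order of magnitude larger
# than the 100-employee company.
difference = order_of_magnitude(1000) - order_of_magnitude(100)
```

Note that 2000 and 9000 share an order of magnitude; the measure is deliberately coarse.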
C.3 Limitations of Big O Notation
O() notation relates to the number of operations an algorithm will perform. However, not all operations take the same amount of time. Reading two adjacent records from a database, if both fit into the same disk block, is essentially one disk read followed by two reads from the RAM that caches the entire block. Two algorithms with the same O() order, one that takes advantage of this fact and one that does not, will have extremely different performance.
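A rough sketch of the same idea in memory: the two passes below do identical O(n) work, one element read per index, and differ only in access pattern. On large arrays the scattered pass pays far more cache and TLB misses; the effect is much starker in C than in Python, where interpreter overhead pads both passes, so this sketch illustrates the principle rather than measures hardware precisely.

```python
import random
import time

N = 2_000_000
data = list(range(N))

# Two index orders over the same data: one walks memory in order,
# the other jumps around at random.
in_order = list(range(N))
scattered = in_order[:]
random.shuffle(scattered)

def total(indices):
    # O(n) either way: exactly one read per element.
    return sum(data[i] for i in indices)

t0 = time.perf_counter()
seq_sum = total(in_order)
seq_time = time.perf_counter() - t0

t0 = time.perf_counter()
rand_sum = total(scattered)
rand_time = time.perf_counter() - t0

# Same answer, same big-O order -- different constant factors.
```

The point is that O() notation deliberately discards those constant factors, and in real systems the discarded part can be the part that matters.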
In Kamp's (2010) description of the Varnish HTTP accelerator, he explains that by taking advantage of a deep understanding of how a modern OS handles disk I/O, significantly better performance can be achieved than with naive O() comparisons. Varnish was able
to replace 12 overloaded machines running a competing software package with three ma-
chines that were 90 percent idle. He writes:
What good is an O(log₂ n) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n²) algorithm, which avoids page faults, will run circles around it.
The improvement came from structuring his heap in such a way that nodes would avoid
generating virtual memory page faults. Kamp arranged data structures such that parent and
child nodes tended to be on the same page in virtual memory.
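The cost of ignoring page layout can be seen with a little arithmetic. The sketch below uses the textbook flat binary-heap layout, not Varnish's actual B-heap, and the page and node sizes are assumed round numbers. It counts how many page boundaries a root-to-node walk crosses: near the root everything shares a page, but each deeper step lands on a new page, which is exactly the fault-per-comparison behavior Kamp's layout avoids.

```python
PAGE_SIZE = 4096   # assumed virtual-memory page size in bytes
NODE_SIZE = 64     # assumed size of one heap node in bytes

def page_of(index):
    # Which page a node's slot in the flat heap array lands on.
    return index * NODE_SIZE // PAGE_SIZE

def page_crossings(index):
    # In a flat binary heap, the parent of node i is (i - 1) // 2.
    # Count how many page boundaries the walk from the root to
    # node `index` crosses.
    path = [index]
    while index > 0:
        index = (index - 1) // 2
        path.append(index)
    path.reverse()                      # root first
    return sum(1 for a, b in zip(path, path[1:])
               if page_of(a) != page_of(b))

# The first 64 nodes (indices 0..63) share page 0, so shallow walks
# are free; a walk to a node deep in the heap crosses a page boundary
# at nearly every step.
shallow = page_crossings(63)
deep = page_crossings(100_000)
```

Grouping each small parent-and-children subtree into a single page, as Kamp did, turns most of those per-step crossings into free same-page accesses.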