A page fault occurs when a program accesses virtual memory that is not currently in the computer's RAM. The operating system pauses the process, reads the data from disk, and unpauses the process when the read is complete. Since memory is handled in 4KB blocks, the first access to a block may cause a page fault but later accesses within that same 4KB area do not.
A page fault takes roughly as long as executing 10 million instructions. In other words, spending 10,000 instructions to avoid a page fault has a 1000-to-1 payback. By carefully organizing where objects are stored in memory, Kamp was able to eliminate many page faults.
His message to the industry, titled “You're Doing It Wrong,” details in very technical terms how most computer science education is still teaching algorithm design as appropriate for computer systems that have not existed for 30 years.
Modern CPUs have many complexities that make performance non-intuitive. Linear access is significantly faster than random access. Instructions execute in parallel when they are provably independent. The performance of a CPU drops when multiple CPUs are accessing the same part of memory.
For example, reading every element in an array in row order is significantly faster than reading the elements in column order. If the rows are bigger than the CPU's memory cache, the latter essentially incurs a cache “miss” on every read. The former is a series of linear memory reads, which CPUs are optimized to do quickly. Even though either way reads the same number of bytes from memory, the former is faster.
More surprisingly, a loop that reads every element of an array takes approximately the same amount of time to execute as a loop that reads every 16th element of the same array. Even though the latter does 1/16th of the number of operations, the number of RAM cache misses is the same for both loops. Reading blocks of memory from RAM to the CPU's L1 (Level 1) cache is slow and dominates the run-time of either algorithm. That the former can even run slightly faster is due to special instructions for sequentially reading memory.
Above and beyond RAM, virtual memory, and disk issues, we also have to consider the effect of threads, multiple CPUs trying to access the same data, locks, and mutexes. All of these cloud the issue. You can't really tell how fast an algorithm will be without benchmarking. O() notation becomes a general guide, not an absolute.
That said, O() notation is still useful when conversationally describing systems and communicating with other system administrators. Therefore, understanding the nomenclature is essential to being a system administrator today.