A page fault occurs when a program accesses virtual memory that is not currently in the computer's RAM. The operating system pauses the process, reads the data from disk, and unpauses the process when the read is complete. Since memory is handled in 4KB blocks, the first access to a block may cause a page fault but later accesses within that same 4KB area do not.
A page fault takes roughly as long as executing 10 million instructions. In other words, spending 10,000 instructions to avoid a page fault has a 1000-to-1 payback. By carefully organizing where objects are stored in memory, Kamp was able to eliminate many page faults.
His message to the industry, titled “You're Doing It Wrong,” details in very technical terms how most computer science education is still teaching algorithm design as appropriate for computer systems that have not existed for 30 years.
Modern CPUs have many complexities that make performance non-intuitive. Linear access is significantly faster than random access. Instructions execute in parallel when they are provably independent. The performance of a CPU drops when multiple CPUs are accessing the same part of memory.
For example, reading every element in an array in row order is significantly faster than reading the elements in column order. If the rows are bigger than the CPU's memory cache, the latter essentially incurs a cache “miss” on every read. The former is a series of linear memory reads, which CPUs are optimized to do quickly. Even though either way reads the same number of bytes from memory, the former is faster.
More surprisingly, a loop that reads every element of an array takes approximately the same amount of time to execute as a loop that reads every 16th element of the same array. Even though the latter does 1/16th of the number of operations, the number of RAM cache misses is the same for both loops. Reading blocks of memory from RAM to the CPU's L1 (Level 1) cache is slow and dominates the run-time of either algorithm. That the former can even run slightly faster is due to special instructions for sequentially reading memory.
Above and beyond RAM, virtual memory, and disk issues, we also have to consider the effect of threads, multiple CPUs trying to access the same data, locks, and mutexes. All of these cloud the issue. You can't really tell how fast an algorithm will be without benchmarking. O() notation becomes a general guide, not an absolute.
That said, O() notation is still useful when conversationally describing systems and communicating with other system administrators. Therefore, understanding the nomenclature is essential to being a system administrator today.