Java Reference
In-Depth Information
8
File Processing and External
Sorting
Earlier chapters presented basic data structures and algorithms that operate on data
stored in main memory. Some applications require that large amounts of informa-
tion be stored and processed — so much information that it cannot all fit into main
memory. In that case, the information must reside on disk and be brought into main
memory selectively for processing.
You probably already realize that main memory access is much faster than ac-
cess to data stored on disk or other storage devices. The relative difference in access
times is so great that efficient disk-based programs require a different approach to
algorithm design than most programmers are used to. As a result, many program-
mers do a poor job when it comes to file processing applications.
This chapter presents the fundamental issues relating to the design of algo-
rithms and data structures for disk-based applications. 1 We begin with a descrip-
tion of the significant differences between primary memory and secondary storage.
Section 8.2 discusses the physical aspects of disk drives. Section 8.3 presents ba-
sic methods for managing buffer pools. Section 8.4 discusses the Java model for
random access to data stored on disk. Section 8.5 discusses the basic principles for
sorting collections of records too large to fit in main memory.
8.1
Primary versus Secondary Storage
Computer storage devices are typically classified into primary or main memory
and secondary or peripheral storage. Primary memory usually refers to Random
1 Computer technology changes rapidly. I provide examples of disk drive specifications and other
hardware performance numbers that are reasonably up to date as of the time when the topic was
written. When you read it, the numbers might seem out of date. However, the basic principles do not
change. The approximate ratios for time, space, and cost between memory and disk have remained
surprisingly steady for over 20 years.
265
 
Search WWH ::




Custom Search