Common Error 7.4: Underestimating the Size of a Data Set
Programmers commonly underestimate the amount of input data that a user will
pour into an unsuspecting program. The most common problem caused by
underestimating the amount of input data results from the use of fixed-size arrays.
Suppose you write a program to search for text in a file. You store each line in a
string and keep an array of strings. How big do you make the array? Surely
nobody is going to challenge your program with an input that is more than 100
lines. Really? A smart grader can easily feed in the entire text of Alice in
Wonderland or War and Peace (which are available on the Internet). All of a
sudden, your program has to deal with tens or hundreds of thousands of lines.
What will it do? Will it handle the input? Will it politely reject the excess input?
Will it crash and burn?
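The following is a minimal sketch that contrasts the two choices. The class LineStore, its methods readFixed and readAll, and the limit MAX_LINES are hypothetical names chosen for illustration, and the large input is simulated with a StringReader rather than read from a real file. The fixed-size version takes the "politely reject the excess input" route by counting and reporting the dropped lines. The ArrayList version simply grows as needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;

public class LineStore {
    // Hypothetical fixed capacity: the kind of guess the text warns against.
    static final int MAX_LINES = 100;

    // Fixed-size version: keeps at most MAX_LINES lines and reports
    // how many were dropped instead of crashing.
    static String[] readFixed(BufferedReader in) throws IOException {
        String[] lines = new String[MAX_LINES];
        int count = 0;
        int dropped = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (count < MAX_LINES) {
                lines[count] = line;
                count++;
            } else {
                dropped++;
            }
        }
        if (dropped > 0) {
            System.out.println("Warning: ignored " + dropped + " lines beyond " + MAX_LINES);
        }
        return Arrays.copyOf(lines, count); // trim unused slots
    }

    // Growable version: the ArrayList expands as needed, limited only by memory.
    static ArrayList<String> readAll(BufferedReader in) throws IOException {
        ArrayList<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a large input (250,000 lines) instead of reading a real file.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 250_000; i++) {
            text.append("line ").append(i).append('\n');
        }
        String input = text.toString();

        String[] fixed = readFixed(new BufferedReader(new StringReader(input)));
        ArrayList<String> all = readAll(new BufferedReader(new StringReader(input)));
        System.out.println(fixed.length);
        System.out.println(all.size());
    }
}
```

Running main should print a warning that 249900 lines were ignored, then 100 for the fixed-size array and 250000 for the array list.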
A famous article [1] analyzed how several UNIX programs reacted when they
were fed large or random data sets. Sadly, about a quarter didn't do well at all,
crashing or hanging without a reasonable error message. For example, in some
versions of UNIX the tape backup program tar cannot handle file names that are
longer than 100 characters, which is a pretty unreasonable limitation. Many of
these shortcomings are caused by features of the C language that make it
difficult to store strings of arbitrary size, a problem Java does not have.
Quality Tip 7.2: Make Parallel Arrays into Arrays of Objects
Programmers who are familiar with arrays, but unfamiliar with object-oriented
programming, sometimes distribute information across separate arrays. Here is a
typical example. A program needs to manage bank data, consisting of account
numbers and balances. Don't store the account numbers and balances in separate
arrays.
// Don't do this
int[] accountNumbers;
double[] balances;
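Here is a minimal sketch of the object-based alternative, assuming a hypothetical BankAccount class with accountNumber and balance fields. The class, its methods, and the find helper are illustrative names, not taken from the text. Each array element now carries both values together, so an account number can never drift out of step with its balance the way two separate arrays can.

```java
// Hypothetical class bundling the data that the parallel arrays split apart.
class BankAccount {
    private final int accountNumber;
    private double balance;

    BankAccount(int accountNumber, double balance) {
        this.accountNumber = accountNumber;
        this.balance = balance;
    }

    int getAccountNumber() { return accountNumber; }

    double getBalance() { return balance; }

    void deposit(double amount) { balance += amount; }
}

public class AccountsDemo {
    // Returns the account with the given number, or null if none matches.
    static BankAccount find(BankAccount[] accounts, int number) {
        for (BankAccount a : accounts) {
            if (a.getAccountNumber() == number) {
                return a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // One array of objects replaces int[] accountNumbers and double[] balances.
        BankAccount[] accounts = {
            new BankAccount(1001, 500.0),
            new BankAccount(1015, 75.5)
        };

        BankAccount acct = find(accounts, 1015);
        acct.deposit(24.5);
        System.out.println(acct.getBalance()); // prints 100.0
    }
}
```

Because the number and balance live in the same object, a lookup by account number returns the matching balance directly, with no shared index to keep synchronized.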