Java Reference
In-Depth Information
Example9.2 Calculate the expected cost for searching a list ordered by
frequency when the probabilities are defined as
1=2 i
if 0 i n 2
p i =
1=2 n
if i = n 1.
Then,
n X
n X
(i + 1)=2 i+1 =
(i=2 i ) 2:
C n
i=0
i=1
For this example, the expected number of accesses is a constant. This is
because the probability for accessing the first record is high (one half), the
second is much lower (one quarter) but still much higher than for the third
record, and so on. This shows that for some probability distributions, or-
dering the list by frequency can yield an efficient search technique.
In many search applications, real access patterns follow a rule of thumb called
the 80/20 rule. The 80/20 rule says that 80% of the record accesses are to 20%
of the records. The values of 80 and 20 are only estimates; every data access pat-
tern has its own values. However, behavior of this nature occurs surprisingly often
in practice (which explains the success of caching techniques widely used by web
browsers for speeding access to web pages, and by disk drive and CPU manufac-
turers for speeding access to data stored in slower memory; see the discussion on
buffer pools in Section 8.3). When the 80/20 rule applies, we can expect consid-
erable improvements to search performance from a list ordered by frequency of
access over standard sequential search in an unordered list.
Example9.3 The 80/20 rule is an example of a Zipf distribution. Nat-
urally occurring distributions often follow a Zipf distribution. Examples
include the observed frequency for the use of words in a natural language
such as English, and the size of the population for cities (i.e., view the
relative proportions for the populations as equivalent to the “frequency of
use”). Zipf distributions are related to the Harmonic Series defined in Equa-
tion 2.10. Define the Zipf frequency for item i in the distribution for n
records as 1=(iH n ) (see Exercise 9.4). The expected cost for the series
whose members follow this Zipf distribution will be
n
X
C n =
i=iH n = n=H n n= log e n:
i=1
When a frequency distribution follows the 80/20 rule, the average search
looks at about 10-15% of the records in a list ordered by frequency.
 
Search WWH ::




Custom Search