22.1 self-adjustment and amortized analysis
Although balanced search trees provide logarithmic worst-case running time per operation, they have several limitations.

- They require storing an extra piece of balancing information per node.
- They are complicated to implement. As a result, insertions and deletions are expensive and potentially error-prone.
- They do not provide a win when easy inputs occur.
Let us examine the consequences of each of these deficiencies. First, balanced search trees require an extra data member. Although in theory this member can be as small as a single bit (as in a red-black tree), in practice the extra data member uses an entire integer for storage in order to satisfy hardware restrictions. Because computer memories are becoming huge, we must ask whether worrying about memory is a large issue. The answer in most cases is probably not, except that maintaining the extra data members requires more complex code and tends to lead to longer running times and more errors. Indeed, identifying whether the balancing information for a search tree is correct is difficult because errors lead only to an unbalanced tree. If one case is slightly wrong, spotting the errors might be difficult. Thus, as a practical matter, algorithms that allow us to remove some complications without sacrificing performance deserve serious consideration.
The real problem is that the extra data members add complications.
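To make the extra data member concrete, here is a minimal sketch (class and field names invented for illustration, not from any particular library) contrasting a plain binary search tree node with a red-black node that must carry a color field. Conceptually the color is a single bit, but it is typically stored as a full int:

```java
// Hypothetical node classes; names are illustrative only.
class BasicNode<T> {
    T element;
    BasicNode<T> left, right;

    BasicNode(T element) { this.element = element; }
}

class RedBlackNode<T> {
    static final int RED = 0, BLACK = 1;

    T element;
    RedBlackNode<T> left, right;
    // The extra balancing member: in theory one bit,
    // in practice an entire integer.
    int color = BLACK;

    RedBlackNode(T element) { this.element = element; }
}
```

Every insertion and deletion routine must keep the color fields consistent, which is where the additional code complexity and potential for subtle errors comes from.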
Second, the worst-case, average-case, and best-case performances of a balanced search tree are essentially identical. An example is a find operation for some item X. We could reasonably expect not only that the cost of the find will be logarithmic, but also that if we perform an immediate second find for X, the second access will be cheaper than the first. However, in a red-black tree, this condition is not true. We would also expect that, if we access X, Y, and Z in that order, a second set of accesses for the same sequence would be easy. This expectation matters because of the 90-10 rule. As suggested by empirical studies, the 90-10 rule states that in practice 90 percent of the accesses are to 10 percent of the data items. Thus we want easy wins for the 90 percent case, but balanced search trees do not take advantage of this rule.
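The self-adjusting idea previewed here can be illustrated on a structure simpler than a tree: a linked list that moves each accessed item to the front, so that items accessed repeatedly (the 90 percent) become cheap to find again. This is only a sketch under assumed names (the class `MoveToFrontList` is invented for illustration), not an implementation from the text:

```java
import java.util.LinkedList;

// Illustrative self-adjusting list using the move-to-front heuristic:
// each successful find moves the item to the front of the list,
// so an immediate second access costs O(1).
class MoveToFrontList<T> {
    private final LinkedList<T> items = new LinkedList<>();

    public void insert(T x) {
        items.addFirst(x);
    }

    // Linear scan from the front; on success, self-adjust by
    // moving x to the front.
    public boolean find(T x) {
        int index = items.indexOf(x);
        if (index < 0)
            return false;
        items.remove(index);
        items.addFirst(x);
        return true;
    }
}
```

Under a 90-10 access pattern, the hot 10 percent of the items migrate toward the front and are found after only a few comparisons; a balanced search tree, by contrast, charges the same logarithmic cost no matter how recently an item was accessed.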
The 90-10 rule has been used for many years in disk I/O systems. A cache stores in main memory the contents of some of the disk blocks. The hope is that when a disk access is requested, the block can be found in the main memory cache and thus save the cost of an expensive disk access. Of course, only relatively few disk blocks can be stored in memory. Even so, storing the most recently
The 90-10 rule states that 90 percent of the accesses are to 10 percent of the data items. However, balanced search trees do not take advantage of this rule.