Under this list representation, a list of three small elements might require
almost the same amount of space as a list of six elements. For small lists, we'll
find that the length of the freelist dominates the total space cost of the list.
Under an engineering analysis we should thus extend our implementation-
independent asymptotic analysis with both the constant factors and the newly
revealed parameters from the implementation, such as the size of the freelist and
the overhead of accessing a record.
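As a concrete illustration of folding constant factors back into the analysis, here is a back-of-the-envelope space estimate for the list representation. The per-node size and freelist length are made-up assumptions for the sketch, not measurements of any particular implementation.

```python
# Back-of-the-envelope space estimate for the list representation.
# NODE_BYTES and FREELIST_NODES are illustrative assumptions.

NODE_BYTES = 24        # assumed: key + value pointer + next pointer
FREELIST_NODES = 16    # assumed: preallocated spare nodes kept for reuse

def list_space_bytes(n):
    """Total bytes: n live nodes plus the fixed freelist overhead."""
    return n * NODE_BYTES + FREELIST_NODES * NODE_BYTES

# For small lists the fixed freelist overhead dominates: a 3-element
# and a 6-element list differ far less than the 2x element count suggests.
print(list_space_bytes(3))  # 456
print(list_space_bytes(6))  # 528
```

Under these assumed constants, doubling the element count from three to six grows total space by only about 16%, which is exactly the effect described above.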
37.3.2 1D Tree Example
A binary search tree exploits the ordering on the keys to increase the performance of some find operations. In doing so it incurs some additional constant performance overhead for every search and increases the storage space.
The motivation for using a tree is that we'd like to be able to find all students
with grades in some interval and do so faster than we could with the list. The tree
representation is similar to the list, although there are two child pointers in each
node. So the space bounds are slightly higher by a constant factor, but the storage
cost for n elements is still O( n ) .
The height of a balanced binary tree of n students is ⌈log₂ n⌉. If there is only one student whose grade is in the query interval, then we can find that student using at most ⌈log₂ n⌉ comparisons. Furthermore, if that student's record is at an internal (versus leaf) node of the tree, it may take as few as three comparison operations to locate the student and eliminate all others.
Thus, for the case of a single student in the interval, the search takes worst-case O(lg n) time, which is better than the list. In practice, the constants for iterating through a tree are similar to those for iterating through a list, so it is reasonable to expect the tree to outperform the list in this case.
Depending on the size of the interval in the query and the distribution of students, there may be more than one student in the result. This means that the time cost of the find operation is output-sensitive, and the size of the output appears in the bound. If there are s students that satisfy the query, the runtime of the query is necessarily O(s + lg n). Since 0 ≤ s ≤ n, in the worst case all students may be in the output. Thus, the tightest upper bound for the general case is O(n) time.
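The output-sensitive behavior can be seen in a minimal (here unbalanced) binary search tree sketch; the node layout and names are illustrative assumptions, not the book's implementation.

```python
# Minimal BST keyed on grade, with an output-sensitive range query.
# Names (Node, insert, find_range) are illustrative, not from the text.

class Node:
    def __init__(self, grade, record):
        self.grade, self.record = grade, record
        self.left = self.right = None

def insert(root, grade, record):
    if root is None:
        return Node(grade, record)
    if grade < root.grade:
        root.left = insert(root.left, grade, record)
    else:
        root.right = insert(root.right, grade, record)
    return root

def find_range(root, lo, hi, out):
    """Collect every record with lo <= grade <= hi, in grade order.

    Only subtrees that can intersect [lo, hi] are visited, so the
    work is O(s + h) for s reported students and tree height h."""
    if root is None:
        return out
    if lo < root.grade:             # left subtree may hold grades >= lo
        find_range(root.left, lo, hi, out)
    if lo <= root.grade <= hi:
        out.append(root.record)
    if root.grade < hi:             # right subtree may hold grades <= hi
        find_range(root.right, lo, hi, out)
    return out

root = None
for g, name in [(72, "ann"), (51, "bo"), (88, "cy"), (34, "di"), (90, "ed")]:
    root = insert(root, g, name)
print(find_range(root, 50, 89, []))  # ['bo', 'ann', 'cy']
```

With s = 3 matching students, the query touches only the matches plus one root-to-boundary path, matching the O(s + lg n) bound for a balanced tree.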
We could try to characterize the time cost based on the distribution of grades
and queries that we expect. However, at that point we're mixing our theory with
engineering and we're unlikely to produce a bound that informs either theory or
practice. Once we know something about the distribution and implementation
environment, we should perform back-of-the-envelope analysis with the actual
specifications or start performing some experiments.
At a practical level, we might consider the cache coherence implications of
chasing tree pointers through memory versus sequential access for a tree packed
into an array as a vector heap, or a list packed into an array. We might also look at
the complexity of the data structure and the runtime to build and update the tree
to support the fast queries.
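The array-packed layout mentioned above can be sketched briefly: a complete binary tree stored in level order needs no child pointers, because the children of index i sit at 2i + 1 and 2i + 2, so traversal is index arithmetic over one contiguous block of memory. The grades here are illustrative.

```python
# A complete BST packed into an array ("vector heap" layout).
# Children of the node at index i live at 2*i + 1 and 2*i + 2.

tree = [72, 51, 88, 34, 60, 80, 90]   # level-order storage of a BST

def left(i):  return 2 * i + 1
def right(i): return 2 * i + 2

def find(grade, i=0):
    """Standard BST search via index arithmetic, no pointer chasing."""
    if i >= len(tree):
        return None
    if tree[i] == grade:
        return i
    return find(grade, left(i) if grade < tree[i] else right(i))

print(find(60))  # 4: root -> left(0)=1 -> right(1)=4
```

Because consecutive levels occupy consecutive array regions, this layout tends to be friendlier to the cache than a pointer-linked tree, at the cost of more expensive insertion and deletion.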
In the case where queries and geometry are unevenly distributed, balancing the
tree might not lead to optimal performance over many queries. For example, say
that there are many students whose grades cluster around 34 but we don't expect
to query for grades less than 50 very often. In that case, we want a deep subtree
for grades less than 50 so that students with grades greater than 50 can appear near
the root. Nodes near the root can be reached more quickly and are more likely to
stay in cache.
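The payoff of skewing the tree toward the query distribution can be sketched with a tiny expected-cost calculation; the depths and query probabilities below are made-up assumptions, not data from the text.

```python
# Expected search depth: balanced tree vs. one skewed so that rarely
# queried grades (< 50) sit in a deep subtree. All numbers are
# illustrative assumptions.

balanced = {"<50": 3, ">=50": 3}        # every record ~3 levels down
skewed   = {"<50": 6, ">=50": 2}        # hot keys pulled near the root

queries  = {"<50": 0.05, ">=50": 0.95}  # assumed query distribution

def expected_depth(depths):
    return sum(queries[k] * depths[k] for k in queries)

print(round(expected_depth(balanced), 2))  # 3.0
print(round(expected_depth(skewed), 2))    # 2.2
```

Under this assumed workload the deliberately unbalanced tree visits fewer nodes per query on average, even though its worst-case depth is twice that of the balanced tree.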