Under this list representation, a list of three small elements might require
almost the same amount of space as a list of six elements. For small lists, we'll
find that the length of the freelist dominates the total space cost of the list.
Under an engineering analysis we should thus extend our implementation-
independent asymptotic analysis with both the constant factors and the newly
revealed parameters from the implementation, such as the size of the freelist and
the overhead of accessing a record.
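As a concrete illustration of folding constant factors back into the analysis, here is a back-of-the-envelope space estimate for the list representation. The per-node size and freelist length are made-up assumptions for the sketch, not measurements of any particular implementation.

```python
# Back-of-the-envelope space estimate for the list representation.
# NODE_BYTES and FREELIST_NODES are illustrative assumptions.

NODE_BYTES = 24        # assumed: key + value pointer + next pointer
FREELIST_NODES = 16    # assumed: preallocated spare nodes kept for reuse

def list_space_bytes(n):
    """Total bytes: n live nodes plus the fixed freelist overhead."""
    return n * NODE_BYTES + FREELIST_NODES * NODE_BYTES

# For small lists the fixed freelist overhead dominates: a 3-element
# and a 6-element list differ far less than the 2x element count suggests.
print(list_space_bytes(3))  # 456
print(list_space_bytes(6))  # 528
```

Under these assumed constants, doubling the element count from three to six grows total space by only about 16%, which is exactly the effect described above.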
37.3.2 1D Tree Example
A binary search tree exploits the ordering on the keys to increase the performance of some find operations. In doing so it incurs some additional constant performance overhead for every search and increases the storage space.
The motivation for using a tree is that we'd like to be able to find all students
with grades in some interval and do so faster than we could with the list. The tree
representation is similar to the list, although there are two child pointers in each
node. So the space bounds are slightly higher by a constant factor, but the storage
cost for n elements is still O( n ) .
The height of a balanced binary tree of n students is ⌈log₂ n⌉. If there is only one student whose grade is in the query interval, then we can find that student using at most ⌈log₂ n⌉ comparisons. Furthermore, if that student's record is at an internal (versus leaf) node of the tree, it may take as few as three comparison operations to locate the student and eliminate all others.
Thus, for the case of a single student in the interval, the search takes worst-case O(lg n) time, which is better than the list. In practice, the constants for iterating through a tree are similar to those for iterating through a list, so it is reasonable to expect the tree to outperform the list in this case.
Depending on the size of the interval in the query and the distribution of students, there may be more than one student in the result. This means that the time cost of the find operation is output-sensitive, and the size of the output appears in the bound. If there are s students that satisfy the query, the runtime of the query is necessarily O(s + lg n). Since 0 ≤ s ≤ n, in the worst case all students may be in the output. Thus, the tightest upper bound for the general case is O(n) time.
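The output-sensitive behavior can be seen in a minimal (here unbalanced) binary search tree sketch; the node layout and names are illustrative assumptions, not the book's implementation.

```python
# Minimal BST keyed on grade, with an output-sensitive range query.
# Names (Node, insert, find_range) are illustrative, not from the text.

class Node:
    def __init__(self, grade, record):
        self.grade, self.record = grade, record
        self.left = self.right = None

def insert(root, grade, record):
    if root is None:
        return Node(grade, record)
    if grade < root.grade:
        root.left = insert(root.left, grade, record)
    else:
        root.right = insert(root.right, grade, record)
    return root

def find_range(root, lo, hi, out):
    """Collect every record with lo <= grade <= hi, in grade order.

    Only subtrees that can intersect [lo, hi] are visited, so the
    work is O(s + h) for s reported students and tree height h."""
    if root is None:
        return out
    if lo < root.grade:             # left subtree may hold grades >= lo
        find_range(root.left, lo, hi, out)
    if lo <= root.grade <= hi:
        out.append(root.record)
    if root.grade < hi:             # right subtree may hold grades <= hi
        find_range(root.right, lo, hi, out)
    return out

root = None
for g, name in [(72, "ann"), (51, "bo"), (88, "cy"), (34, "di"), (90, "ed")]:
    root = insert(root, g, name)
print(find_range(root, 50, 89, []))  # ['bo', 'ann', 'cy']
```

With s = 3 matching students, the query touches only the matches plus one root-to-boundary path, matching the O(s + lg n) bound for a balanced tree.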
We could try to characterize the time cost based on the distribution of grades
and queries that we expect. However, at that point we're mixing our theory with
engineering and we're unlikely to produce a bound that informs either theory or
practice. Once we know something about the distribution and implementation
environment, we should perform back-of-the-envelope analysis with the actual
specifications or start performing some experiments.
At a practical level, we might consider the cache coherence implications of
chasing tree pointers through memory versus sequential access for a tree packed
into an array as a vector heap, or a list packed into an array. We might also look at
the complexity of the data structure and the runtime to build and update the tree
to support the fast queries.
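The array-packed layout mentioned above can be sketched briefly: a complete binary tree stored in level order needs no child pointers, because the children of index i sit at 2i + 1 and 2i + 2, so traversal is index arithmetic over one contiguous block of memory. The grades here are illustrative.

```python
# A complete BST packed into an array ("vector heap" layout).
# Children of the node at index i live at 2*i + 1 and 2*i + 2.

tree = [72, 51, 88, 34, 60, 80, 90]   # level-order storage of a BST

def left(i):  return 2 * i + 1
def right(i): return 2 * i + 2

def find(grade, i=0):
    """Standard BST search via index arithmetic, no pointer chasing."""
    if i >= len(tree):
        return None
    if tree[i] == grade:
        return i
    return find(grade, left(i) if grade < tree[i] else right(i))

print(find(60))  # 4: root -> left(0)=1 -> right(1)=4
```

Because consecutive levels occupy consecutive array regions, this layout tends to be friendlier to the cache than a pointer-linked tree, at the cost of more expensive insertion and deletion.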
In the case where queries and geometry are unevenly distributed, balancing the
tree might not lead to optimal performance over many queries. For example, say
that there are many students whose grades cluster around 34 but we don't expect
to query for grades less than 50 very often. In that case, we want a deep subtree
for grades less than 50 so that students with grades greater than 50 can appear near
the root. Nodes near the root can be reached more quickly and are more likely to
stay in cache.
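The payoff of skewing the tree toward the query distribution can be sketched with a tiny expected-cost calculation; the depths and query probabilities below are made-up assumptions, not data from the text.

```python
# Expected search depth: balanced tree vs. one skewed so that rarely
# queried grades (< 50) sit in a deep subtree. All numbers are
# illustrative assumptions.

balanced = {"<50": 3, ">=50": 3}        # every record ~3 levels down
skewed   = {"<50": 6, ">=50": 2}        # hot keys pulled near the root

queries  = {"<50": 0.05, ">=50": 0.95}  # assumed query distribution

def expected_depth(depths):
    return sum(queries[k] * depths[k] for k in queries)

print(round(expected_depth(balanced), 2))  # 3.0
print(round(expected_depth(skewed), 2))    # 2.2
```

Under this assumed workload the deliberately unbalanced tree visits fewer nodes per query on average, even though its worst-case depth is twice that of the balanced tree.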