Database Reference
In-Depth Information
The Triangular-Matrix Method
Even after coding items as integers, we still have the problem that we must count a pair { i,
j } in only one place. For example, we could order the pair so that i < j , and only use the
entry a [ i, j ] in a two-dimensional array a . That strategy would make half the array useless.
A more space-efficient way is to use a one-dimensional triangular array . We store in a [ k ]
the count for the pair { i, j }, with 1 ≤ i < j n , where
The result of this layout is that the pairs are stored in lexicographic order, that is first {1 ,
2} , {1 , 3} , . . . , {1 , n }, then {2 , 3} , {2 , 4} , . . . , {2 , n }, and so on, down to { n − 2 , n − 1} ,
{ n − 2 , n }, and finally { n − 1 , n }.
The Triples Method
There is another approach to storing counts that may be more appropriate, depending on
the fraction of the possible pairs of items that actually appear in some basket. We can store
counts as triples [ i, j, c ], meaning that the count of pair { i, j }, with i < j , is c . A data struc-
ture, such as a hash table with i and j as the search key, is used so we can tell if there is
a triple for a given i and j and, if so, to find it quickly. We call this approach the triples
method of storing counts.
Unlike the triangular matrix, the triples method does not require us to store anything if
the count for a pair is 0. On the other hand, the triples method requires us to store three in-
tegers, rather than one, for every pair that does appear in some basket. In addition, there is
the space needed for the hash table or other data structure used to support efficient retriev-
al. The conclusion is that the triangular matrix will be better if at least 1/3 of the possible
pairs actually appear in some basket, while if significantly fewer than 1/3 of the possible
pairs occur, we should consider using the triples method.
EXAMPLE 6.6 Suppose there are 100,000 items, and 10,000,000 baskets of 10 items each.
Then the triangular-matrix method requires (approximately) integer counts. 1 On the
other hand, the total number of pairs among all the baskets is Even in the extreme
case that every pair of items appeared only once, there could be only 4 . 5 × 10 8 pairs with
nonzero counts. If we used the triples method to store counts, we would need only three
times that number of integers, or 1 . 35×10 9 integers. Thus, in this case the triples method
will surely take much less space than the triangular matrix.
However, even if there were ten or a hundred times as many baskets, it would be normal
for there to be a sufficiently uneven distribution of items that we might still be better off
Search WWH ::




Custom Search