Database Reference
In-Depth Information
million. These n words are sorted by frequency, and they occupy the end of the list, with
the most frequent words at the very end. All words not among the n most frequent can be
assumed equally infrequent and ordered lexicographically. Then, the words of any docu-
ment can be ordered. If a word does not appear on the list of n frequent words, place it at
the front of the order, lexicographically. Those words in the document that do appear on
the list of most frequent words appear after the infrequent words, in the reverse order of
frequency (i.e., with the most frequent words of the documents ordered last).
EXAMPLE 8.11 Suppose our document is
'Twas brillig, and the slithy toves
“The” is the most frequent word in English, and “and” is only slightly less frequent. Let us
suppose that “twas” makes the list of frequent words, although its frequency is surely lower
than that of “the” or “and.” The other words do not make the list of frequent words.
Then the end of the list consists of “twas,” “and,” and “the,” in that order, since that is
the inverse order of frequency. The other three words are placed at the front of the list in
lexicographic order. Thus,
brillig slithy toves twas and the
is the sequence of words in the document, properly ordered.
The bids are stored in a hash-table, whose hash key is the first word of the bid, in the or-
der explained above. The record for the bid will also include information about what to do
when the bid is matched. The status is 0 and need not be stored explicitly. There is another
hash table, whose job is to contain copies of those bids that have been partially matched.
These bids have a status that is at least 1, but less than the number of words in the set. If
the status is i , then the hashkey for this hash table is the ( i + 1)st word. The arrangement of
hash tables is suggested by Fig. 8.5 . To process a document, do the following.
Figure 8.5 Managing large numbers of bids and large numbers of documents
Search WWH ::




Custom Search