Databases Reference
In-Depth Information
words in the document that do appear on the list of most frequent words appear
after the infrequent words, in the reverse order of frequency (i.e., with the most
frequent words of the documents ordered last).
Example 8.11 : Suppose our document is
'Twas brillig, and the slithy toves
“The” is the most frequent word in English, and “and” is only slightly less
frequent. Let us suppose that “twas” makes the list of frequent words, although
its frequency is surely lower than that of “the” or “and.” The other words do
not make the list of frequent words.
Then the end of the list consists of “twas,” “and,” and “the,” in that order,
since that is the inverse order of frequency. The other three words are placed
at the front of the list in lexicographic order. Thus,
brillig slithy toves twas and the
is the sequence of words in the document, properly ordered.
2
The bids are stored in a hash-table, whose hash key is the first word of
the bid, in the order explained above. The record for the bid will also include
information about what to do when the bid is matched. The status is 0 and
need not be stored explicitly. There is another hash table, whose job is to
contain copies of those bids that have been partially matched. These bids have
a status that is at least 1, but less than the number of words in the set. If
the status is i, then the hash-key for this hash table is the (i + 1)st word. The
arrangement of hash tables is suggested by Fig. 8.5. To process a document,
do the following.
1. Sort the words of the document in the order discussed above. Eliminate
duplicate words.
2. For each word w, in the sorted order:
(i) Using w as the hash-key for the table of partially matched bids, find
those bids having w as key.
(ii) For each such bid b, if w is the last word of b, move b to the table of
matched bids.
(iii) If w is not the last word of b, add 1 to b's status, and rehash b using
the word whose position is one more than the new status, as the
hash-key.
(iv) Using w as the hash key for the table of all bids, find those bids for
which w is their first word in the sorted order.
(v) For each such bid b, if there is only one word on its list, copy it to
the table of matched bids.
Search WWH ::




Custom Search