Database Reference
In-Depth Information
(1) Sort the words of the document in the order discussed above. Eliminate duplicate
words.
(2) For each word w , in the sorted order:
(i) Using w as the hash-key for the table of partially matched bids, find those bids
having w as key.
(ii) For each such bid b , if w is the last word of b , move b to the table of matched
bids.
(iii) If w is not the last word of b , add 1 to b s status, and rehash b using the word
whose position is one more than the new status, as the hash-key.
(iv) Using w as the hash key for the table of all bids, find those bids for which w is
their first word in the sorted order.
(v) For each such bid b , if there is only one word on its list, copy it to the table of
matched bids.
(vi) If b consists of more than one word, add it, with status 1, to the table of partially
matched bids, using the second word of b as the hash-key.
(3) Produce the list of matched bids as the output.
The benefit of the rarest-first order should now be visible. A bid is only copied to the
second hash table if its rarest word appears in the document. In comparison, if lexicograph-
ic order was used, more bids would be copied to the second hash table. By minimizing the
size of that table, we not only reduce the amount of work in steps 2(i)-2(iii), but we make
it more likely that this entire table can be kept in main memory.
8.6 Summary of Chapter 8
Targeted Advertising : The big advantage that Web-based advertising has over advertising in conventional media
such as newspapers is that Web advertising can be selected according to the interests of each individual user. This
advantage has enabled many Web services to be supported entirely by advertising revenue.
On- and Off-Line Algorithms : Conventional algorithms that are allowed to see all their data before producing an an-
swer are called off-line. An online algorithm is required to make a response to each element in a stream immediately,
with knowledge of only the past, not the future elements in the stream.
Greedy Algorithms : Many on-line algorithms are greedy, in the sense that they select their action at every step by
minimizing some objective function.
Competitive Ratio : We can measure the quality of an on-line algorithm by minimizing, over all possible inputs, the
value of the result of the on-line algorithm compared with the value of the result of the best possible off-line al-
gorithm.
Bipartite Matching : This problem involves two sets of nodes and a set of edges between members of the two sets.
The goal is to find a maximal matching - as large a set of edges as possible that includes no node more than once.
On-Line Solution to the Matching Problem : One greedy algorithm for finding a match in a bipartite graph (or any
graph, for that matter) is to order the edges in some way, and for each edge in turn, add it to the match if neither of
Search WWH ::




Custom Search