Database Reference
In-Depth Information
8.5.1
Matching Bids and Search Queries
As we have described the adwords problem, and as it normally appears in practice, advert-
isers bid on sets of words. If a search query occurs having exactly that set of words in some
order, then the bid is said to match the query, and it becomes a candidate for selection. We
can avoid having to deal with word order by storing all sets of words representing a bid in
lexicographic (alphabetic) order. The list of words in sorted order forms the hash-key for
the bid, and these bids may be stored in a hash table used as an index, as discussed in Sec-
tion 1.3.2 .
Search queries also have their words sorted prior to lookup. When we hash the sorted
list, we find in the hash table all the bids for exactly that set of words. They can be retrieved
quickly, since we have only to look at the contents of that bucket.
Moreover, there is a good chance that we can keep the entire hash table in main memory.
If there are a million advertisers, each bidding on 100 queries, and the record of the bid
requires 100 bytes, then we require ten gigabytes of main memory, which is well within
the limits of what is feasible for a single machine. If more space is required, we can split
the buckets of the hash table among as many machines as we need. Search queries can be
hashed and sent to the appropriate machine.
In practice, search queries may be arriving far too rapidly for a single machine, or group
of machines that collaborate on a single query at a time, to handle them all. In that case,
the stream of queries is split into as many pieces as necessary, and each piece is handled
by one group of machines. In fact, answering the search query, independent of ads, will re-
quire a group of machines working in parallel anyway, in order that the entire processing
of a query can be done in main memory.
8.5.2
More Complex Matching Problems
However, the potential for matching bids to objects is not limited to the case where the ob-
jects are search queries and the match criterion is same-set-of-words. For example, Google
also matches adwords bids to emails. There, the match criterion is not based on the equal-
ity of sets. Rather, a bid on a set of words S matches an email if all the words in S appear
anywhere in the email.
This matching problem is much harder. We can still maintain a hash-table index for the
bids, but the number of subsets of words in a hundred-word email is much too large to look
up all the sets, or even all the small sets of (say) three or fewer words. There are a number
of other potential applications of this sort of matching that, at the time of this writing, are
not implemented but could be. They all involve standing queries - queries that users post to
Search WWH ::




Custom Search