Advertising on the Web - Mining of Massive Datasets

Databases Reference

In-Depth Information

3. She may spend a lot of time on the Yahoo! golf page.

4. She may issue search queries with golf-related terms frequently.

5. She may bookmark the Web sites of one or more golf courses.

Each of these methods, and many others like these, raise enormous privacy

issues. It is not the purpose of this topic to try to resolve those issues, which

in practice probably have no solution that will satisfy all concerns. On the

one hand, people like the free services that have recently become advertising-

supported, and these services depend on advertising being much more effective

than conventional ads. There is a general agreement that, if there must be ads,

it is better to see things you might actually use than to have what pages you

view cluttered with irrelevancies. On the other hand, there is great potential

for misuse if the information leaves the realm of the machines that execute

advertising algorithms and get into the hands of real people.

8.2

On-Line Algorithms

Before addressing the question of matching advertisements to search queries, we

shall digress slightly by examining the general class to which such algorithms

belong. This class is referred to as “on-line,” and they generally involve an ap-

proach called “greedy.” We also give, in the next section, a preliminary example

of an on-line greedy algorithm for a simpler problem: maximal matching.

8.2.1

On-Line and Off-Line Algorithms

Typical algorithms work as follows. All the data needed by the algorithm is

presented initially. The algorithm can access the data in any order. At the end,

the algorithm produces its answer. Such an algorithm is called off-line.

However, there are times when we cannot see all the data before our al-

gorithm must make some decisions. Chapter 4 covered stream mining, where

we could store only a limited amount of the stream, and had to answer queries

about the entire stream when called upon to do so. There is an extreme form of

stream processing, where we must respond with an output after each stream ele-

ment arrives. We thus must decide about each stream element knowing nothing

at all of the future. Algorithms of this class are called on-line algorithms. 1

As the case in point, selecting ads to show with search queries would be

relatively simple if we could do it off-line. We would see a month's worth of

search queries, and look at the bids advertisers made on search terms, as well

as their advertising budgets for the month, and we could then assign ads to

1 Unfortunately, we are faced with another case of dual meanings, like the coincidence

involving the term “cluster” that we noted in Section 7.6.6, where we needed to interpret

properly phrases such as “algorithms for computing clusters on computer clusters.” Here, the

term “on-line” refers to the nature of the algorithm, and should not be confused with “on-line”

meaning “on the Internet” in phrases such as “on-line algorithms for on-line advertising.”

Search WWH ::

Custom Search

Home