Filtering Duplicate Events in Online Advertising
In online advertising, it is a fairly common practice to assign a unique
identifier to each viewing of an ad. This unique identifier is used to
identify situations where an ad view or click may register more than
once in the system. The causes range from the mundane, such as a user
who habitually double-clicks links, to the malicious, such as “bots” that
generate fraudulent views and clicks to earn revenue from ads placed on
fraudulent sites.
The advertising industry is conservative when it comes to validated
page views (also known as impressions) and clicks, so it is usually
preferable to err on the side of caution and throw away a small
percentage of valid events if doing so cleans out more of the
invalid events. The industry has also taken a general turn toward
so-called “programmatic buying,” where ads are traded across the
Internet in exchanges similar to a real-world commodity exchange. This
happens via a bidding mechanism that typically completes in less than
100 milliseconds.
With a large number of unique identifiers and the need to maintain
near real-time counts of impressions and clicks to allow for optimal
bidding, filtering these duplicate events is a good fit for data
structures like the Bloom Filter, which tests set membership in fixed
memory at the cost of occasional false positives.
An initial approach might be to maintain a Bloom Filter for views and
another filter for clicks. However, every ad that is served produces a
view, whereas only about 1 percent to 5 percent of views lead to a
click, so a filter for views would have to hold far more identifiers.
For the purposes of accounting, it is still desirable to filter impressions,
but this can likely be done offline. Clicks, on the other hand, are used to
determine bids, so you can maintain a single Bloom Filter, for clicks
only, to filter out duplicate clicks that would artificially inflate the
apparent quality of an ad placement.
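As a rough illustration of the single click filter described above, here is a minimal Python sketch. It is not taken from the text: the class name BloomFilter, the method add_if_absent, the helper filter_duplicate_clicks, the sample identifiers, and the choice of SHA-256 with double hashing are all assumptions made for the example. The sizing uses the standard Bloom filter formulas for the number of bits and hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter (illustrative sketch, not a production implementation)."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so it is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_if_absent(self, key):
        """Set the key's bits; return True only if at least one bit was unset."""
        is_new = False
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


def filter_duplicate_clicks(click_ids, bloom):
    """Keep only clicks whose unique identifier the filter has not seen yet."""
    return [cid for cid in click_ids if bloom.add_if_absent(cid)]


# Hypothetical click stream: "ad-1" and "ad-2" each register twice.
clicks = ["ad-1", "ad-2", "ad-1", "ad-3", "ad-2"]
click_filter = BloomFilter(expected_items=1000, false_positive_rate=0.001)
unique_clicks = filter_duplicate_clicks(clicks, click_filter)
```

A false positive in this sketch causes a valid click to be discarded, never a duplicate to be counted, which matches the preference stated earlier for throwing away a small percentage of valid events rather than keeping invalid ones.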