A different line of research has proposed a data analysis approach to discriminate
legitimate from abusive clicks. Ref. [16] focuses on the problem of finding colluding
publishers. The proposed system analyzes the IP addresses generating the click traf-
fic for each publisher to identify groups of publishers receiving clicks from roughly
the same IPs. Ref. [18] addresses the scenario of a single publisher generating fraud-
ulent traffic from several IPs. The authors propose a system to automatically detect
highly correlated pairs of publisher and IP address.
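The collusion-detection idea of [16] — grouping publishers whose clicks originate from roughly the same IPs — can be sketched as a pairwise set-similarity check. The function names, the Jaccard measure, and the threshold below are illustrative assumptions, not the actual system in [16]:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of IP addresses."""
    return len(a & b) / len(a | b)

def colluding_pairs(clicks_by_publisher, threshold=0.5):
    """Flag publisher pairs whose click traffic comes from roughly
    the same IPs. `clicks_by_publisher` maps each publisher to the
    set of IPs observed in its click traffic; the similarity measure
    and threshold are hypothetical."""
    flagged = []
    for (p1, ips1), (p2, ips2) in combinations(clicks_by_publisher.items(), 2):
        if jaccard(ips1, ips2) >= threshold:
            flagged.append((p1, p2))
    return flagged
```

A real system would also weight IPs by click volume and account for large shared proxies, which this sketch ignores.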
14.7 CONCLUSION
This chapter describes (a) a data-driven approach that accurately estimates the sizes
of IP addresses [19]; (b) the details of novel findings on employing different size
estimation models, and their error analysis; and (c) the machine-generated traffic
filter that uses the IP size information effectively [22]. The filter operates at the click
granularity to combat attacks that use homogeneous infrastructures, and at the pub-
lisher granularity to combat sophisticated attacks that spread malicious traffic across
a wide range of IP sizes.
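At the publisher granularity, the core idea is to compare the distribution of a publisher's clicks over IP size buckets against a legitimate reference distribution and flag large divergences. The bucketing, the total-variation distance, and the threshold below are illustrative assumptions, not the deployed filter of [22]:

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def flag_publisher(observed_counts, legit_dist, threshold=0.3):
    """Flag a publisher whose clicks, bucketed by the estimated size
    of the originating IPs, diverge from the legitimate IP size
    distribution. Bucket boundaries, the reference distribution, and
    the threshold are hypothetical."""
    total = sum(observed_counts)
    if total == 0:
        return False  # too few clicks to judge
    observed_dist = [c / total for c in observed_counts]
    return total_variation(observed_dist, legit_dist) > threshold
```

An attack launched from a homogeneous infrastructure concentrates clicks in a few size buckets, so its empirical distribution diverges from the reference even when the individual clicks look unremarkable.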
The techniques discussed here (i) do not require any identification or authentica-
tion of the users generating the clicks; (ii) are fully automated, have low complexity
(they scale linearly in the amount of data to be processed), are easy to parallelize, and
are suitable for large-scale detection; (iii) are general and can be applied to a wide
spectrum of fraud detection problems (e.g., distribution of UA size sending emails,
or the time series distribution of plus ones on the Google social network); (iv) are
robust to DHCP reassignment (clicks generated from a specific host have the same
size regardless of the specific IP address assigned, which is particularly useful in
practice, since a large fraction of IPs are dynamically reassigned every 1 to 3 days [27]),
and are hard to evade. In fact, even if the attacker knows the legitimate distribution
of IP sizes for all publishers in their group, and the exact mechanisms used to esti-
mate the IP size, they would still need to generate clicks according to the legitimate
IP size distribution, which is not controlled or even accessible to them.
The main limitation is that the filter requires a statistically significant number of
clicks per publisher. If this is not the case, approaches that identify colluding pub-
lishers, e.g., [16], would catch these attacks. The techniques discussed in this chapter
are currently used as part of a larger detection system deployed at Google in conjunc-
tion with complementary techniques.
REFERENCES
1. S. Bellovin. A technique for counting NATted hosts. In SIGCOMM IMW, pp. 267–272, 2002.
2. P. Bloomfield. Fourier Analysis of Time Series: An Introduction. Wiley-IEEE, 2004.
3. Interactive Advertising Bureau. Internet ad revenues hit $31 billion in 2011, historic high up 22% over 2010 record-breaking numbers. http://www.iab.net/about_the_iab/recent_press_releases/press_release_archive/press_release/pr-041812, April 18, 2012.
4. M. Casado and M. Freedman. Peering through the shroud: The effect of edge opacity on IP-based client identification. In NSDI, pp. 173–186, 2007.