A different line of research has proposed a data analysis approach to discriminate
legitimate from abusive clicks. Ref. [16] focuses on the problem of finding colluding
publishers. The proposed system analyzes the IP addresses generating the click traf-
fic for each publisher to identify groups of publishers receiving clicks from roughly
the same IPs. Ref. [18] addresses the scenario of a single publisher generating fraud-
ulent traffic from several IPs. The authors propose a system to automatically detect
highly correlated pairs of publisher and IP address.
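The collusion-detection idea of [16] — grouping publishers whose clicks originate from roughly the same IPs — can be sketched as a pairwise set-similarity check. The function names, the Jaccard measure, and the threshold below are illustrative assumptions, not the actual system in [16]:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of IP addresses."""
    return len(a & b) / len(a | b)

def colluding_pairs(clicks_by_publisher, threshold=0.5):
    """Flag publisher pairs whose click traffic comes from roughly
    the same IPs. `clicks_by_publisher` maps each publisher to the
    set of IPs observed in its click traffic; the similarity measure
    and threshold are hypothetical."""
    flagged = []
    for (p1, ips1), (p2, ips2) in combinations(clicks_by_publisher.items(), 2):
        if jaccard(ips1, ips2) >= threshold:
            flagged.append((p1, p2))
    return flagged
```

A real system would also weight IPs by click volume and account for large shared proxies, which this sketch ignores.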
14.7 CONCLUSION
This chapter describes (a) a data-driven approach that accurately estimates the sizes
of IP addresses [19]; (b) the details of novel findings on employing different size
estimation models, and their error analysis; and (c) the machine-generated traffic
filter that uses the IP size information effectively [22]. The filter operates at the click
granularity to combat attacks that use homogeneous infrastructures, and at the pub-
lisher granularity to combat sophisticated attacks that spread malicious traffic across
a wide range of IP sizes.
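At the publisher granularity, the core idea is to compare the distribution of a publisher's clicks over IP size buckets against a legitimate reference distribution and flag large divergences. The bucketing, the total-variation distance, and the threshold below are illustrative assumptions, not the deployed filter of [22]:

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def flag_publisher(observed_counts, legit_dist, threshold=0.3):
    """Flag a publisher whose clicks, bucketed by the estimated size
    of the originating IPs, diverge from the legitimate IP size
    distribution. Bucket boundaries, the reference distribution, and
    the threshold are hypothetical."""
    total = sum(observed_counts)
    if total == 0:
        return False  # too few clicks to judge
    observed_dist = [c / total for c in observed_counts]
    return total_variation(observed_dist, legit_dist) > threshold
```

An attack launched from a homogeneous infrastructure concentrates clicks in a few size buckets, so its empirical distribution diverges from the reference even when the individual clicks look unremarkable.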
The techniques discussed here (i) do not require any identification or authentica-
tion of the users generating the clicks; (ii) are fully automated, have low complexity
(they scale linearly in the amount of data to be processed), are easy to parallelize, and
are suitable for large-scale detection; (iii) are general and can be applied to a wide
spectrum of fraud detection problems (e.g., distribution of UA size sending emails,
or the time series distribution of plus ones on the Google social network); (iv) are
robust to DHCP reassignment (clicks generated from a specific host have the same
size regardless of the specific IP address assigned, which is particularly useful in
practice, since a large fraction of IPs are dynamically reassigned every 1 to 3 days [27]),
and are hard to evade. In fact, even if the attacker knows the legitimate distribution
of IP sizes for all publishers in their group, and the exact mechanisms used to esti-
mate the IP size, they would still need to generate clicks according to the legitimate
IP size distribution, which is not controlled or even accessible to them.
The main limitation is that the filter requires a statistically significant number of
clicks per publisher. If this is not the case, approaches that identify colluding pub-
lishers, e.g., [16], would catch these attacks. The techniques discussed in this chapter
are currently used as part of a larger detection system deployed at Google in conjunc-
tion with complementary techniques.
REFERENCES
1. S. Bellovin. A technique for counting NATted hosts. In SIGCOMM IMW, pp. 267–272, 2002.
2. P. Bloomfield. Fourier Analysis of Time Series: An Introduction. Wiley-IEEE, 2004.
3. Interactive Advertising Bureau. Internet ad revenues hit $31 billion in 2011, historic high up 22% over 2010 record-breaking numbers. http://www.iab.net/about_the_iab/recent_press_releases/press_release_archive/press_release/pr-041812, April 18, 2012.
4. M. Casado and M. Freedman. Peering through the shroud: The effect of edge opacity on IP-based client identification. In NSDI, pp. 173–186, 2007.