Large-Scale Network Traffic Analysis for Estimating the Size of IP Addresses and Detecting Traffic Anomalies - Large Scale and Big Data: Processing and Management

[Figure 14.9, panels (a)-(d). Horizontal axis: click size.]
FIGURE 14.9 (a)-(d) The IP size distribution of two groups of publishers, named A and B for anonymity, which include hundreds of different publishers. Each point represents the percentage of clicks, of a given size, received by a publisher. For each group of publishers, two figures are plotted. In (a) and (c), the color indicates the scaled fraud score. The volume of each point is proportional to the number of clicks associated with it. In (b) and (d), the color indicates the scaled quality score.

quality score. The spikes corresponding to high fraud scores also have very low, or zero, quality scores. This confirms that the clicks identified by the anomaly detection system are indeed abusive clicks.

Figure 14.9c and d illustrate a sample group where the IP size distribution filter detects machine-generated traffic that would otherwise have gone undetected. For instance, Figure 14.9c shows a publisher that has about 70% of its clicks in bucket 6. This spike in the distribution is particularly suspicious since all other publishers in the same group have 15% or fewer of their clicks in this size range. The quality score associated with this point confirms this intuition: the large number of clicks (large circle in Figure 14.9d) was associated with a very low quality score.
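The comparison described above, one publisher's share of clicks in a bucket set against the shares of the other publishers in the same group, can be illustrated with a small sketch. This is not the chapter's IP size distribution filter. The function names, the use of the peer median as the baseline, and the `min_gap` threshold are all assumptions made for illustration, and the click data is synthetic, chosen only to mirror the 70% versus 15% example.

```python
from collections import Counter
from statistics import median


def bucket_shares(click_buckets):
    """Return {bucket: fraction of clicks} for one publisher.

    click_buckets is a list with one bucket id per click (hypothetical input format).
    """
    counts = Counter(click_buckets)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


def flag_spikes(group, min_gap=0.5):
    """Flag (publisher, bucket) pairs that stand out from the rest of the group.

    group maps publisher -> {bucket: share}. A pair is flagged when the
    publisher's share in a bucket exceeds the median share of the other
    publishers in that bucket by at least min_gap (an assumed threshold).
    """
    buckets = sorted({b for shares in group.values() for b in shares})
    flagged = []
    for publisher, shares in group.items():
        for bucket in buckets:
            peers = [s.get(bucket, 0.0) for p, s in group.items() if p != publisher]
            if not peers:
                continue  # a single-publisher group has no baseline
            gap = shares.get(bucket, 0.0) - median(peers)
            if gap >= min_gap:
                flagged.append((publisher, bucket, gap))
    return flagged


# Synthetic data: pub_x has 70% of its clicks in bucket 6,
# while every other publisher has 15% or less in that bucket.
group = {
    "pub_x": bucket_shares([6] * 70 + [0] * 30),
    "pub_a": bucket_shares([6] * 15 + [0] * 85),
    "pub_b": bucket_shares([6] * 10 + [0] * 90),
    "pub_c": bucket_shares([6] * 12 + [0] * 88),
}
flagged = flag_spikes(group)
for publisher, bucket, gap in flagged:
    print(f"{publisher}: bucket {bucket} exceeds the peer median by {gap:.2f}")
```

With this synthetic group, only `pub_x` in bucket 6 is flagged. In the chapter, such a spike is then cross-checked against an independent quality score; the sketch stops at the flagging step.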

14.5.4.6 Analysis of a Single Bucket

In Figure 14.10, the focus is on bucket 0 of Figure 14.9a, as this is the bucket with the largest number of data points. The variation of number of filtered clicks, the fraud
