Real-Time Identification of Fraudulent Websites
One way a fraudulent website generates traffic is by employing a
so-called “bot network” (botnet) to send large amounts of essentially
fake traffic to the site. This type of traffic can be quite difficult
to detect because most of the devices in the network are legitimate
computers on the Internet that have the misfortune of being infected
with malware. Under normal circumstances the legitimate user of a
device is using it in a normal, nonfraudulent way, and it is only when
the network is activated that the traffic can be identified as
fraudulent.
To combat this problem, you want to blacklist websites in real time
when their traffic is consistent with the use of a botnet to generate
revenue.
To do this, you first generate a base profile of botnet traffic from the
IP addresses that visit sites you have previously identified as
fraudulent, either through manual investigation or through one of the
industry sources devoted to maintaining this data.
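Next you need a metric that allows you to compare two IP address
profiles. One such metric is the Jaccard Index, which is defined as the
size of the intersection of the two sets being compared divided by the
size of their union: $J(A, B) = |A \cap B| / |A \cup B|$. Strictly
speaking the index measures similarity, ranging from 0 for disjoint sets
to 1 for identical sets; the corresponding distance is $1 - J(A, B)$.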
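
As a point of reference, the Jaccard Index can be computed exactly when both profiles are small enough to hold as ordinary sets. The sketch below is illustrative only: the function name `jaccard_index`, the sample IP addresses, and the choice to score two empty profiles as 0 are assumptions, not taken from the text.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Assumption: two empty profiles are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical profiles built from documentation-range IP addresses.
botnet_profile = {"203.0.113.5", "203.0.113.9", "198.51.100.7"}
site_profile = {"203.0.113.5", "198.51.100.7", "192.0.2.44"}

# Two shared addresses out of four distinct addresses overall.
similarity = jaccard_index(botnet_profile, site_profile)
print(similarity)  # 0.5
```

Exact sets like these grow with the number of visitors, which is the motivation for moving to a fixed-size sketch of each profile.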
If you maintain your website profiles as Bloom Filters, along with a
Bloom Filter representing your botnet profile, then you can estimate the
cardinality of the union of the two filters quite easily using the
equation just derived for cardinality applied to the union of the two
sets. This requires that both filters have the same size and use the
same hash functions. In fact, since you only need the count of bits set
to 1, you do not actually need to store the Bloom Filter of the union;
simply count the bits that are set in one or the other.
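
The union estimate can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the cardinality equation is assumed to be the commonly used estimate n ≈ -(m/k) ln(1 - X/m), where X is the number of bits set, m the filter size in bits, and k the number of hash functions. The values of `M` and `K`, the SHA-256 double-hashing scheme, and all function names are hypothetical.

```python
import hashlib
import math

M = 8192  # bits per filter (hypothetical size)
K = 4     # hash functions per item (hypothetical)


def bit_positions(item: str, m: int = M, k: int = K) -> list:
    """Derive k bit positions from one SHA-256 digest via double hashing."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
    return [(h1 + i * h2) % m for i in range(k)]


def add(bits: int, item: str) -> int:
    """Return the filter (a Python int used as a bitmap) with item inserted."""
    for pos in bit_positions(item):
        bits |= 1 << pos
    return bits


def estimate_cardinality(bits_set: int, m: int = M, k: int = K) -> float:
    """Assumed estimate of distinct items; requires bits_set < m."""
    return -(m / k) * math.log(1.0 - bits_set / m)


def union_cardinality(filter_a: int, filter_b: int) -> float:
    """Count the bits set in either filter and apply the estimate."""
    bits_in_either = bin(filter_a | filter_b).count("1")
    return estimate_cardinality(bits_in_either)


# Two hypothetical profiles: 200 addresses each, 100 of them shared,
# so the true union holds 300 distinct addresses.
site_a = 0
for i in range(200):
    site_a = add(site_a, f"10.0.{i // 256}.{i % 256}")
site_b = 0
for i in range(100, 300):
    site_b = add(site_b, f"10.0.{i // 256}.{i % 256}")

print(round(union_cardinality(site_a, site_b)))
```

The bitwise OR here is only a temporary value used for counting; neither stored filter is modified, and no third filter needs to be kept.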
To compute the cardinality of the intersection, you can take advantage
of a property of set cardinality: $|A \cup B| + |A \cap B| = |A| + |B|$.
With a little rearrangement, the cardinality of the intersection is
simply $|A \cap B| = |A| + |B| - |A \cup B|$.
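
Putting the pieces together, a Jaccard estimate needs only three bit counts: the bits set in each filter and the bits set in either. The following Python sketch is illustrative and rests on assumptions: the cardinality estimate is taken to be n ≈ -(m/k) ln(1 - X/m), the clamping of the result to the range 0 to 1 is an added safeguard against estimation noise, and the function names, sample counts, and threshold are all hypothetical.

```python
import math


def estimate_cardinality(bits_set: int, m: int, k: int) -> float:
    """Assumed estimate of distinct items; requires bits_set < m."""
    return -(m / k) * math.log(1.0 - bits_set / m)


def estimated_jaccard(bits_a: int, bits_b: int, bits_union: int,
                      m: int, k: int) -> float:
    """Estimate the Jaccard Index from set-bit counts of two Bloom Filters."""
    size_a = estimate_cardinality(bits_a, m, k)
    size_b = estimate_cardinality(bits_b, m, k)
    size_union = estimate_cardinality(bits_union, m, k)
    if size_union == 0:
        return 0.0  # assumption: empty profiles are scored as dissimilar
    # Intersection size = size of A + size of B - size of the union.
    # The estimates are noisy, so clamp to keep the result in [0, 1].
    size_intersection = max(0.0, size_a + size_b - size_union)
    return min(1.0, size_intersection / size_union)


# Hypothetical bit counts for a site profile, the botnet profile,
# and the bits set in one or the other.
score = estimated_jaccard(bits_a=700, bits_b=760, bits_union=1100,
                          m=8192, k=4)
THRESHOLD = 0.25  # hypothetical cutoff; would need tuning on real traffic
print(score, score >= THRESHOLD)
```

A site whose score stays above the chosen threshold would be a candidate for blacklisting; where to set that threshold depends on the traffic being monitored and is not specified here.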