An Immunological Filter for Spam
George B. Bezerra 1 , Tiago V. Barra 1 , Hamilton M. Ferreira 1 ,
Helder Knidel 1 , Leandro Nunes de Castro 2 , and Fernando J. Von Zuben 1
1 Laboratory of Bioinformatics and Bio-Inspired Computing (LBIC)
Department of Computer Engineering and Industrial Automation
University of Campinas, Unicamp, CP: 6101, 13083-970, Campinas/SP, Brazil
2 Catholic University of Santos, UniSantos, 11070-906, Santos/SP, Brazil
Abstract. Spam messages are continually filling the email boxes of practically every Web user. To deal with this growing problem, high-performance filters that block these unsolicited messages are strongly needed. An Antibody Network, more precisely SRABNET (Supervised Real-Valued Antibody Network), is proposed as an alternative filter to detect spam. The antibody network model is generated automatically from the training dataset and evaluated on unseen messages. We validate this approach on a public corpus called PU1. This corpus contains a large collection of encrypted personal e-mail messages, both legitimate and spam. Finally, we compare the performance of SRABNET with that of the well-known naïve Bayes filter using several performance indexes.
1 Introduction
A pathogen is a specific causative agent of disease, such as a bacterium or a virus. In the same way, junk email can be seen as a sort of disease of the personal computer. Junk email is commonly called spam and is typically defined as unsolicited and undesired electronic messages. Storing and transmitting spam consumes a significant share of memory and network traffic.
Resource allocation aside, spam forces undesired content into our mailboxes, impairs our ability to communicate freely, and costs Internet users billions of dollars annually. According to the SpamCon Foundation, U.S. businesses lost about US$4 billion¹ in productivity in 2004 because of spam. Those losses could be even higher without an intervening technology or policy to curb unwanted messages. Some solutions have been applied against spam, such as legislation prohibiting the sending of spam and blacklists (lists containing the addresses of known spam senders). Nevertheless, these methods are usually not very effective. In most cases, spam senders use "shell addresses" (i.e., addresses used once and then discarded), so they can change their addresses regularly to avoid being blacklisted [1].
The problem of detecting spam messages is well known and can be interpreted as a binary classification task. However, what makes this classification task a hard
1 SpamCon Foundation, http://spamcon.org/. Accessed on 05/01/2006.