An Immunological Filter for Spam - Artificial Immune Systems

An Immunological Filter for Spam

George B. Bezerra¹, Tiago V. Barra¹, Hamilton M. Ferreira¹,
Helder Knidel¹, Leandro Nunes de Castro², and Fernando J. Von Zuben¹

¹ Laboratory of Bioinformatics and Bio-Inspired Computing (LBIC)
Department of Computer Engineering and Industrial Automation
University of Campinas, Unicamp, CP: 6101, 13083-970, Campinas/SP, Brazil

² Catholic University of Santos, UniSantos, 11070-906, Santos/SP, Brazil

Abstract. Spam messages are continually filling the mailboxes of
practically every Web user. To deal with this growing problem,
high-performance filters that block these unsolicited messages are
strongly required. An antibody network, more precisely SRABNET
(Supervised Real-Valued Antibody Network), is proposed as an alternative
filter to detect spam. The antibody network model is generated
automatically from the training dataset and evaluated on unseen
messages. We validate this approach using a public corpus called PU1,
a large collection of encrypted personal e-mail messages containing
both legitimate messages and spam. Finally, we compare its performance
with that of the well-known naïve Bayes filter using several
performance indexes, which are presented later.

1 Introduction

A pathogen is a specific causative agent of disease, such as a bacterium
or virus. In the same way, junk email, commonly called spam and typically
defined as unsolicited and undesired electronic messages, can be seen as
a sort of disease afflicting a personal computer. Storing and
transmitting spam tends to consume a high percentage of memory and
network packets.

Resource consumption aside, spam forces undesired content into our
mailboxes, impairs our ability to communicate freely, and costs Internet
users billions of dollars annually. According to the SpamCon Foundation,
U.S. businesses lost about US$4 billion¹ in productivity in 2004 because
of spam, and those losses could be even higher without an intervening
technology or policy to curb unwanted messages. Some solutions have been
applied to avoid spam, such as legislation prohibiting the sending of
spam and blacklists (lists containing the addresses of known spam
senders). Nevertheless, these methods are usually not very effective:
in the majority of cases spam senders use "shell addresses" (i.e.,
addresses used once and then discarded), so they can change their
addresses regularly to avoid being blacklisted [1].

The problem of detecting spam messages is well known and can be
interpreted as a binary classification task. However, what makes this
classification task a hard

¹ SpamCon Foundation, http://spamcon.org/. Accessed on 05/01/2006.
