An Immunological Filter for Spam - Artificial Immune Systems


one is the large overlap between these classes and the inherent concept drift of the spam set [2,3]. The most widely used technique for detecting spam is Bayesian analysis [4,5,6], but other machine learning techniques have also been used to detect or categorize spam, such as Support Vector Machines [7], decision trees [8,9] and case-based reasoning [3].

If we interpret spam messages as pathogens, using the natural immune system as inspiration for new methods to detect or categorize spam is well supported, as can be seen in [10,11,12,13]. Here, we propose the use of a supervised version of a Real-Valued Antibody Network [14]. The antibody network will work as a classifier of new messages.

The paper is organized as follows: in Section 2 the antibody network is presented together with some previous works; in Section 3 the corpus is described, and its pre-processing methods are described in Section 4; some performance measures are introduced in Section 5 and the results are presented in Section 6. Analytical and concluding remarks are outlined in Section 7.

2 Applying the SRABNET to Capture Spam

De Castro et al. [15] proposed a growing artificial binary antibody repertoire to recognize antigens, which was called AntiBody NETwork (ABNET). Boolean weights were adopted for antigens and antibodies. Knidel et al. [16] extended that previous work and proposed real-valued vectors to represent the weights of the network (RABNET), for data clustering tasks.

In classification problems with labelled samples, it is important to use that information to improve the performance of the model. Based on this idea, Knidel et al. [14] proposed a supervised version of the RABNET called SRABNET (Supervised Real-Valued Antibody Network), which is well suited for such classification tasks, since it uses the labels of the samples during the evolution of the system.

Inspired by ideas from neural networks and artificial immune systems, the SRABNET assumes a population of antigens (Ag) to be recognized by an antibody repertoire (Ab) modeled as a one-dimensional competitive supervised network with real-valued weights. Being a supervised approach, its first difference from RABNET [16] lies at the beginning of the network adaptation: while RABNET starts with only one antibody, the SRABNET starts with one antibody assigned to each class. The weights of each initial antibody are defined by the arithmetic mean, taken in the attribute space, of all the data belonging to the class to which the antibody is assigned.
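The initialization described above can be illustrated with a minimal sketch. It assumes each message has already been converted to a fixed-length list of real-valued attributes; the function name `initial_antibodies` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def initial_antibodies(samples, labels):
    """Return {label: mean attribute vector}: one initial antibody per class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        # Accumulate the attribute values of every sample of class y.
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # Arithmetic mean per attribute, taken over the samples of each class.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


# Toy attribute vectors (hypothetical values, not from the paper's corpus).
samples = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 6.0]]
labels = ["spam", "spam", "legitimate", "legitimate"]
antibodies = initial_antibodies(samples, labels)
```

With this toy data the network would start with two antibodies, one at the centroid of the spam samples and one at the centroid of the legitimate samples.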

In summary, the following features are associated with SRABNET:

- Competitive network with supervised learning;

- Constructive network structure, with growing and pruning phases governed by an implementation of the clonal selection principle; and

- Real-valued connection weights in a Euclidean shape-space [17].
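As a rough illustration of how a competitive network in a Euclidean shape-space might label a new message, the sketch below picks the antibody closest to the message vector and returns its class. This winner-takes-all rule is an assumption made for illustration; this section does not spell out the classification rule, and the name `classify` and the example weights are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def classify(message, antibodies):
    """Return the label of the antibody nearest to the message vector.

    antibodies: list of (weights, label) pairs; a class may own several
    antibodies once the network has grown (assumed representation).
    """
    winner = min(antibodies, key=lambda ab: euclidean(message, ab[0]))
    return winner[1]


# Hypothetical repertoire: two spam antibodies and one legitimate antibody.
repertoire = [
    ([2.0, 1.0], "spam"),
    ([1.0, 5.0], "legitimate"),
    ([4.0, 4.0], "spam"),
]
label_a = classify([1.5, 4.0], repertoire)
label_b = classify([3.5, 3.0], repertoire)
```

Here the first message lies nearest to the legitimate antibody and the second lies nearest to the spam antibody at [4.0, 4.0], so the two labels differ.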

Although there are similar stages in the learning algorithms of RABNET

and SRABNET, the way they are implemented will depend upon the learning
