one is the large overlap between these classes and the inherent concept
drift of the spam set [2,3]. The most widely used technique for detecting spam is
Bayesian analysis [4,5,6], but other machine learning techniques have also been
used to detect or categorize spam, such as Support Vector Machines [7], decision
trees [8,9] and case-based reasoning [3].
If we interpret spam messages as pathogens, using the natural immune system
as inspiration for new methods to detect or categorize spam is well
supported, as can be seen in [10,11,12,13]. Here, we propose the use of a
supervised version of a Real-Valued Antibody Network [14]. The antibody network
works as a classifier of new messages.
The paper is organized as follows: Section 2 presents the antibody network
together with some previous work; Section 3 describes the corpus and Section 4
its pre-processing methods; Section 5 introduces some performance measures and
Section 6 presents the results. Analytical and concluding remarks are outlined
in Section 7.
2 Applying the SRABNET to Capture Spam
De Castro et al. [15] proposed a growing artificial binary antibody repertoire
to recognize antigens, which was called AntiBody NETwork (ABNET). Boolean
weights were adopted for antigens and antibodies. Knidel et al. [16] extended
that previous work and proposed real-valued vectors to represent the weights of
the network (RABNET), for data clustering tasks.
In classification problems with labelled samples, it is important to use the
label information to improve the performance of the model. Based on this idea,
[Knidel2006] proposed a supervised version of the RABNET called SRABNET
(Supervised Real-Valued Antibody Network), which is well suited to such
classification tasks, since it uses the labels of the samples during the
evolution of the system.
Inspired by ideas from neural networks and artificial immune systems, the
SRABNET assumes a population of antigens (Ag) to be recognized by an
antibody repertoire (Ab) modeled as a one-dimensional competitive supervised
network with real-valued weights. Because the approach is supervised, the first
difference from RABNET [16] appears at the beginning of the network adaptation:
while RABNET starts with only one antibody, the SRABNET starts with one
antibody assigned to each class. The weights of each initial antibody are
defined as the arithmetic mean, taken in the attribute space, of all the data
belonging to the class to which the antibody is assigned.
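The initialization described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that antigens are given as lists of real-valued attributes with one label each; the function name `initial_antibodies` and the labels "spam" and "legitimate" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def initial_antibodies(antigens, labels):
    """Return {class_label: mean attribute vector}, one antibody per class.

    Assumption: every antigen is a list of floats of the same length.
    """
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(antigens, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        # Accumulate attribute values per class.
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    # Arithmetic mean of each attribute over the samples of the class.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Toy data (hypothetical): two "spam" antigens and one "legitimate" antigen.
antigens = [[0.0, 1.0], [2.0, 3.0], [10.0, 10.0]]
labels = ["spam", "spam", "legitimate"]
antibodies = initial_antibodies(antigens, labels)
```

With this toy data, the network would start with two antibodies, one per class, each placed at the centroid of its class in the attribute space.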
In summary, the following features are associated with SRABNET:
- Competitive network with supervised learning;
- Constructive network structure, with growing and pruning phases governed
by an implementation of the clonal selection principle; and
- Real-valued connection weights in a Euclidean shape-space [17].
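As a rough illustration of how a competitive network with labelled antibodies in a Euclidean shape-space could classify a message, the sketch below assigns an antigen the class of its closest antibody. The winner-takes-all rule by minimum Euclidean distance is an assumption made here for illustration; the growing and pruning phases are not shown, and the names `euclidean` and `classify` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two real-valued vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(antigen, antibodies):
    """Return the label of the antibody closest to the antigen.

    antibodies: list of (weight_vector, class_label) pairs.
    Assumption: the winning antibody is the one at minimum distance.
    """
    winner = min(antibodies, key=lambda ab: euclidean(antigen, ab[0]))
    return winner[1]


# Toy repertoire (hypothetical): one antibody per class.
antibodies = [([1.0, 2.0], "spam"), ([10.0, 10.0], "legitimate")]
predicted = classify([2.0, 2.5], antibodies)
```

In a constructive network the repertoire would hold more than one antibody per class after growing, but the same nearest-antibody rule would still apply under this assumption.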
Although there are similar stages in the learning algorithms of RABNET
and SRABNET, the way they are implemented will depend upon the learning