Fig. 2. Recall of the evaluation of collective algorithms for spam filtering with different
sizes for the X set of known instances. Solid lines correspond to Ling Spam and dashed
lines correspond to SpamAssassin.
increases (from 0.68 with 10% to 0.89 with 90%), but remains nearly constant with
SpamAssassin (between 0.93 and 0.94). Collective Forest was the best collective
algorithm in terms of precision, achieving between 0.99 and 1.00 for
Ling Spam and no less than 0.93 for SpamAssassin. Finally, Collective Woods
and Random Woods improve somewhat as the number of known instances grows,
on both datasets.
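The precision and recall values reported above follow the standard definitions over the confusion-matrix counts. A minimal sketch (the counts below are purely illustrative, not taken from the paper's experiments):

```python
def precision(tp, fp):
    # Precision: fraction of messages flagged as spam that truly are spam.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Recall: fraction of actual spam messages that were flagged as spam.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts for illustration only.
tp, fp, fn = 93, 7, 10
print(round(precision(tp, fp), 2))  # 0.93
print(round(recall(tp, fn), 2))     # 0.9
```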
Fig. 2 shows the recall of the different algorithms. Again, Collective KNN
improves, although not enough, as the number of known instances
increases: from 0.32 with 10% to 0.75 with 90% for Ling Spam, and
from 0.13 with 10% to 0.83 with 90% for SpamAssassin. Collective Forest presents a poor 0.78 for
10% with Ling Spam but behaves better in the remaining configurations on both
datasets: a minimum of 0.90 and a maximum of 0.97. Finally, Collective Woods
Fig. 3. Area under the ROC curve (AUC) evaluation of collective algorithms for spam
filtering with different sizes for the X set of known instances. Solid lines correspond to
Ling Spam and dashed lines correspond to SpamAssassin.
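The AUC evaluated in Fig. 3 can be computed directly from classifier scores as the probability that a randomly chosen spam message outscores a randomly chosen legitimate one (the Mann-Whitney formulation). A minimal sketch with made-up scores, not the paper's classifier outputs:

```python
def auc(scores_pos, scores_neg):
    # AUC equals the probability that a random positive (spam) score
    # exceeds a random negative (ham) score; ties count as one half.
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative scores only: every spam message outscores every ham message.
spam_scores = [0.9, 0.8, 0.7]
ham_scores = [0.4, 0.6]
print(auc(spam_scores, ham_scores))  # 1.0
```

This O(n*m) pairwise form is adequate for illustration; production code would sort the scores once and use the rank-sum form instead.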
 