Fig. 2. Recall of the evaluation of collective algorithms for spam filtering with different
sizes for the X set of known instances. Solid lines correspond to Ling Spam and dashed
lines correspond to SpamAssassin.
increases (from 0.68 with 10% to 0.89 with 90%), but remains nearly constant with
SpamAssassin (between 0.93 and 0.94). Collective Forest was the best collective
algorithm in terms of precision, achieving between 0.99 and 1.00 for
Ling Spam and no less than 0.93 for SpamAssassin. Finally, Collective Woods
and Random Woods improve somewhat as the number of known instances grows,
on both datasets.
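The precision and recall values reported above follow the standard definitions over the confusion-matrix counts. A minimal sketch (the counts below are purely illustrative, not taken from the paper's experiments):

```python
def precision(tp, fp):
    # Precision: fraction of messages flagged as spam that truly are spam.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Recall: fraction of actual spam messages that were flagged as spam.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts for illustration only.
tp, fp, fn = 93, 7, 10
print(round(precision(tp, fp), 2))  # 0.93
print(round(recall(tp, fn), 2))     # 0.9
```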
Fig. 2 shows the recall of the different algorithms. Again, Collective KNN
improves, although not enough, as the number of known instances
increases: from 0.32 with 10% to 0.75 with 90% for Ling Spam, and
from 0.13 with 10% to 0.83 with 90% for SpamAssassin. Collective Forest presents a poor 0.78 for
10% with Ling Spam but behaves better in the remaining configurations on both
datasets: a minimum of 0.90 and a maximum of 0.97. Finally, Collective Woods
Fig. 3. Area under the ROC curve (AUC) evaluation of collective algorithms for spam
filtering with different sizes for the X set of known instances. Solid lines correspond to
Ling Spam and dashed lines correspond to SpamAssassin.
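The AUC evaluated in Fig. 3 can be computed directly from classifier scores as the probability that a randomly chosen spam message outscores a randomly chosen legitimate one (the Mann-Whitney formulation). A minimal sketch with made-up scores, not the paper's classifier outputs:

```python
def auc(scores_pos, scores_neg):
    # AUC equals the probability that a random positive (spam) score
    # exceeds a random negative (ham) score; ties count as one half.
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative scores only: every spam message outscores every ham message.
spam_scores = [0.9, 0.8, 0.7]
ham_scores = [0.4, 0.6]
print(auc(spam_scores, ham_scores))  # 1.0
```

This O(n*m) pairwise form is adequate for illustration; production code would sort the scores once and use the rank-sum form instead.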
 