Using CBR as Design Methodology for Developing Adaptable Decision Support Systems - Efficient Decision Support Systems: Practice and Challenges from Current to Future

3.4 Anti-spam filtering

E-mail is a computer-based service that adapts traditional postal delivery to computer networks and the Internet. Nowadays, e-mail addresses appear on every business card alongside other contact details such as the postal address or the phone number. However, for more than a decade the use of e-mail has been bedeviled by spamming, to the point that spam is beginning to undermine the integrity of e-mail and even to discourage its use.

In this context, spam is a term used to designate all forms of unsolicited commercial e-mail and can be formally defined as an electronic message satisfying the following two conditions: (i) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients and (ii) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent (SpamHaus, 1998).

Because e-mail is cheap and fast to deliver, it has become the main distribution channel for spam content. Every day, e-mail users receive many messages offering illegal drugs, replica Swiss watches, fake jobs, forged university diplomas, etc. This situation has led to a progressive increase in the global share of spam in e-mail traffic. In September 2010, spam accounted for about 92 percent of all Internet e-mail traffic (MessageLabs, 2010).

To fight spam successfully (ideally, to eliminate it), both theoretical and applied research on spam filtering is fundamental. Much valuable research has already been carried out (Guzella & Caminhas, 2009), and relevant conferences have emerged in the field (CEAS, 2010). Moreover, the software industry has released several commercial products and distributed them to a very large number of end users with the goal of minimizing the drawbacks of spam.

With the goal of providing an effective solution, we present the SpamHunting system (Fdez-Riverola et al., 2007), an instance-based reasoning e-mail filtering model that outperforms classical machine learning techniques and other successful lazy learning approaches in the domain of anti-spam filtering. The architecture of the decision support filter is based on a tuneable enhanced instance retrieval network able to accurately generalize e-mail representations. The reuse of similar messages is carried out by a simple unanimous voting mechanism that determines whether the target case is spam or not. Before the system gives its final response, a revision stage is performed only when the assigned class is spam; in that stage the system applies general knowledge in the form of meta-rules.
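The reuse and revision stages described above can be sketched roughly as follows. This is a minimal illustration and not the SpamHunting implementation: it assumes that "unanimous" means a message is labelled spam only when every retrieved neighbour is spam, and that a meta-rule can overturn a spam label. The names `unanimous_vote`, `classify`, and `sender_is_whitelisted`, as well as the dictionary keys, are hypothetical.

```python
from typing import Callable, Iterable, Sequence

SPAM, LEGITIMATE = "spam", "legitimate"

# Hypothetical meta-rule signature: returns True when the rule objects
# to a spam label for the given message descriptor.
MetaRule = Callable[[dict], bool]


def unanimous_vote(neighbor_labels: Iterable[str]) -> str:
    """Reuse stage (assumed semantics): spam only if all neighbours agree."""
    labels = list(neighbor_labels)
    if labels and all(label == SPAM for label in labels):
        return SPAM
    # No neighbours, or at least one dissenting vote: stay on the safe side.
    return LEGITIMATE


def classify(message: dict,
             neighbor_labels: Iterable[str],
             meta_rules: Sequence[MetaRule]) -> str:
    """Vote first, then revise, but only when the proposed class is spam."""
    label = unanimous_vote(neighbor_labels)
    if label == SPAM and any(rule(message) for rule in meta_rules):
        return LEGITIMATE
    return label


def sender_is_whitelisted(message: dict) -> bool:
    """Illustrative meta-rule: trust senders on a user-supplied whitelist."""
    return message.get("from") in message.get("whitelist", ())
```

Requiring unanimity biases such a filter toward avoiding false positives, since a single legitimate neighbour is enough to keep the message out of the spam class.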

In order to correctly represent incoming e-mails, a message descriptor (instance) is generated and stored in the e-mail base of the SpamHunting system. This message descriptor contains the sequence of features that best summarize the information contained in the e-mail. For this purpose, we use data from two main sources: (i) information obtained from the header of the e-mail and (ii) the terms that are most representative of the subject, body and attachments of the message. Table 4 summarizes the structure of each instance stored in the SpamHunting e-mail base.
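As a rough illustration of such a descriptor, the sketch below combines a few header fields with the most frequent terms of the subject and body. It does not reproduce Table 4: the field names, the `MessageDescriptor` class, and the `build_descriptor` function are hypothetical, raw term frequency stands in for whatever term selection the authors use, and attachments are ignored for brevity.

```python
import re
from collections import Counter
from dataclasses import dataclass
from email import message_from_string
from typing import Dict


@dataclass
class MessageDescriptor:
    # Header-derived features (hypothetical selection).
    sender: str
    subject: str
    content_type: str
    # Most representative terms mapped to their frequency.
    terms: Dict[str, int]


def build_descriptor(raw_email: str, max_terms: int = 5) -> MessageDescriptor:
    """Build a descriptor from a raw RFC 5322 message string."""
    msg = message_from_string(raw_email)
    subject = msg.get("Subject", "")
    # Only single-part bodies are handled in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    tokens = re.findall(r"[a-z]{3,}", f"{subject} {body}".lower())
    top_terms = dict(Counter(tokens).most_common(max_terms))
    return MessageDescriptor(
        sender=msg.get("From", ""),
        subject=subject,
        content_type=msg.get_content_type(),
        terms=top_terms,
    )
```

For example, `build_descriptor("From: a@example.org\nSubject: Cheap watches\n\nBuy cheap watches now\n")` yields a descriptor whose `terms` dictionary counts `cheap` and `watches` twice each.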

Figure 10 illustrates the life cycle of the instance-based reasoning (IBR) SpamHunting system as well as its integration within a typical user environment. In the upper part of Figure 10, the mail user agent (MUA) and the mail transfer agent (MTA) are in charge of dispatching the requests generated by the user. Between these two applications, SpamHunting captures all incoming messages (using the POP3 protocol) in order to identify, tag and filter spam.
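The capture-and-tag step might look roughly like the sketch below, which uses Python's standard `poplib` and `email` modules. It is an assumption-laden illustration rather than the system's actual integration: the connection uses POP3 over SSL, the `X-Spam-Flag` header is only one common tagging convention, and `fetch_messages`, `tag_message`, and the `is_spam` callback are hypothetical names.

```python
import poplib
from email import message_from_bytes
from email.message import Message
from typing import Callable, Iterator


def fetch_messages(host: str, user: str, password: str) -> Iterator[Message]:
    """Retrieve every message in a POP3 mailbox (SSL is assumed here)."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _mailbox_size = conn.stat()
        # POP3 message numbers start at 1.
        for index in range(1, count + 1):
            _response, lines, _octets = conn.retr(index)
            yield message_from_bytes(b"\r\n".join(lines))
    finally:
        conn.quit()


def tag_message(msg: Message, is_spam: Callable[[Message], bool]) -> Message:
    """Mark a message with the classifier's verdict in a header."""
    flag = "YES" if is_spam(msg) else "NO"
    # Remove any earlier verdict so the header appears exactly once.
    del msg["X-Spam-Flag"]
    msg["X-Spam-Flag"] = flag
    return msg
```

A mail client could then filter on the header, for instance by moving every message whose `X-Spam-Flag` is `YES` into a junk folder.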
