the application frame is structured into six panels. First, on the top of the left column, we can
find all the relevant terms that the EIRN network uses for indexing existing instances. Under
this panel, a section summarizes statistics for the selected terms,
including: (i) the probability of finding the term in the case base, (ii) the frequency of the term
in spam e-mails and (iii) the probability of finding the term in legitimate messages.
At the top of the right column, we can find all the instances stored in the system memory.
Under this panel, an instance statistics panel shows all the relevant terms
belonging to the selected messages, together with their frequencies.
At the top of the central column, we have placed a plot representing the relevant terms
indexed in our EIRN model. The plot panel has been developed with built-in clipboard copy
support, image-saving capability and drag-and-drop compatibility for use with any image
application or word processor. Terms selected in the left panel, and terms that are part of a
message selected in the right panel, are always highlighted in the graphic representation.
The EIRN viewer module represents each term as a two-dimensional point on the plot. The
coordinates of each point are computed according to the function selected in the combo boxes
placed under the plot. The following measurements are available for each coordinate: (i) the
probability of finding the term t in spam e-mails stored in the system memory, p(t|s,K), (ii)
the logarithmic form of the previous value, -log_2(p(t|s,K)), (iii) the probability of finding the
term t in legitimate messages, p(t|l,K), (iv) the logarithmic form of the previous value,
-log_2(p(t|l,K)), and (v) the probability of finding the term t in the system memory, p(t|K).
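The five measurements can be illustrated with a minimal sketch. It assumes, since the text does not say how the probabilities are estimated, that each one is the fraction of stored instances (of the given class) that contain the term. The case base, the function names and the measure labels below are hypothetical and are not taken from the EIRN viewer code.

```python
import math

# Hypothetical case base K: each instance is (set of terms, label).
# The messages are invented purely for illustration.
CASE_BASE = [
    ({"viagra", "offer", "free"}, "spam"),
    ({"free", "meeting"}, "legit"),
    ({"offer", "free", "win"}, "spam"),
    ({"meeting", "report"}, "legit"),
]


def term_probability(term, case_base, label=None):
    """Fraction of instances containing `term`.

    If `label` is given, only instances with that label are counted
    (assumed estimate of p(t|s,K) or p(t|l,K)); otherwise all instances
    are used (assumed estimate of p(t|K)).
    """
    pool = [terms for terms, lab in case_base if label is None or lab == label]
    if not pool:
        return 0.0
    return sum(term in terms for terms in pool) / len(pool)


def neg_log2(p):
    """Return -log_2(p), or infinity when p is 0."""
    return math.inf if p == 0 else -math.log2(p)


# One entry per option that a coordinate combo box could offer.
MEASURES = {
    "p(t|s,K)": lambda t, K: term_probability(t, K, "spam"),
    "-log2 p(t|s,K)": lambda t, K: neg_log2(term_probability(t, K, "spam")),
    "p(t|l,K)": lambda t, K: term_probability(t, K, "legit"),
    "-log2 p(t|l,K)": lambda t, K: neg_log2(term_probability(t, K, "legit")),
    "p(t|K)": lambda t, K: term_probability(t, K),
}


def plot_point(term, case_base, x_measure, y_measure):
    """Compute the (x, y) coordinates of a term for the chosen measures."""
    return (
        MEASURES[x_measure](term, case_base),
        MEASURES[y_measure](term, case_base),
    )


if __name__ == "__main__":
    for term in ("free", "offer", "meeting"):
        print(term, plot_point(term, CASE_BASE, "p(t|s,K)", "p(t|l,K)"))
```

Under these assumptions, a term that appears mostly in spam lands near the x axis far from the origin, while a term typical of legitimate mail lands near the y axis. The logarithmic forms spread out terms whose probabilities are close to zero.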
The results obtained confirm the idea that instance-based reasoning systems can offer a
number of advantages in the spam filtering domain. Spam is a disjoint concept and IBR
classification works well in this domain. In addition, IBR systems can learn over time simply
by updating their memory with new instances of spam or legitimate e-mail. Moreover, they
provide seamless learning capabilities without the need for a separate learning process and
make it easier to extend the learning process over different levels of learning. The code can be
freely obtained at http://www.spamhunting.org/.
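The idea of learning by simply updating the instance memory can be sketched as follows. This is not the EIRN retrieval mechanism: it is a generic nearest-neighbour classifier over term sets, using Jaccard similarity as an assumed similarity measure, and the class and method names are hypothetical.

```python
from collections import Counter


class InstanceMemory:
    """Toy instance-based classifier: learning is just storing instances."""

    def __init__(self):
        self.instances = []  # list of (frozenset of terms, label)

    def learn(self, terms, label):
        """Add a new labelled message; no separate training step is needed."""
        self.instances.append((frozenset(terms), label))

    @staticmethod
    def _jaccard(a, b):
        """Jaccard similarity between two term sets (assumed measure)."""
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def classify(self, terms, k=3):
        """Majority vote among the k most similar stored instances."""
        query = frozenset(terms)
        ranked = sorted(
            self.instances,
            key=lambda inst: self._jaccard(query, inst[0]),
            reverse=True,
        )
        votes = Counter(label for _, label in ranked[:k])
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    memory = InstanceMemory()
    memory.learn({"free", "offer"}, "spam")
    memory.learn({"meeting", "report"}, "legit")

    message = {"project", "deadline", "free"}
    print(memory.classify(message, k=1))  # nearest stored instance is spam

    # The user corrects the filter; the memory is updated in place.
    memory.learn({"project", "deadline"}, "legit")
    print(memory.classify(message, k=1))  # nearest instance is now legit
```

In this sketch, the classification of the same message changes as soon as one relevant instance is stored, which is the behaviour the paragraph above attributes to IBR systems.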
4. Conclusion
In this chapter we have presented the CBR paradigm as an appropriate methodology for
implementing successful decision support systems, together with its application to
several domains. Adopting the CBR methodology for constructing decision support systems
has several notable benefits, allowing us to obtain a more general knowledge of the system
and to gain a deeper insight into the logical structure of the problem and its solution. The main
properties of these systems include (i) their ability to focus on the problem's essential features,
(ii) the possibility of solving problems in domains that are only partially understood, (iii) the
capacity to provide solutions when no algorithmic method is available and (iv) the
potential to interpret and manage open-ended and ill-defined concepts.
The main advantages of the case-based reasoning paradigm over alternative approaches
for the implementation of decision support systems are that CBR systems:
(i) reduce the knowledge acquisition effort (cases are independent
of each other), (ii) require less maintenance effort (maintenance is partially automated by
adding or deleting cases), (iii) improve problem-solving performance through reuse, (iv)
make use of existing data (e.g. in databases), (v) improve over time and adapt to changes in
the environment and (vi) enjoy high user acceptance (domain experts and novices
understand cases quite easily).