Efficient Intrusion Detection with KNN Classification and DS Theory - Proceedings of All India Seminar on Biomedical Engineering 2012 (AISOBE-2012)


To further understand the Bayesian approach, especially with regard to the representation of ignorance, consider the following example, similar to that in [5]. Let a be the proposition "I live in Kings Road, Cardiff".

How could one construct P(a), a Bayesian belief in a? First, we must choose a frame of discernment, denoted by H, and a subset A of H representing the proposition a; I would then need to use the Principle of Insufficient Reason to arrive at a Bayesian belief. The problem is that there are a number of possible frames of discernment H that we could choose, depending effectively on how many Cardiff roads can be enumerated. If only two such roads are identifiable, then H = {x_1, x_2} and A = {x_1}. The Principle of Insufficient Reason then gives P(A) = 0.5, by evenly allocating subjective probabilities over the frame of discernment. If it is estimated that there are about 1,000 roads in Cardiff, then H = {x_1, x_2, ..., x_1000}, with again A = {x_i} and the other x_i's representing the other roads. In this case the Principle of Insufficient Reason gives P(A) = 0.001. Either of these frames may be reasonable, but the probability assigned to A is crucially dependent upon the frame chosen. Hence, one's Bayesian belief is a function not only of the information given and one's background knowledge, but also of the sometimes arbitrary choice of frame of discernment. To put the point another way, we need to distinguish between uncertainty and ignorance. Similar arguments hold where we are discussing not probabilities per se but weights which measure subjective assessments of relative importance. This issue arises in decision support models such as the Analytic Hierarchy Process (AHP), which requires certain weights on a given level of the decision tree to sum to unity; see [22].
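The frame dependence described above can be checked with a few lines of Python. This is only a minimal sketch of the Principle of Insufficient Reason as a uniform allocation over the frame; the function name `insufficient_reason` and the element labels `x1`, `x2`, ... are illustrative choices, not notation from the source.

```python
from fractions import Fraction


def insufficient_reason(frame, subset):
    """Probability of `subset` when belief is spread evenly over `frame`.

    Hypothetical helper: each element of the frame of discernment gets
    the same subjective probability, 1 / |frame|.
    """
    frame = set(frame)
    subset = set(subset)
    if not frame:
        raise ValueError("the frame of discernment must not be empty")
    if not subset <= frame:
        raise ValueError("the subset must lie inside the frame")
    return Fraction(len(subset), len(frame))


# Two frames for the same proposition "I live in Kings Road, Cardiff".
two_roads = [f"x{i}" for i in range(1, 3)]          # H = {x1, x2}
thousand_roads = [f"x{i}" for i in range(1, 1001)]  # H = {x1, ..., x1000}

p_small = insufficient_reason(two_roads, {"x1"})
p_large = insufficient_reason(thousand_roads, {"x1"})

print(float(p_small), float(p_large))  # 0.5 0.001
```

The proposition and the available evidence are identical in both calls; only the enumeration of the frame changes, and the resulting belief moves from 0.5 to 0.001.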

KDD Data Set 99

In 1998, DARPA, in concert with Lincoln Laboratory at MIT, launched the DARPA 1998 data set for evaluating IDS [23]. The DARPA 1998 data set contains 7 weeks of training data and 2 weeks of testing data. In total, there are 38 attacks in the training data as well as in the testing data. The refined version of the DARPA data set, which contains only network data (i.e. Tcpdump data), is termed the KDD data set. The Third International Knowledge Discovery and Data Mining Tools Competition was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining, and the KDD data set is the data set employed for this competition. The KDD training data set consists of approximately 4,900,000 single connection vectors, each of which consists of 41 features and is marked as either normal or an attack, with exactly one particular attack type [23]. These features take both continuous and symbolic forms, with widely varying ranges, and fall into four categories:

• The first category consists of the intrinsic features of a connection, which comprise the fundamental features of each individual TCP connection.
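The record layout described above (41 features per connection vector plus a single label that is either normal or one attack type) can be sketched as a small parser. This is a hedged illustration, not the authors' preprocessing: it assumes a comma-separated line whose last field is the label with an optional trailing period, and the names `Connection`, `parse_record` and the sample record are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Union

NUM_FEATURES = 41  # number of features per connection vector, as stated in the text


@dataclass
class Connection:
    features: List[Union[float, str]]  # continuous values as float, symbolic as str
    label: str                         # "normal" or the name of one attack type

    @property
    def is_attack(self) -> bool:
        return self.label != "normal"


def parse_value(token: str) -> Union[float, str]:
    """Return a float for continuous fields, the raw string for symbolic ones."""
    try:
        return float(token)
    except ValueError:
        return token


def parse_record(line: str) -> Connection:
    """Parse one record (assumed format: 41 comma-separated features, then label)."""
    fields = line.strip().rstrip(".").split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    *raw_features, label = fields
    return Connection([parse_value(t) for t in raw_features], label)


# Synthetic record for illustration only; it is not taken from the data set.
sample = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37) + ",normal."
conn = parse_record(sample)
print(len(conn.features), conn.label, conn.is_attack)
```

Keeping symbolic fields as strings and continuous fields as floats mirrors the mixed feature types mentioned in the text; a classifier such as KNN would still need the symbolic fields encoded and the widely varying ranges rescaled before distances are meaningful.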

