3 Overview of LLGC
Learning with Local and Global Consistency (LLGC) [11] is a semi-supervised algorithm that provides a classification that is smooth with respect to the intrinsic structure revealed by the known labelled and unlabelled points. The method is a simple iterative algorithm that constructs a smooth function consistent with the following assumptions: (i) nearby points are likely to have the same label and (ii) points on the same structure are likely to have the same label [11].
Formally, the algorithm is stated as follows. Let $X = \{x_1, x_2, \ldots, x_\ell, x_{\ell+1}, \ldots, x_n\} \subset \mathbb{R}^m$ be the set composed of the data instances, $L = \{1, \ldots, c\}$ the set of labels (in our case, this set comprises two classes: malware and legitimate software), and $x_u$ ($\ell + 1 \leq u \leq n$) the unlabelled instances. The goal of LLGC (and of every semi-supervised algorithm) is to predict the class of the unlabelled instances.
$\mathcal{F}$ is the set of $n \times c$ matrices with non-negative entries, composed of matrices $F = [F_1, \ldots, F_n]^T$ that correspond to the classification on the dataset $X$ of each instance $x_i$, with the label assigned by $y_i = \arg\max_{j \leq c} F_{i,j}$. $F$ can be defined as a vectorial function $F : X \to \mathbb{R}^c$ that assigns a vector $F_i$ to each instance $x_i$. $Y$ is an $n \times c$ matrix such that $Y \in \mathcal{F}$, with $Y_{i,j} = 1$ when $x_i$ is labelled as $y_i = j$ and $Y_{i,j} = 0$ otherwise. Considering this, the LLGC algorithm performs as follows:
Form the affinity matrix $W$ defined by $W_{i,j} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$ if $i \neq j$, and set $W_{i,i} = 0$;
Generate the matrix $S = D^{-1/2} \cdot W \cdot D^{-1/2}$, where $D$ is the diagonal matrix with its $(i, i)$ element equal to the sum of the $i$-th row of $W$;
while ¬Convergence do
  $F(t+1) = \alpha \cdot S \cdot F(t) + (1 - \alpha) \cdot Y$, where $\alpha$ is in the range $(0, 1)$;
$F^{*}$ is the limit of the sequence $\{F(t)\}$;
Label each point $x_i$ as $\arg\max_{j \leq c} F^{*}_{i,j}$;

Fig. 1. LLGC algorithm
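
To make the steps of Fig. 1 concrete, the following is a minimal sketch in Python with NumPy. The function name llgc, the default parameter values, and the fixed iteration budget used in place of a formal convergence test are assumptions made for illustration, not details taken from [11].

import numpy as np

def llgc(X, y, sigma=1.0, alpha=0.99, n_iter=1000):
    """Sketch of the LLGC iteration of Fig. 1.

    X      : (n, m) array of instances.
    y      : (n,) integer array with class labels 0..c-1, -1 for unlabelled.
    sigma  : width of the Gaussian affinity.
    alpha  : relative weight of neighbour information, in (0, 1).
    n_iter : fixed iteration budget standing in for the convergence test.
    """
    n = X.shape[0]
    c = int(y.max()) + 1

    # Affinity matrix W: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), W_ii = 0.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalisation S = D^{-1/2} W D^{-1/2};
    # D_ii is the sum of the i-th row of W.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Initial label matrix Y: Y_ij = 1 iff x_i is labelled as class j.
    Y = np.zeros((n, c))
    labelled = y >= 0
    Y[labelled, y[labelled]] = 1.0

    # Iterate F(t+1) = alpha * S * F(t) + (1 - alpha) * Y.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y

    # Label each point as the class from which it received most information.
    return F.argmax(axis=1)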
The algorithm first defines a pairwise relationship $W$ on the dataset $X$, setting the diagonal elements to zero. Suppose that a graph $G = (V, E)$ is defined within $X$, where the vertex set $V$ is equal to $X$ and the edge set $E$ is weighted by the values in $W$. Next, the algorithm symmetrically normalises the matrix $W$ of $G$. This step is mandatory to assure the convergence of the iteration. During each iteration, each instance receives information from its nearby instances while keeping its initial information. The parameter $\alpha$ denotes the relative amount of information from the nearest instances versus the initial class information of each instance. The information is spread symmetrically because $S$ is a symmetric matrix. Finally, the algorithm sets the class of each unlabelled specimen to the class from which it has received the most information during the iteration process.
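
As a usage illustration of the llgc sketch above, consider invented two-cluster data; the cluster centres, seed indices, and parameter values are assumptions made for the example, not values from the paper.

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters; only one labelled seed per cluster.
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])
y = np.full(40, -1)   # -1 marks the unlabelled instances
y[0], y[20] = 0, 1    # a single labelled seed in each cluster

labels = llgc(X, y, sigma=0.5, alpha=0.99)
# The seed labels spread along each cluster: the first twenty points
# come out as class 0 and the last twenty as class 1.

Zhou et al. [11] also show that the iteration converges to the closed form $F^{*} = (1 - \alpha)(I - \alpha S)^{-1} Y$, so for small $n$ the loop can be replaced by a single linear solve without changing the assigned labels.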