Discovering User Interests by Document Classification - Mining and Analyzing Social Networks

Table 8 Normalized term frequencies

            computer   programming language   algorithm   derivative   class
doc1.txt    0.5        0.3                    0.1         0.1          computer
doc2.txt    0.05       0.05                   0.4         0.5          math
doc3.txt    0.2        0.05                   0.2         0.55         math
doc4.txt    0.2        0.55                   0.05        0.2          computer
doc5.txt    0.15       0.15                   0.4         0.3          math
doc6.txt    0.35       0.1                    0.45        0.1          computer

Given a threshold α = 0.4, if the frequency of term j in document i is greater than
or equal to α, we consider that term j exists in document i; otherwise term j does
not exist in document i. Each document is therefore represented as a Boolean vector.
Each element of such a vector takes one of two values, 0 or 1 (1 if the respective
term occurs in the document, 0 otherwise). Each Boolean vector thus records which
terms occur in a document, and corpus D becomes a set of Boolean vectors.

Table 9 Boolean document vectors

            computer   programming language   algorithm   derivative   class
doc1.txt    1          0                      0           0            computer
doc2.txt    0          0                      1           1            math
doc3.txt    0          0                      0           1            math
doc4.txt    0          1                      0           0            computer
doc5.txt    0          0                      1           0            math
doc6.txt    0          0                      1           0            computer
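The thresholding rule described above (a term counts as present when its normalized frequency is at least α = 0.4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names ALPHA, TERMS, FREQUENCIES and to_boolean_vector are assumptions introduced here, and the sample data are the rows for doc3.txt, doc4.txt and doc6.txt from Table 8.

```python
# Illustrative sketch: all names below are hypothetical, not from the source.
ALPHA = 0.4  # threshold alpha given in the text

# Column order of the term-frequency table.
TERMS = ["computer", "programming language", "algorithm", "derivative"]

# Normalized term frequencies for three documents from Table 8.
FREQUENCIES = {
    "doc3.txt": [0.2, 0.05, 0.2, 0.55],
    "doc4.txt": [0.2, 0.55, 0.05, 0.2],
    "doc6.txt": [0.35, 0.1, 0.45, 0.1],
}


def to_boolean_vector(freqs, alpha=ALPHA):
    """Map each frequency to 1 if it is >= alpha (term present), else 0."""
    return [1 if f >= alpha else 0 for f in freqs]


# The corpus becomes a set of Boolean vectors, one per document.
boolean_vectors = {doc: to_boolean_vector(f) for doc, f in FREQUENCIES.items()}

for doc, vector in boolean_vectors.items():
    present = [term for term, bit in zip(TERMS, vector) if bit == 1]
    print(doc, vector, present)
```

With α = 0.4, each of these three documents keeps exactly one term: "derivative" for doc3.txt, "programming language" for doc4.txt, and "algorithm" for doc6.txt. Lowering the threshold would mark more terms as present in each vector.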

Fig. 10 Trained neural network (input layer, hidden layer, output layer)
