Information Technology Reference
Table 8 Normalized term frequencies
Document   computer  programming language  algorithm  derivative  class
doc1.txt   0.5       0.3                   0.1        0.1         computer
doc2.txt   0.05      0.05                  0.4        0.5         math
doc3.txt   0.2       0.05                  0.2        0.55        math
doc4.txt   0.2       0.55                  0.05       0.2         computer
doc5.txt   0.15      0.15                  0.4        0.3         math
doc6.txt   0.35      0.1                   0.45       0.1         computer
Given a threshold α = 0.4, if the frequency of term j in document i is equal to or
greater than α, we consider that term j exists in document i; otherwise, term j does
not exist in document i. Each document is therefore represented as a Boolean vector
whose elements take one of two values: 1 if the respective term occurs in the
document and 0 otherwise. Each Boolean vector thus records which terms occur in a
document, and corpus D becomes a set of Boolean vectors.
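The thresholding rule above can be sketched in a few lines of Python: b(i, j) = 1 if tf(i, j) ≥ α, else 0. This is an illustrative sketch, not the author's implementation. The frequencies are copied from Table 8. The sketch assumes the four terms are computer, programming language, algorithm and derivative, in Table 8's column order. The helper names `to_boolean` and `boolean_corpus` are hypothetical. α defaults to 0.4 as in the text.

```python
# Sketch of converting normalized term frequencies into Boolean document vectors.
# Assumption: term order follows the columns of Table 8.
TERMS = ["computer", "programming language", "algorithm", "derivative"]

# Normalized term frequencies copied from Table 8.
TF = {
    "doc1.txt": [0.5, 0.3, 0.1, 0.1],
    "doc2.txt": [0.05, 0.05, 0.4, 0.5],
    "doc3.txt": [0.2, 0.05, 0.2, 0.55],
    "doc4.txt": [0.2, 0.55, 0.05, 0.2],
    "doc5.txt": [0.15, 0.15, 0.4, 0.3],
    "doc6.txt": [0.35, 0.1, 0.45, 0.1],
}


def to_boolean(freqs, alpha=0.4):
    """Hypothetical helper: 1 if a frequency is >= alpha (term exists), else 0."""
    return [1 if f >= alpha else 0 for f in freqs]


def boolean_corpus(tf, alpha=0.4):
    """Hypothetical helper: apply the threshold to every document in the corpus."""
    return {doc: to_boolean(freqs, alpha) for doc, freqs in tf.items()}


if __name__ == "__main__":
    for doc, vec in boolean_corpus(TF).items():
        print(doc, vec)
```

Running the script prints one Boolean vector per document, for example `doc4.txt [0, 1, 0, 0]`, because only "programming language" (0.55) reaches α in doc4. The comparison uses `>=` so that a frequency exactly equal to α, such as 0.4 for "algorithm" in doc2, counts as an occurrence. Frequencies computed at run time rather than typed as literals may need a small tolerance near α.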
Table 9 Boolean document vectors
Document   computer  programming language  algorithm  derivative  class
doc1.txt   1         0                     0          0           computer
doc2.txt   0         0                     1          1           math
doc3.txt   0         0                     0          1           math
doc4.txt   0         1                     0          0           computer
doc5.txt   0         0                     1          0           math
doc6.txt   0         0                     1          0           computer
Fig. 10 Trained neural network (input layer, hidden layer, output layer)