The hidden layer consists of two hidden units: “computer science” and “math”.
These units (variables) are also binary (Boolean). The output layer has only one
unit, named “document class”, which is also binary: 0 means the document belongs
to the computer science class and 1 means it belongs to the math class. The
evaluation (activation) function used in the network is the sigmoid function. The
topology is a feed-forward neural network (shown in figure 4) in which the weights
can be initialized arbitrarily.
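As a minimal sketch of the sigmoid evaluation function mentioned above, the snippet below computes the output of a single unit as the sigmoid of the weighted sum of its inputs. The function names `sigmoid` and `unit_output` are illustrative, and the absence of a bias term is an assumption, since the text does not mention one.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps a real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(inputs, weights):
    """Output of one unit: sigmoid of the weighted sum of its inputs.

    ASSUMPTION: no bias term is used, because the text does not mention one.
    """
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net)
```

Because the sigmoid output is a real number strictly between 0 and 1, a Boolean reading of a unit presumably requires some thresholding rule, which this passage does not specify.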
Note that we denote Boolean values as 0 and 1 (instead of true and false) for
convenience, because the units of a neural network accept only numeric values.
[Figure: feed-forward network. Input layer: C, P, A, D. Hidden layer: S, M. Output layer: L. Connection weights shown: 0.4, 0.6, 0.6, 0.4, 0.5, 0.5, 0.5, 0.5, 0.3, 0.7.]
Fig. 9 The neural network for document classification
Note that C, P, A and D denote “computer”, “programming language”,
“algorithm” and “derivative” respectively. S and M denote “computer science”
and “math” respectively. L denotes “document class”.
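The forward pass through this 4-2-1 topology can be sketched as follows. The ten weight values are the ones printed in the figure, but which value belongs to which connection cannot be read from the text, so the assignment in `W_HIDDEN` and `W_OUTPUT` is an assumption made only for illustration. The names `forward`, `W_HIDDEN` and `W_OUTPUT` are hypothetical, and no bias terms are used.

```python
import math


def sigmoid(x):
    """Logistic sigmoid evaluation function."""
    return 1.0 / (1.0 + math.exp(-x))


# ASSUMPTION: the mapping of the figure's ten weights to specific
# connections is a guess; only the values themselves come from the figure.
W_HIDDEN = {
    "S": {"C": 0.4, "P": 0.6, "A": 0.5, "D": 0.5},  # inputs -> "computer science"
    "M": {"C": 0.6, "P": 0.4, "A": 0.5, "D": 0.5},  # inputs -> "math"
}
W_OUTPUT = {"S": 0.3, "M": 0.7}  # hidden units -> L ("document class")


def forward(x):
    """Propagate binary inputs {C, P, A, D} through the hidden layer to L.

    Returns the hidden activations and the output activation, each in (0, 1).
    """
    hidden = {
        h: sigmoid(sum(w[t] * x[t] for t in w))
        for h, w in W_HIDDEN.items()
    }
    output = sigmoid(sum(W_OUTPUT[h] * hidden[h] for h in W_OUTPUT))
    return hidden, output


# Example: a document containing "computer" and "programming language" only.
hidden, output = forward({"C": 1, "P": 1, "A": 0, "D": 0})
```

With these assumed weights the example input gives both hidden units the same net input (0.4 + 0.6 = 1.0), so the untrained network does not yet separate the two classes; training would adjust the weights.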
Given the corpus D = { doc1.txt, doc2.txt, doc3.txt, doc4.txt, doc5.txt }, the training
data is shown in the following table, in which cell ( i, j ) gives the number of times
that term j (column j ) occurs in document i (row i ).
Table 7 Term frequencies of documents
            programming language   computer   algorithm   derivative   class
doc1.txt    5                      3          1           1            computer
doc2.txt    5                      5          40          5            math
doc3.txt    20                     5          20          55           math
doc4.txt    20                     55         5           20           computer
doc5.txt    15                     15         4           0.3          math
doc6.txt    35                     10         45          10           computer
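As a rough sketch, the training table can be held in a small data structure, with the class labels mapped to the 0/1 encoding of the output unit described earlier (0 for computer science, 1 for math). The column order (programming language, computer, algorithm, derivative) and the reading of the label “computer” as the computer science class are assumptions. The names `TERMS`, `TRAINING_ROWS`, `CLASS_TO_TARGET` and `targets` are illustrative, and the values are copied as printed in the table.

```python
# ASSUMPTION: column order of the term-frequency table.
TERMS = ("programming language", "computer", "algorithm", "derivative")

# Output-unit encoding from the text: 0 = computer science, 1 = math.
# ASSUMPTION: the table label "computer" stands for the computer science class.
CLASS_TO_TARGET = {"computer": 0, "math": 1}

# (document, term frequencies in TERMS order, class label), as printed.
TRAINING_ROWS = [
    ("doc1.txt", (5, 3, 1, 1), "computer"),
    ("doc2.txt", (5, 5, 40, 5), "math"),
    ("doc3.txt", (20, 5, 20, 55), "math"),
    ("doc4.txt", (20, 55, 5, 20), "computer"),
    ("doc5.txt", (15, 15, 4, 0.3), "math"),  # 0.3 kept exactly as printed
    ("doc6.txt", (35, 10, 45, 10), "computer"),
]


def targets(rows):
    """Map each row's class label to the 0/1 value expected at the output unit."""
    return [CLASS_TO_TARGET[label] for _, _, label in rows]


def term_frequency(rows, doc, term):
    """Look up cell (i, j): occurrences of `term` in document `doc`."""
    for name, counts, _ in rows:
        if name == doc:
            return counts[TERMS.index(term)]
    raise KeyError(doc)
```

The raw counts are kept as they are here; how they are converted into the binary input values the network expects is not described in this passage.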