We can extract classification rules from this decision tree:
Table 6 Classification rules deriving from decision tree induction
Rule 1: If frequency of term “derivative” is low then document belongs to class computer science.
Rule 2: If frequency of term “derivative” is medium and frequency of term “computer” is medium or high then document belongs to class computer science.
Rule 3: If frequency of term “derivative” is medium and frequency of term “computer” is low then document belongs to class math.
Rule 4: If frequency of term “derivative” is high then document belongs to class math.
Suppose the terms computer, programming language, algorithm, and derivative
occur in document D 5, 1, 1, and 3 times, respectively. We need to determine
which class document D belongs to. First, D is normalized into a term frequency
vector by dividing each count by the total count of 10:
D = ( 0.5, 0.1, 0.1, 0.3 )
Converting the real numbers into nominal values, we have:
D = ( high, low, low, medium )
According to rule 2 in the table above, D is a computer science document because,
in document vector D, the frequency of term “derivative” is medium and the
frequency of term “computer” is high.
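The worked example above can be traced in a few lines of Python. This is only a sketch: the cut-off values `LOW_MAX` and `MEDIUM_MAX` are hypothetical, since this section does not state how real-valued frequencies are mapped to low, medium, and high. They were chosen so that 0.1, 0.3, and 0.5 fall into the same bins as in the text. The function names are likewise illustrative.

```python
# Hypothetical cut-offs (assumption): the text does not give the
# boundaries between low, medium, and high.
LOW_MAX = 0.2      # frequencies below this are "low"
MEDIUM_MAX = 0.4   # frequencies below this (and >= LOW_MAX) are "medium"

TERMS = ("computer", "programming language", "algorithm", "derivative")


def normalize(counts):
    """Divide each term count by the total number of term occurrences."""
    total = sum(counts)
    return [c / total for c in counts]


def to_nominal(freq):
    """Map a real-valued frequency to a nominal value using the assumed cut-offs."""
    if freq < LOW_MAX:
        return "low"
    if freq < MEDIUM_MAX:
        return "medium"
    return "high"


def classify(nominal):
    """Apply rules 1-4 of Table 6; return (rule number, class)."""
    derivative = nominal["derivative"]
    computer = nominal["computer"]
    if derivative == "low":
        return 1, "computer science"
    if derivative == "medium":
        if computer in ("medium", "high"):
            return 2, "computer science"
        return 3, "math"
    return 4, "math"  # derivative is high


counts = [5, 1, 1, 3]                       # term counts for document D
freqs = normalize(counts)                   # [0.5, 0.1, 0.1, 0.3]
nominal = dict(zip(TERMS, (to_nominal(f) for f in freqs)))
rule, label = classify(nominal)
print(freqs)
print(nominal)
print(rule, label)                          # 2 computer science
```

With different cut-offs the nominal vector, and therefore the rule that fires, could change, so the thresholds should be taken from whatever discretization was used when the tree was induced.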
4 Document Classification Based on Neural Network
4.1 Artificial Neural Network
An artificial neural network (ANN) is a mathematical model based on biological
neural networks. It consists of a set of processing units that communicate by
sending signals to each other over a large number of weighted connections. These
processing units are also called neurons, cells, or variables. Each unit receives
input from its neighbors or from external sources and uses this input to compute
an output signal, which is propagated to other units. Each unit also adjusts the
weights of its connections. There are three types of units:
Input units receive data from outside the network. These units form the
input layer.
Hidden units have input and output signals that remain within the neural
network. These units form the hidden layer. There can be one or more
hidden layers.
Output units send data out of the network. These units form the output
layer.
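To make the three layers concrete, the sketch below propagates a signal from the input units through one hidden layer to the output units. It rests on several assumptions not stated in the text: each unit computes a weighted sum of its inputs plus a bias and applies a sigmoid activation, and the weight and bias values are arbitrary illustrative numbers. Weight adjustment (learning) is deliberately left out.

```python
import math


def sigmoid(x):
    """Assumed activation function; the text does not name one."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases):
    """Compute the output signal of every unit in one layer.

    weights[j] holds the connection weights from all inputs to unit j.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Propagate signals: input layer -> hidden layer -> output layer."""
    hidden = layer_output(inputs, hidden_w, hidden_b)
    return layer_output(hidden, output_w, output_b)


# Illustrative values only (hypothetical): 4 input units, 2 hidden units, 1 output unit.
inputs = [0.5, 0.1, 0.1, 0.3]
hidden_w = [[0.2, -0.1, 0.4, 0.3],
            [-0.5, 0.2, 0.1, 0.6]]
hidden_b = [0.0, 0.0]
output_w = [[0.7, -0.3]]
output_b = [0.1]

outputs = forward(inputs, hidden_w, hidden_b, output_w, output_b)
print(outputs)
```

The input units here simply hold the data and pass it on; only the hidden and output units compute. Adding more hidden layers would mean chaining further `layer_output` calls inside `forward`.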