follows the strategy of minimization of the expected classification risk. The
strategy can be explained in terms of an n -dimensional input vector x belonging to
one of m possible classes with the probability density functions
$$p_1(\mathbf{x}),\; p_2(\mathbf{x}),\; \ldots,\; p_m(\mathbf{x}).$$
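The risk-minimizing decision itself is usually expressed with the standard Bayes rule; the prior probabilities $h_k$ and misclassification costs $l_k$ below are the customary ingredients of that rule and are introduced here only for illustration, not taken from the text:
$$\text{decide class } i \quad \text{if} \quad h_i\, l_i\, p_i(\mathbf{x}) > h_j\, l_j\, p_j(\mathbf{x}) \quad \text{for all } j \neq i.$$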
The architecture of a probabilistic network, shown in Figure 3.11, consists of an
input layer followed by three computational layers. It bears a striking similarity to a multilayer perceptron network. As shown, the network discriminates between two pattern categories, represented by positive and negative output signals. To extend this capability to discrimination among more than two categories, additional network outputs and a corresponding number of summation units are required.
The input layer of a probabilistic network is simply a distribution layer that
provides the normalized input signal values to all classifying networks that make
up a multiple-class classifier. The subsequent layer consists of a number of pattern units, fully connected to the input layer through adjustable weights that correspond to the number of categories to be classified. Each pattern unit forms the product of the input vector x with its weight vector w_i. Before being passed to the corresponding summation unit, the product value undergoes the nonlinear operation
$$F(\mathbf{x} \cdot \mathbf{w}_i) = e^{\frac{\mathbf{x} \cdot \mathbf{w}_i - 1}{\sigma^{2}}}.$$
However, since both the input pattern and the weight vectors are normalized to unit length, the last relation can be rewritten as
$$F(\mathbf{x} \cdot \mathbf{w}_i) = e^{-\frac{\sum_{j=1}^{n} (x_j - w_{ij})^{2}}{2\sigma^{2}}}.$$
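The rewriting follows directly from the unit-length normalization of x and w_i, since
$$\|\mathbf{x} - \mathbf{w}_i\|^2 = \|\mathbf{x}\|^2 - 2\,\mathbf{x}\cdot\mathbf{w}_i + \|\mathbf{w}_i\|^2 = 2\,(1 - \mathbf{x}\cdot\mathbf{w}_i), \qquad \text{so} \qquad \frac{\mathbf{x}\cdot\mathbf{w}_i - 1}{\sigma^2} = -\frac{\|\mathbf{x} - \mathbf{w}_i\|^2}{2\sigma^2}.$$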
The summation units finally add the signals coming from the pattern units
corresponding to the category selected for the current training pattern.
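As a concrete illustration of this forward pass, the following minimal Python sketch combines pattern units using the Gaussian nonlinearity above with per-class summation units. The class name PNNSketch, the smoothing parameter sigma, the use of one pattern unit per stored example, and the sample data are illustrative assumptions, not details taken from the text.

import numpy as np

def normalize(v):
    # Scale a vector to unit length, as assumed for inputs and weights.
    return v / np.linalg.norm(v)

class PNNSketch:
    # Minimal probabilistic-network forward pass (illustrative sketch).
    # Each stored pattern acts as one pattern unit; summation units
    # accumulate the pattern-unit outputs belonging to the same category.

    def __init__(self, sigma=0.5):
        self.sigma = sigma      # smoothing parameter (assumed value)
        self.weights = []       # pattern-unit weight vectors w_i
        self.labels = []        # category of each pattern unit

    def add_pattern(self, x, label):
        self.weights.append(normalize(np.asarray(x, dtype=float)))
        self.labels.append(label)

    def classify(self, x):
        x = normalize(np.asarray(x, dtype=float))
        scores = {}
        for w, label in zip(self.weights, self.labels):
            # pattern unit: F(x.w) = exp((x.w - 1) / sigma^2)
            activation = np.exp((np.dot(x, w) - 1.0) / self.sigma ** 2)
            # summation unit: add activations of the same category
            scores[label] = scores.get(label, 0.0) + activation
        return max(scores, key=scores.get)

# Usage with made-up two-dimensional patterns for two categories:
net = PNNSketch(sigma=0.5)
net.add_pattern([1.0, 0.1], label="A")
net.add_pattern([0.9, 0.2], label="A")
net.add_pattern([0.1, 1.0], label="B")
print(net.classify([0.95, 0.15]))   # expected to print "A"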
3.4 Network Training Methods
We now turn our attention to some training aspects of neural networks, particularly to accelerating the training process and to the quality of the training results. Our primary interest is in supervised learning algorithms, the ones most frequently used in real applications, such as the backpropagation training algorithm, also known as the generalized delta rule.
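For reference, the weight update prescribed by the generalized delta rule is usually written as follows; this standard formulation (learning rate $\eta$, unit outputs $o_i$, targets $t_j$, activation function $f$) is not spelled out in the text above and is given here only as a reminder:
$$\Delta w_{ij} = \eta\, \delta_j\, o_i, \qquad \delta_j = \begin{cases} (t_j - o_j)\, f'(\mathrm{net}_j), & j \text{ an output unit},\\[4pt] f'(\mathrm{net}_j) \sum_k \delta_k\, w_{jk}, & j \text{ a hidden unit}.\end{cases}$$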
The backpropagation algorithm was initially developed by Paul Werbos in
1971 but it remained almost unknown until it was “rediscovered” by Parker in
1982. The algorithm, however, became widely popular after being clearly
formulated by Rumelhart et al. (1986), which was a triggering moment for
 