It should be noted that, on equating the denominators of both expressions to zero, we observe that the function becomes unbounded at points of the type (0, (2n + 1)π) for any natural number n. Figure 3.2a is a plot of the real part and Fig. 3.2b is a plot of the imaginary part of the Haykin activation function. Both figures are characterized by prominent peaks (singular points). Invoking Liouville's theorem, a function that is analytic and bounded over the entire complex plane must be constant; the Haykin activation is therefore necessarily unbounded, yet it can still qualify as an activation function provided its singular points are avoided. To avoid the singular points, the inputs to the neuron should be scaled to a region that is devoid of them [21]. In the update rule for the weights with the Haykin activation function, the derivative of Eq. 3.1 with respect to z needs to be computed, which is f_C(z)(1 − f_C(z)). The surface plots of the expressions for the real and imaginary parts of the derivative of Haykin's activation function with respect to the real part x and the imaginary part y¹ are displayed in Fig. 3.2c-f, respectively. All surfaces, as can be seen, are characterized by peaks. The backpropagation learning algorithm developed with the Haykin activation function therefore has singularities at countably many points, and the derivative of the Haykin activation is likewise singular at these points.
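As a quick numerical check of the behaviour described above, the short NumPy sketch below evaluates the complex sigmoid f_C(z) = 1/(1 + exp(−z)) and the derivative f_C(z)(1 − f_C(z)) at points approaching the singular point (0, π). The function names and the probe offsets are illustrative choices, not taken from the book.

```python
import numpy as np

def haykin_sigmoid(z):
    """Complex extension of the logistic sigmoid: f_C(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def haykin_sigmoid_deriv(z):
    """Derivative used in the weight-update rule: f_C(z) * (1 - f_C(z))."""
    f = haykin_sigmoid(z)
    return f * (1.0 - f)

# Approach the singular point z0 = j*pi, i.e. (0, (2n+1)*pi) with n = 0.
z0 = 1j * np.pi
for eps in (1e-1, 1e-2, 1e-3):
    z = z0 + eps  # step slightly off the pole along the real axis
    print(f"eps={eps:g}  |f_C(z)|={abs(haykin_sigmoid(z)):.3e}  "
          f"|f_C(z)(1-f_C(z))|={abs(haykin_sigmoid_deriv(z)):.3e}")
```

Both magnitudes grow without bound as the probe point approaches the pole, which is exactly the kind of behaviour that destabilizes the weight update.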
3.2.3.2 Problem with Haykin's Activation Function
When the domain of a conventional activation such as the sigmoid in Eq. 3.1, tan(z) or exp(z²) is extended from real to complex, it is seen that if z approaches any value in the set {0 ± j(2n + 1)π}, where n is an integer, then |f_C(z)| → ∞; thus f_C(z) is unbounded. It was also suggested in Leung and Haykin [21] that, to avoid the problem of singularities in the sigmoid function f_C(z), the input data should be scaled to some region in the complex plane.
The position of the singularities disturbs the training scheme: whenever some intermediate weights fall in the vicinity of the singular points, the whole training process downstream receives a jolt. This is revealed by the error plot of the function, which is characterized by peaks. The typical point scatter shown in Fig. 3.3a is a distribution of the hidden-layer weights while the training process is running. The figure shows four singular points of the Haykin activation that are completely engulfed by the cloud of points. As can be observed, the weights cluster around some of the singular points, which eventually results in the peak-type error-epoch characteristic. The typical error-function graph with the Haykin activation is shown in Fig. 3.3b. The training process produces many peaks as a result of the singular configurations encountered at the activation function's singular points. The complex backpropagation developed over this activation function fails to solve many problems.
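As a rough illustration of the input-scaling remedy suggested in [21], the sketch below shrinks a batch of complex inputs into a disc of radius π − margin around the origin, so that every point stays at least margin away from the nearest singularities at ±jπ. The function name scale_inputs, the disc-based criterion and the default margin are assumptions made for illustration, not the authors' prescription.

```python
import numpy as np

def scale_inputs(z, margin=0.5):
    """Shrink complex inputs into a disc of radius (pi - margin) about the origin,
    keeping them at least `margin` away from the nearest singular points of
    f_C(z) = 1/(1 + exp(-z)), which lie at z = +/- j(2n+1)*pi."""
    z = np.asarray(z, dtype=complex)
    radius = np.pi - margin
    max_mod = np.max(np.abs(z))
    if max_mod == 0.0:
        return z                          # nothing to scale
    factor = min(1.0, radius / max_mod)   # only shrink, never enlarge
    return z * factor

# Raw inputs whose imaginary parts wander close to the first pole at j*pi
raw = np.array([2.0 + 3.0j, -1.0 + 3.5j, 0.5 - 2.9j])
scaled = scale_inputs(raw)
print(np.round(np.abs(scaled), 3))  # every modulus is now below pi - margin
```

Note that scaling the inputs alone does not prevent intermediate weighted sums from drifting near a pole during training, which is precisely the failure mode illustrated in Fig. 3.3.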
In the case of complex-valued networks, T. Adali (2003) broadly categorized the fully complex-valued activation functions by their properties [11] into three types. It was also shown that universal approximation can be achieved with each of them. The first type of complex-valued functions concerns the functions without any singular points. These functions can be used as activation functions, and networks with this type of activation function are shown to be good approximators. Although some of
¹ Finding the expressions for the real and imaginary parts of the derivative of Haykin's activation function with respect to the real part x and the imaginary part y is left to interested readers.