where $\alpha$ is the steepness parameter, the hyperbolic tangent function is a special case of the sigmoid function with an additive offset, and the unit step function is

$$
T(u) =
\begin{cases}
0 & \text{if } u < 0 \\
1 & \text{if } u \geq 0 .
\end{cases}
\qquad (7.40)
$$
The sigmoid function is particularly popular because it approximates an ideal threshold decision (cf. Fig. 7.6) while being differentiable; the latter property is needed during the training of the network.
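To make these relations concrete, a minimal Python sketch is given below; the function names, the use of NumPy, and the exact logistic form of the sigmoid are illustrative assumptions rather than the text's own definitions. It shows a sigmoid with steepness parameter alpha, the hyperbolic tangent expressed as a scaled and offset sigmoid, and the unit step of Eq. (7.40).

```python
import numpy as np

def sigmoid(u, alpha=1.0):
    """Logistic sigmoid with steepness parameter alpha (illustrative form)."""
    return 1.0 / (1.0 + np.exp(-alpha * u))

def tanh_via_sigmoid(u):
    """tanh(u) as a scaled and offset sigmoid: 2 * sigmoid(u, alpha=2) - 1."""
    return 2.0 * sigmoid(u, alpha=2.0) - 1.0

def unit_step(u):
    """Unit step T(u) of Eq. (7.40): 0 for u < 0, 1 for u >= 0."""
    return np.where(u < 0, 0.0, 1.0)

u = np.linspace(-3.0, 3.0, 7)
print(sigmoid(u, alpha=8.0))                          # a large alpha approaches the step T(u)
print(np.allclose(tanh_via_sigmoid(u), np.tanh(u)))   # True
```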
A multiplicity of different network topologies exists, of which the most important will be introduced next.
7.2.3.1 Feed Forward Neural Networks
The most commonly used form of feed forward neural networks (FNN) is the multilayer perceptron (MLP) [18]: it consists of a minimum of three layers, namely one input layer (typically without processing), one or more hidden layers, and an output layer. All connections feed forward from one layer to the next without any backward connections. MLPs classify each input feature vector over time independently of the others.
In general, an encoding of the outputs $\hat{y}_j$ with $j = 1, \ldots, M$ of the last layer, which can be written as the vector $\hat{\mathbf{y}}$, is required. A popular way is to provide one output neuron for regression and one per class in the case of classification. As an advantage, this provides a measure of confidence of the network: the 'softmax' function used as transfer function normalises the sum of all outputs to one in order to allow for an interpretation as the posterior probability $\hat{P}(j \mid \mathbf{x})$ of the final output:

$$
\hat{y}_j = \frac{e^{u_j}}{\sum_{j'=1}^{M} e^{u_{j'}}} = \hat{P}(j \mid \mathbf{x}) .
\qquad (7.41)
$$
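As an illustration of Eq. (7.41), a minimal softmax could be sketched in Python as follows; the subtraction of the maximum is only a numerical-stability convention and not part of the formula, and all names are chosen freely.

```python
import numpy as np

def softmax(u):
    """Normalise output activations u_1, ..., u_M to sum to one (Eq. 7.41),
    so that y_j can be read as an estimate of the posterior P(j | x)."""
    u = np.asarray(u, dtype=float)
    e = np.exp(u - u.max())   # subtracting the maximum only avoids overflow
    return e / e.sum()

y = softmax([2.0, 1.0, 0.1])
print(y, y.sum())             # approx. [0.659 0.242 0.099], sums to 1.0
```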
In the recognition phase, the computation proceeds step-wise from the input layer to the output layer. In each layer, the weighted sum of the inputs from the previous layer is computed for each neuron and transformed by the non-linearity. Using the softmax function at the outputs and the encoding described above, the recognised class is assigned by maximum search. As an alternative, one can choose, e.g., a binary encoding of the classes with the network's outputs.
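A compact sketch of this recognition pass for an MLP with a single hidden layer is given below; the layer sizes, the random weights, and the choice of a sigmoid hidden non-linearity are assumptions made purely for illustration.

```python
import numpy as np

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """One recognition step of a minimal MLP: a weighted sum per layer,
    a sigmoid non-linearity in the hidden layer, and softmax at the output."""
    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))
    h = sigmoid(W_hidden @ x + b_hidden)   # hidden layer activations
    u = W_out @ h + b_out                  # output layer activations
    e = np.exp(u - u.max())
    return e / e.sum()                     # softmax: one posterior estimate per class

rng = np.random.default_rng(0)
x = rng.normal(size=4)                               # example input feature vector
W_h, b_h = rng.normal(size=(5, 4)), np.zeros(5)      # hidden layer: 5 neurons
W_o, b_o = rng.normal(size=(3, 5)), np.zeros(3)      # output layer: 3 classes
y = mlp_forward(x, W_h, b_h, W_o, b_o)
print(y, "-> recognised class:", int(np.argmax(y)))  # class assignment by maximum search
```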
7.2.3.2 Back Propagation
Among the multiplicity of learning algorithms for ANNs, the gradient descent-based back propagation algorithm [19] is one of the most popular and enabled the breakthrough of ANNs. Let $W = \{\mathbf{w}_j\}$ summarise the weight vectors $\mathbf{w}_j$ of a layer with $j = 1, \ldots, J$ and $J$ being the number of neurons in this layer. As target