indeed sufficient to determine the patient diagnosis quite well.
Algorithm
Supervised learning algorithms try to approximate a given function $f\colon \mathbb{R}^n \to A \subset \mathbb{R}^m$ by using a number of given sample-observation pairs $(x_\lambda, f(x_\lambda))$. If $A$ is finite, we speak of a classification problem.
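As a toy illustration of such sample-observation pairs for a classification problem (the data values below are purely illustrative and not taken from the diagnosis study):

```python
import numpy as np

# Sample-observation pairs (x_lambda, f(x_lambda)):
# feature vectors x_lambda in R^2, observations f(x_lambda) in the
# finite set A = {0, 1}, so this is a classification problem.
X = np.array([[0.2, 1.1],
              [1.5, 0.3],
              [0.9, 0.8]])   # samples x_lambda
F = np.array([0, 1, 1])      # observations f(x_lambda)
```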
Typical examples of supervised learning algorithms are polynomial and
spline interpolation or artificial neural network (ANN) learning. In many
practical situations, ANNs have the advantage of higher generalization
capability than other approximation algorithms, especially when only
few samples are available.
McCulloch and Pitts [167] were the first to describe the abstract concept of an artificial neuron based on the biological picture of a real neuron. A single neuron takes a number of input signals, sums them, and plugs the result into a specific activation function (for example a (translated) Heaviside function or an arc tangent). The neural network itself consists of a directed graph with an edge labeling of real numbers called weights. At each graph node we have a neuron that takes the weighted input and transmits its output to all following neurons. Using ANNs has the advantage that neural networks are adaptive systems: for a given energy function, we know how to minimize this function algorithmically (for example, using the standard accelerated gradient descent method).
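As a minimal sketch of this neuron model (the specific inputs, weights, and threshold below are illustrative assumptions, not values from the text):

```python
import numpy as np

def heaviside(t, shift=0.0):
    # (Translated) Heaviside step function: 0 below the shift, 1 above.
    return np.where(t >= shift, 1.0, 0.0)

def neuron(x, w, activation=np.arctan):
    # A single neuron: sum the weighted input signals and plug the
    # result into an activation function (arc tangent by default).
    return activation(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # input signals
w = np.array([0.3, 0.8, -0.1])   # weights on the incoming edges
print(neuron(x, w))                               # arc tangent activation
print(neuron(x, w, lambda t: heaviside(t, 0.2)))  # translated Heaviside
```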
When trying to learn the function $f$, we use as the energy function the summed square error $\sum_\lambda |f(x_\lambda) - y(x_\lambda)|^2$, where $y$ denotes the neural network output function. Moreover, more general functions can then be approximately learned using the fact that sufficiently complex neural networks are so-called universal approximators [119]. For more details about ANNs, see some of the many available textbooks (e.g., [9], [110], [113]).
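To make this minimization concrete, here is a hedged sketch of plain (unaccelerated) gradient descent on the summed square error for a single neuron with arc tangent activation; the learning rate, step count, and data are illustrative assumptions, and accelerated variants would only change the update step:

```python
import numpy as np

def train_neuron(X, F, eta=0.05, steps=2000):
    # Fit a single neuron y(x) = arctan(w.x + w0) by gradient descent
    # on the energy E(w, w0) = sum_l |f(x_l) - y(x_l)|^2.
    N, n = X.shape
    w, w0 = np.zeros(n), 0.0
    for _ in range(steps):
        s = X @ w + w0               # weighted sums
        y = np.arctan(s)             # network outputs y(x_l)
        err = y - F                  # residuals y(x_l) - f(x_l)
        dact = 1.0 / (1.0 + s**2)    # derivative of the arc tangent
        w  -= eta * 2.0 * (X.T @ (err * dact))   # dE/dw
        w0 -= eta * 2.0 * np.sum(err * dact)     # dE/dw0
    return w, w0

X = np.array([[0.2, 1.1], [1.5, 0.3], [0.9, 0.8]])
F = np.array([0.0, 1.0, 1.0])
w, w0 = train_neuron(X, F)
```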
We will restrict ourselves to feedforward layered neural networks. Furthermore, we found that simple single-layered neural networks (perceptrons) already sufficed to learn the diagnosis data well. In addition, they have the advantage of easier rule extraction and interpretation.
A perceptron with output dimension 1 consists of only a single neuron, so the output function $y$ can be written as $y(x) = \theta(w^\top x + w_0)$.
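A minimal sketch of such a perceptron follows; the training procedure shown is the classical perceptron learning rule, which is an assumption on our part, since the text above does not specify how the weights were fitted:

```python
import numpy as np

def perceptron(x, w, w0):
    # y(x) = theta(w.x + w0), with theta the Heaviside step function.
    return 1.0 if np.dot(w, x) + w0 >= 0.0 else 0.0

def train_perceptron(X, F, eta=0.1, epochs=100):
    # Classical perceptron learning rule (assumed here for illustration).
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, f_x in zip(X, F):
            err = f_x - perceptron(x, w, w0)
            w  += eta * err * x   # move the weights toward the target
            w0 += eta * err       # move the bias term toward the target
    return w, w0
```

After training, the weight vector $w$ and bias $w_0$ can be read off directly, which is part of what makes rule extraction from perceptrons comparatively easy.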