Note that the minimum of that function is obtained for $y = y^p$, where $y^p$ denotes the desired output, as for the least squares cost function. The extension to problems with several classes is straightforward. For example, for $n$ classes, the logistic function is replaced by the softmax function,
$$y_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad \text{with } z_i = \sum_k w_{ik}\, x_k + w_{i0}.$$
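As an illustration, here is a minimal sketch of the softmax computation (NumPy and the example values are assumptions; the shift by $\max(z)$ is a standard numerical-stability device, not part of the definition above):

```python
import numpy as np

def softmax(z):
    """Softmax of the class potentials z_i = sum_k w_ik * x_k + w_i0."""
    e = np.exp(z - np.max(z))  # max-shift avoids overflow without changing the result
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
y = softmax(z)                 # positive components summing to 1
```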
For each example, the cross-entropy is expressed by
$$E = -\sum_{i=1}^{n} \left[ y_i^p \ln y_i + (1 - y_i^p) \ln(1 - y_i) \right].$$
Training
The interested reader will note that, perhaps surprisingly, this approach does not make the computations more complicated; on the contrary, it makes them simpler: it amounts to not taking into account the nonlinearity introduced by the logistic function in the computation of the gradient,
$$\frac{\partial E}{\partial w_{ik}} = (y_i - y_i^p)\, x_k.$$
That is equivalent to the Widrow-Hoff training rule described in Chap. 2.
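That simplification can be checked numerically. The sketch below (hypothetical data and weights, for a single logistic output; NumPy assumed) compares the analytic gradient $(y - y^p)\,x$ with a finite-difference estimate of $\partial E / \partial w_k$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)            # one input example (hypothetical)
w, w0 = rng.normal(size=5), 0.3   # hypothetical weights and bias
yp = 1.0                          # desired output y^p

def logistic(w, w0):
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

def cross_entropy(w, w0):
    y = logistic(w, w0)
    return -(yp * np.log(y) + (1 - yp) * np.log(1 - y))

# Analytic gradient: the logistic nonlinearity drops out of the expression.
grad_analytic = (logistic(w, w0) - yp) * x

# Central finite differences, one weight at a time.
eps = 1e-6
grad_fd = np.array([(cross_entropy(w + eps * e, w0)
                     - cross_entropy(w - eps * e, w0)) / (2 * eps)
                    for e in np.eye(5)])
assert np.allclose(grad_analytic, grad_fd, atol=1e-6)
```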
3.2.3 Preprocessing Outputs for Regression
In regression problems, the outputs represent conditional averages. The residuals around the average value are assumed to follow a centered normal law. In order to optimize the design of the model, the outputs are therefore centered and reduced (i.e., standardized to zero mean and unit variance); the averages and variances of the outputs are estimated from the examples.
The average quadratic error $\mathrm{EQM}_r$, computed in the reduced output space, corresponds to the average quadratic error $\mathrm{EQM}$ computed from the raw data, divided by the estimated variance:
$$\mathrm{EQM}_r = \frac{1}{N} \sum_{k=1}^{N} \left( y_r^k - \hat{y}_r^k \right)^2, \qquad \mathrm{EQM} = \mathrm{EQM}_r \times \sigma_y^2,$$
where $y_r^k$ and $\hat{y}_r^k$ are the measured and modeled outputs for example $k$ in the reduced space, and $\sigma_y^2$ is the estimated variance of the output.
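A minimal numerical sketch of that relation (hypothetical data; the model predictions are simulated here by adding noise to the measured outputs):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=10.0, scale=3.0, size=200)  # raw measured outputs
y_hat = y + rng.normal(scale=1.0, size=200)    # simulated model predictions

mu, sigma = y.mean(), y.std()                  # estimated on the examples
y_r = (y - mu) / sigma                         # centered and reduced outputs
y_hat_r = (y_hat - mu) / sigma

eqm_r = np.mean((y_r - y_hat_r) ** 2)          # error in the reduced space
eqm = np.mean((y - y_hat) ** 2)                # error on the raw data
assert np.isclose(eqm, eqm_r * sigma**2)       # EQM = EQM_r * sigma_y^2
```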
Reduced Error and Coefficient of Nondetermination
The average quadratic error computed from the centered and reduced variables is the ratio “residual variance divided by total variance” used in linear regression to express the percentage of the variance that is not taken into account by the model. In that case, the complement to one of that average quadratic error is the coefficient of determination of the regression.
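Explicitly, since the estimated variance is $\sigma_y^2 = \frac{1}{N}\sum_{k=1}^{N}(y^k - \bar{y})^2$, with $\bar{y}$ the average of the measured outputs, the relation can be written
$$\mathrm{EQM}_r = \frac{\sum_{k=1}^{N}\left(y^k - \hat{y}^k\right)^2}{\sum_{k=1}^{N}\left(y^k - \bar{y}\right)^2},$$
which is precisely the residual-to-total variance ratio of linear regression.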
 