Let $\langle x, y \rangle_A = x^T A y$ be the inner product of $x$ and $y$ weighted by $A$, forming the inner product space $\langle \cdot, \cdot \rangle_A$. Then, $\| x \|_A = \sqrt{\langle x, x \rangle_A}$ is the norm associated with the inner product space $\langle \cdot, \cdot \rangle_A$. Any two vectors $x$, $x'$ are said to be $A$-orthogonal if $\langle x, x' \rangle_A = 0$. Note that $\| x \| \equiv \| x \|_I$ is the Euclidean norm, where $I$ is the identity matrix.
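As a concrete illustration, the following short Python sketch (assuming NumPy; the names inner_A and norm_A are illustrative, not from the text) computes the $A$-weighted inner product and norm and checks $A$-orthogonality:

import numpy as np

def inner_A(x, y, A):
    # <x, y>_A = x^T A y
    return x @ A @ y

def norm_A(x, A):
    # ||x||_A = sqrt(<x, x>_A)
    return np.sqrt(inner_A(x, x, A))

A = np.diag([2.0, 1.0])              # a positive definite weight matrix
x = np.array([1.0, 0.0])
y = np.array([0.0, 3.0])
print(inner_A(x, y, A))              # 0.0, so x and y are A-orthogonal
print(norm_A(x, A))                  # sqrt(2)
print(norm_A(x, np.eye(2)))          # 1.0, the Euclidean norm with A = I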
5.2.1 The Weight Vector
Using the matrix notation introduced in (3.4), and defining the diagonal $N \times N$ matching matrix $M_k$ of classifier $k$ by $M_k = \mathrm{diag}(m_k(x_1), \dots, m_k(x_N))$, in this chapter simply denoted $M$, (5.5) can be rewritten to

$\min_w \, (Xw - y)^T M (Xw - y) = \min_w \, \| Xw - y \|_M^2 .$    (5.7)
Thus, the aim is to find the w that minimises the weighted distance between the
estimated outputs Xw and the observed outputs y in the inner product space
$\langle \cdot, \cdot \rangle_M$. This distance is convex with respect to $w$ and therefore has a unique
minimum [26]. Note that as the output space is single-dimensional, the set of
observed outputs is given by the vector y rather than the matrix Y .
The solution to (5.7) is found by setting its first derivative to zero, resulting in

$w = (X^T M X)^{-1} X^T M y .$    (5.8)
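As a sketch of this computation on synthetic data (the inputs, matching values, and outputs below are illustrative stand-ins, not from the text), (5.8) can be evaluated without forming the inverse explicitly:

import numpy as np

rng = np.random.default_rng(0)
N, D_X = 100, 3
X = rng.normal(size=(N, D_X))                     # input matrix
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=N)
M = np.diag(rng.uniform(size=N))                  # matching matrix, m(x_n) in [0, 1]

# w = (X^T M X)^{-1} X^T M y, computed by solving a linear system
w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
print(w)                                          # close to [0.5, -1.0, 2.0]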
Alternatively, a numerically more stable solution that can also be computed if $X^T M X$ is singular and therefore cannot be inverted, is

$w = (\sqrt{M} X)^+ \sqrt{M} y ,$    (5.9)

where $X^+ \equiv (X^T X)^{-1} X^T$ denotes the pseudo-inverse of matrix $X$ [19].
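Under the same illustrative assumptions as the previous sketch, (5.9) amounts to weighting $X$ and $y$ by the square root of $M$ (well defined, as $M$ is diagonal and non-negative) and applying the pseudo-inverse; where $X^T M X$ is non-singular, both routes agree:

import numpy as np

rng = np.random.default_rng(0)
N, D_X = 100, 3
X = rng.normal(size=(N, D_X))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=N)
M = np.diag(rng.uniform(size=N))

sqrt_M = np.sqrt(M)                                    # elementwise root of diagonal M
w_stable = np.linalg.pinv(sqrt_M @ X) @ (sqrt_M @ y)   # (5.9)
w_direct = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)   # (5.8)
print(np.allclose(w_stable, w_direct))                 # True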
Using the weight vector according to (5.8), the matching-weighted vector of estimated outputs $Xw$ evaluates to

$Xw = X (X^T M X)^{-1} X^T M y .$    (5.10)
Observe that $X (X^T M X)^{-1} X^T M$ is a projection matrix that projects the vector of observed outputs $y$ onto the hyperplane $\{ Xw \mid w \in \mathbb{R}^{D_X} \}$ with respect to $\langle \cdot, \cdot \rangle_M$. This result is intuitively plausible, as the $w$ that minimises the weighted distance $\| Xw - y \|_M$ between the estimated and the observed outputs yields the point $Xw$ on this hyperplane that is closest to $y$ with respect to $\langle \cdot, \cdot \rangle_M$, which is the orthogonal projection of $y$ in $\langle \cdot, \cdot \rangle_M$ onto this plane. This concept will be used extensively in Chap. 9.
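The projection interpretation can be verified numerically; a minimal sketch (with illustrative data) checks that $P = X (X^T M X)^{-1} X^T M$ is idempotent and that the residual $y - Py$ is $M$-orthogonal to the columns of $X$:

import numpy as np

rng = np.random.default_rng(1)
N, D_X = 50, 3
X = rng.normal(size=(N, D_X))
y = rng.normal(size=N)
M = np.diag(rng.uniform(size=N))

P = X @ np.linalg.solve(X.T @ M @ X, X.T @ M)   # projection matrix
print(np.allclose(P @ P, P))                    # projecting twice changes nothing
print(np.allclose(X.T @ M @ (y - P @ y), 0.0))  # residual is M-orthogonal to the plane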
5.2.2 The Noise Precision
Equation (5.6) needs to be solved in order to get the maximum likelihood noise precision. As before, we evaluate the maximum of (5.6) by setting its first derivative with respect to $\tau$ to zero, to get

$\tau^{-1} = c^{-1} \, \| Xw - y \|_M^2 ,$    (5.11)
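A sketch of this estimate on synthetic data follows; it assumes that $c$ in (5.6) is the match count, $c = \sum_n m(x_n) = \mathrm{tr}(M)$, which is an assumption from context rather than a definition given in this excerpt:

import numpy as np

rng = np.random.default_rng(2)
N, D_X = 200, 3
X = rng.normal(size=(N, D_X))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.5, size=N)  # true tau = 4
M = np.diag(rng.uniform(size=N))

w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
c = np.trace(M)                               # assumed match count, sum_n m(x_n)
r = X @ w - y
tau_inv = (r @ M @ r) / c                     # tau^{-1} = c^{-1} ||Xw - y||^2_M
print(1.0 / tau_inv)                          # roughly 4 = 1 / 0.5^2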