which is kept sufficiently broad and uninformative by setting a_\alpha = 10^{-2} and b_\alpha = 10^{-4}. The combined effect of \tau_k and \alpha_k on the weight vector prior variance is shown in Fig. 7.3(b).
7.2.4 Mixing by the Generalised Softmax Function
As in Chap. 4, the latent variables are modelled by the generalised softmax
function (4.22), given by
g_k(x) \equiv p(z_k = 1 \mid x, v_k) = \frac{m_k(x)\,\exp(v_k^T \phi(x))}{\sum_{j=1}^{K} m_j(x)\,\exp(v_j^T \phi(x))} .    (7.10)
It assumes that, given that classifier k matched input x, the probability of classifier k generating observation n is related to \phi(x) by a log-linear function \exp(v_k^T \phi(x)), parametrised by v_k. The transfer function \phi : \mathcal{X} \to \mathbb{R}^{D_V} maps the input into a D_V-dimensional real space, and therefore the vector v_k is of size D_V and also an element of that space. In LCS, we usually have D_V = 1 and \phi(x) = 1 for all x \in \mathcal{X}, but to stay general, no assumptions about \phi and D_V will be made.
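To make the mixing computation concrete, the following is a minimal NumPy sketch of (7.10). The names mixing_weights, matching, V, and phi are placeholders introduced here for illustration; they stand for the classifiers' matching functions m_k, the stacked mixing vectors v_k, and the transfer function \phi, none of which are fixed to any particular implementation by the text.

```python
import numpy as np

def mixing_weights(x, matching, V, phi):
    """Evaluate the mixing weights g_k(x) of Eq. (7.10) for all K classifiers.

    matching : list of K callables, each returning the matching degree m_k(x)
    V        : array of shape (K, D_V), row k holding the mixing vector v_k
    phi      : callable mapping an input x to a D_V-dimensional feature vector
    """
    feats = np.asarray(phi(x))                   # phi(x), shape (D_V,)
    m = np.array([m_k(x) for m_k in matching])   # matching degrees m_k(x)
    a = V @ feats                                # v_k^T phi(x) for every k
    a -= a.max()                                 # shift exponents for numerical stability
    unnorm = m * np.exp(a)                       # m_k(x) exp(v_k^T phi(x))
    return unnorm / unnorm.sum()                 # normalise so the g_k(x) sum to 1
```

With the common LCS choice D_V = 1 and \phi(x) = 1, V reduces to a single column of K scalars and the exponents are simply the v_k themselves.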
Making use of the 1-of-K structure of z, its joint probability is given by

p(z \mid x, V) = \prod_{k=1}^{K} g_k(x)^{z_k} .    (7.11)
Thus, the joint probability of all z_n becomes

p(Z \mid X, V) = \prod_{n=1}^{N} \prod_{k=1}^{K} g_k(x_n)^{z_{nk}} ,    (7.12)
which fully specifies the model for Z .
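As a sketch only: under (7.12) the log-likelihood of the latent indicators decomposes into a sum over observations, each contributing the log of the single mixing weight picked out by its 1-of-K vector. This reuses the hypothetical mixing_weights helper from above.

```python
def log_p_Z(Z, X, matching, V, phi):
    """log p(Z | X, V) from Eq. (7.12): sum over n and k of z_nk log g_k(x_n)."""
    total = 0.0
    for z_n, x_n in zip(Z, X):
        g = mixing_weights(x_n, matching, V, phi)   # g_k(x_n) from Eq. (7.10)
        k = int(np.argmax(z_n))                     # the single k with z_nk = 1
        total += np.log(g[k])                       # only that classifier contributes
    return total
```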
7.2.5 Priors on the Mixing Model
Due to the normalisation, the mixing function g_k is over-parametrised, as it would be sufficient to specify K - 1 vectors v_k and leave v_K constant [165]. However, this would force the values of all v_k to be specified relative to the constant v_K, and causes problems if classifier K is removed from the current set. Thus, g_k is instead left over-parametrised, and it is assumed that all v_k are small, which is expressed by the shrinkage prior
p(v_k \mid \beta_k) = \mathcal{N}(v_k \mid 0, \beta_k^{-1} I) = \left( \frac{\beta_k}{2\pi} \right)^{D_V/2} \exp\left( -\frac{\beta_k}{2} v_k^T v_k \right) .    (7.13)
Thus, the elements of v k are assumed to be independent and zero-mean Gaussian
with precision β k .
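A small sketch of the log-density of the shrinkage prior (7.13), assuming v_k is held as a NumPy vector of length D_V; the function name log_prior_v is introduced here for illustration.

```python
def log_prior_v(v_k, beta_k):
    """Log-density of N(v_k | 0, beta_k^{-1} I), the shrinkage prior of Eq. (7.13)."""
    D_V = v_k.shape[0]
    # (D_V/2) log(beta_k / 2pi) - (beta_k/2) v_k^T v_k
    return 0.5 * D_V * np.log(beta_k / (2.0 * np.pi)) - 0.5 * beta_k * (v_k @ v_k)
```

A larger precision \beta_k pulls the v_k more strongly towards zero, which is exactly the shrinkage effect the prior is meant to express.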
 