4. p(M | D) is then given by (7.3), where ln p(D | M) is replaced by its approximation L(q).

Appropriate convergence criteria are introduced in the next chapter.
7.4 Predictive Distribution
An additional bonus of a probabilistic basis for LCS is that it provides predictive distributions rather than simple point estimates. This gives additional information about the certainty of a prediction and allows the specification of confidence intervals. Here, the predictive density for the Bayesian LCS model for regression is derived.
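As a simple illustration of what a predictive distribution buys over a point estimate, consider turning a predictive mean and variance into a confidence interval. This is a minimal sketch with hypothetical numbers; in the LCS model the predictive density is the mixture derived below, not a single Gaussian.

```python
import math

def confidence_interval(mean, variance, z=1.96):
    """Symmetric 95% confidence interval from a Gaussian predictive
    density with the given mean and variance (hypothetical values here)."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width

# With a point estimate alone, only `mean` would be available;
# the predictive variance is what makes the interval possible.
lo, hi = confidence_interval(mean=0.5, variance=0.04)
```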
The question we are answering is: in the light of all available data, how likely are certain output values for a new input? This question is approached formally by providing the predictive density p(y | x, D), where x is the new known input vector, y is its associated unknown output vector, and all densities are, as before, implicitly conditional on the current model structure M.

7.4.1 Deriving p(y | x, D)
We get an expression for p(y | x, D) by using the relation

\begin{align}
p(y | x, X, Y) &= \sum_z \int p(y, z, W, \tau, V | x, X, Y) \,\mathrm{d}W \,\mathrm{d}\tau \,\mathrm{d}V \nonumber \\
&= \sum_z \int p(y | x, z, W, \tau)\, p(z | x, V)\, p(W, \tau, V | X, Y) \,\mathrm{d}W \,\mathrm{d}\tau \,\mathrm{d}V \nonumber \\
&= \sum_z \int \prod_k \mathcal{N}(y | W_k x, \tau_k^{-1} I)^{z_k} g_k(x)^{z_k} \times p(W, \tau, V | X, Y) \,\mathrm{d}W \,\mathrm{d}\tau \,\mathrm{d}V, \tag{7.102}
\end{align}
where z is the latent variable associated with the observation (x, y), p(y | x, z, W, τ) is replaced by (7.6), and p(z | x, V) by (7.11). As the real posterior p(W, τ, V | X, Y) is not known, it is approximated by the variational posterior, that is, p(W, τ, V | X, Y) ≈ q_{W,τ}(W, τ) q_V(V). Together with summing over all z, this results in
\begin{equation}
p(y | x, X, Y) \approx \sum_k \int g_k(x)\, q_V(v_k) \,\mathrm{d}v_k \int q_{W,\tau}(W_k, \tau_k)\, \mathcal{N}(y | W_k x, \tau_k^{-1} I) \,\mathrm{d}W_k \,\mathrm{d}\tau_k, \tag{7.103}
\end{equation}
where the factorisation of q_V(V) and q_{W,τ}(W, τ) with respect to k and the independence of the two variational densities were utilised.
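The structure of (7.103) can be sketched numerically: the predictive density is a gating-weighted sum over classifiers k, where each classifier contributes its Gaussian averaged over the variational posterior. The sketch below approximates the inner integrals by Monte Carlo over posterior samples; the precomputed gating expectations and the sample arrays are hypothetical inputs, not the book's implementation, which would obtain them from q_V and q_{W,τ}.

```python
import numpy as np

def predictive_density(y, x, gating_weights, weight_samples, tau_samples):
    """Monte Carlo sketch of the mixture in (7.103).

    gating_weights[k] : assumed precomputed expectation of g_k(x) under q_V
    weight_samples[k] : array of shape (S, D_y, D_x), samples of W_k
    tau_samples[k]    : array of shape (S,), samples of tau_k
    Returns an estimate of p(y | x, X, Y).
    """
    D_y = y.shape[0]
    density = 0.0
    for k, g_k in enumerate(gating_weights):
        W_s, tau_s = weight_samples[k], tau_samples[k]
        # Gaussian N(y | W_k x, tau_k^{-1} I), evaluated per posterior sample
        means = W_s @ x                              # shape (S, D_y)
        sq_err = np.sum((y - means) ** 2, axis=1)    # shape (S,)
        log_norm = 0.5 * D_y * np.log(tau_s / (2 * np.pi))
        # Average over samples approximates the integral over q_{W,tau}
        density += g_k * np.mean(np.exp(log_norm - 0.5 * tau_s * sq_err))
    return density
```

With point-mass "posteriors" (all samples identical) and a single classifier with gating weight 1, the estimate reduces to the exact Gaussian density, which makes the sketch easy to sanity-check.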