F*(η) = sup_θ { ⟨θ, η⟩ − F(θ) }    (16.5)
We get the maximum for η = ∇F(θ). The parameters η are called expectation parameters since η = E[t(x)].
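For instance (an illustration of ours, not from the text), consider the Poisson family with sufficient statistic t(x) = x and log-normalizer F(θ) = exp θ. Then η = ∇F(θ) = e^θ = λ, which is indeed the expectation E[x] of a Poisson variate of rate λ = e^θ.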
The gradients of F and of its dual F* are functional inverses of each other:

∇F* = (∇F)⁻¹    (16.6)
and F* itself can be computed by:

F* = ∫ (∇F)⁻¹ + constant.    (16.7)
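Continuing the Poisson illustration, (∇F)⁻¹(η) = log η (the inverse of η = e^θ), so Eq. (16.7) gives F*(η) = ∫ log η dη = η log η − η + constant.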
Notice that this integral is often difficult to compute and the convex conjugate F* of F may not be known in closed form. We can bypass the anti-derivative operation by plugging into Eq. (16.5) the optimal value ∇F(θ) = η (that is, θ = (∇F)⁻¹(η)). We get
F*(η) = ⟨(∇F)⁻¹(η), η⟩ − F((∇F)⁻¹(η))    (16.8)
This requires taking the inverse gradient (∇F)⁻¹ = ∇F*, but allows us to discard the constant of integration in Eq. (16.7).
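For the Poisson illustration, Eq. (16.8) yields F*(η) = ⟨log η, η⟩ − F(log η) = η log η − η, recovering the anti-derivative of Eq. (16.7) with the integration constant fixed to zero.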
Thus a member of an exponential family can be described equivalently with the
natural parameters or with the dual expectation parameters.
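To make this duality concrete, here is a minimal Python sketch (our illustration; the function names are ours) for the univariate Gaussian family with sufficient statistic t(x) = (x, x²), whose natural parameters are θ = (μ/σ², −1/(2σ²)) and whose expectation parameters are η = (E[x], E[x²]):

import numpy as np

def natural_to_expectation(theta):
    # eta = grad F(theta): map natural parameters to moments,
    # with theta = (mu/sigma^2, -1/(2*sigma^2)) for t(x) = (x, x^2)
    t1, t2 = theta
    mu = -t1 / (2.0 * t2)        # E[x]
    sigma2 = -1.0 / (2.0 * t2)   # Var[x]
    return np.array([mu, mu**2 + sigma2])   # (E[x], E[x^2]) = E[t(x)]

def expectation_to_natural(eta):
    # (grad F)^{-1}: map the moments back to natural parameters
    e1, e2 = eta
    sigma2 = e2 - e1**2
    return np.array([e1 / sigma2, -1.0 / (2.0 * sigma2)])

# Round trip for mu = 1, sigma^2 = 2: the two gradient maps are inverses
theta = np.array([1.0 / 2.0, -1.0 / (2.0 * 2.0)])
eta = natural_to_expectation(theta)
assert np.allclose(expectation_to_natural(eta), theta)

The round trip illustrates Eq. (16.6): ∇F* undoes ∇F, so either coordinate system fully determines the distribution.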
16.2.3 Bregman Divergences
The Kullback-Leibler (KL) divergence between two members of the same exponential family can be computed in closed form using a bijection between Bregman divergences and exponential families. Bregman divergences are a family of divergences parameterized by the set of strictly convex and differentiable functions F:
B_F(p, q) = F(p) − F(q) − ⟨p − q, ∇F(q)⟩    (16.9)
F is a strictly convex and differentiable function called the generator of the
Bregman divergence.
The family of Bregman divergences generalizes many of the usual divergences, for example:
• the squared Euclidean distance, for F(x) = ‖x‖²,
• the Kullback-Leibler (KL) divergence, for the negative Shannon entropy F(x) = Σ_i x_i log x_i (also called Shannon information).
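As a quick numerical check (our illustration, not from the text), the following Python sketch evaluates Eq. (16.9) directly for both generators; on the probability simplex the negative-entropy case reduces to the KL divergence:

import numpy as np

def bregman(F, grad_F, p, q):
    # B_F(p, q) = F(p) - F(q) - <p - q, grad F(q)>   (Eq. 16.9)
    return F(p) - F(q) - np.dot(p - q, grad_F(q))

# Generator F(x) = ||x||^2 gives the squared Euclidean distance
F_sq = lambda x: np.dot(x, x)
grad_sq = lambda x: 2.0 * x

# Generator F(x) = sum_i x_i log x_i (negative Shannon entropy) gives KL
F_ent = lambda x: np.sum(x * np.log(x))
grad_ent = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
assert np.isclose(bregman(F_sq, grad_sq, p, q), np.sum((p - q) ** 2))
assert np.isclose(bregman(F_ent, grad_ent, p, q), np.sum(p * np.log(p / q)))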