Digital Signal Processing Reference
In-Depth Information
Fig. 9.5 Observation model for noisy audio (graphical model with nodes $x_t$, $n_t$, and $y_t$)
variables $x_t$, but not between the discrete state variables $s_t$ [13]. An extension in
[36] includes time dependencies between the hidden state variables, similar to
extending a GMM to an HMM. An SLDM as in Fig. 9.4 is described by
$$p(x_t, s_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, A(s_t)\,x_{t-1} + b(s_t),\, C(s_t)\big) \cdot p(s_t) \tag{9.22}$$

$$p(x_{1:T}, s_{1:T}) = p(x_1, s_1) \prod_{t=2}^{T} p(x_t, s_t \mid x_{t-1}). \tag{9.23}$$
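As a rough illustration of the generative process in Eqs. (9.22) and (9.23), the following sketch draws one trajectory from an SLDM. The function name `sample_sldm`, the array layout, and the choice to pass the first frame `x1` in directly are assumptions made for this example; the text does not specify how $p(x_1, s_1)$ is parameterised.

```python
import numpy as np

def sample_sldm(A, b, C, prior, x1, T, rng):
    """Draw one trajectory (x_1..x_T, s_1..s_T) from a switching LDM.

    A: (S, D, D) transition matrices A(s)
    b: (S, D)    bias vectors b(s)
    C: (S, D, D) covariance matrices C(s)
    prior: (S,)  state probabilities p(s)
    x1: (D,)     first feature vector, supplied by the caller (assumption)
    """
    S, D = b.shape
    x = np.empty((T, D))
    s = np.empty(T, dtype=int)
    s[0] = rng.choice(S, p=prior)
    x[0] = x1
    for t in range(1, T):
        # States carry no time dependency: each s_t is drawn from p(s).
        s[t] = rng.choice(S, p=prior)
        # x_t depends linearly on x_{t-1} through the active state's parameters.
        mean = A[s[t]] @ x[t - 1] + b[s[t]]
        x[t] = rng.multivariate_normal(mean, C[s[t]])
    return x, s
```

Calling it with `rng = np.random.default_rng(0)` and two states with different $A(s)$ yields a sequence whose dynamics switch from frame to frame while $x_t$ stays coupled to $x_{t-1}$.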
The EM algorithm can be used to learn the parameters of the SLDM,
namely $A(s)$, $b(s)$, and $C(s)$. If the number of states is set to one, the SLDM
reduces to an LDM, which yields the parameters $A$, $b$, and $C$ required for the
noise-modelling LDM.
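For the single-state case, a minimal sketch can make the estimation concrete. It assumes that the feature sequence is observed directly (for example noise-only frames), so that no hidden state has to be inferred and the maximum-likelihood estimate is a closed-form least-squares regression. The function name `fit_ldm` is hypothetical, and this is not the full EM procedure needed for several states.

```python
import numpy as np

def fit_ldm(X):
    """ML estimate of A, b, C for x_t ~ N(A x_{t-1} + b, C).

    X: (T, D) array holding one observed feature sequence.
    """
    prev, curr = X[:-1], X[1:]
    # Append a constant column so that A and b are solved for jointly.
    Z = np.hstack([prev, np.ones((len(prev), 1))])
    W, *_ = np.linalg.lstsq(Z, curr, rcond=None)  # W has shape (D + 1, D)
    A = W[:-1].T   # rows of W hold A transposed
    b = W[-1]
    # The ML covariance is the mean outer product of the residuals.
    resid = curr - Z @ W
    C = resid.T @ resid / len(prev)
    return A, b, C
```

With several switching states, the same regression would be weighted by the state posteriors from the E-step, which is where the iterative EM procedure comes in.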
9.2.2.3 Observation Model
The observation model describes the relationship between the noisy observation $y_t$ and the
hidden audio and noise features. Fig. 9.5 shows the graphical model representation of
such a model, namely the zero-variance observation model with SNR inference as
in [42]. It is assumed that the audio of interest $x_t$ and the noise $n_t$ mix linearly in the time
domain. In the cepstral domain, for example, this corresponds to a non-linear mixing.
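To see where the non-linearity comes from, consider the log-power spectral domain, of which cepstra are a linear transform. The sketch below assumes that the powers of audio and noise add (the phase-dependent cross term is ignored), so that $e^{y} = e^{x} + e^{n}$, which gives $y = x + \log(1 + e^{n - x})$. The function name `mix_log_spectra` is illustrative, and this simplified relation is not necessarily the exact formulation of [42].

```python
import numpy as np

def mix_log_spectra(x, n):
    """Noisy log-power spectrum y from clean x and noise n (natural-log power).

    Assumes additive powers: exp(y) = exp(x) + exp(n).
    """
    # log1p keeps the result accurate when the noise is far below the signal.
    return x + np.log1p(np.exp(n - x))
```

The correction term depends only on the difference $x - n$, that is, on the local SNR: at high SNR $y \approx x$, and at strongly negative SNR $y \approx n$.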
9.2.2.4 Posterior Estimation and Enhancement
To reduce the computational complexity of the posterior estimation, the generalised
pseudo-Bayesian (GPB) algorithm [43] approximates it by restricting the size of the
search space. It neglects differences between state histories that lie more than
$r$ frames in the past. Thus, with $T$ as the sequence length, the inference
complexity reduces from $S^T$ to $S^r$, where $r \ll T$. In the GPB algorithm, one
'collapses', 'predicts', and 'observes' for each of the audio frames. Estimates of the
moments of $x_t$ representing the de-noised audio features are computed based on
the Gaussian posterior as calculated during the 'observation' step of the GPB algorithm.
In this process, the clean features are taken to be the Minimum Mean Square Error
(MMSE) estimate $E[x_t \mid y_{1:t}]$. SLDM feature enhancement can lead to outstanding
results, including in the case of coloured Gaussian noise and negative SNR. This comes
at the cost of having to model the noise. The audio model's linear dynamics model the
smooth time evolution of typical audio of interest such as speech, music, or certain
sound types. The switching states express the piecewise stationarity typical in
 
 