Audio Enhancement and Robustness - Intelligent Audio Analysis


Fig. 9.5 Observation model for noisy audio (nodes: $x_t$, $n_t$, $y_t$)

variables $x_t$, but not between the discrete state variables $s_t$ [13]. An extension in [36] includes time dependencies between the hidden state variables, similar to extending a GMM to an HMM. An SLDM as in Fig. 9.4 is described by

$$p(x_t, s_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, A(s_t)\, x_{t-1} + b(s_t),\, C(s_t)\big) \cdot p(s_t) \tag{9.22}$$

$$p(x_{1:T}, s_{1:T}) = p(x_1, s_1) \prod_{t=2}^{T} p(x_t, s_t \mid x_{t-1}). \tag{9.23}$$
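The generative reading of Eq. (9.22) can be illustrated by drawing a trajectory from the model: at each frame a state is drawn from p(s), and the next feature value is drawn from a Gaussian whose mean is a state-dependent linear function of the previous value. The following is a minimal sketch only, reduced to scalar features for brevity (the model in the text uses feature vectors with matrices A and C). The function name `sample_sldm` and all parameter values are hypothetical and chosen for illustration.

```python
import random


def sample_sldm(a, b, c, prior, x1, num_frames, seed=0):
    """Draw one trajectory from a scalar SLDM in the spirit of Eq. (9.22).

    a, b, c : per-state gain, offset, and noise variance (illustrative values)
    prior   : state probabilities p(s); time-independent, as in Eq. (9.22)
    x1      : initial feature value
    """
    rng = random.Random(seed)
    states = list(range(len(prior)))
    xs = [x1]
    ss = [rng.choices(states, weights=prior)[0]]
    for _ in range(1, num_frames):
        # The state is drawn independently of the previous state.
        s = rng.choices(states, weights=prior)[0]
        # Gaussian with mean a(s) * x_{t-1} + b(s) and variance c(s).
        mean = a[s] * xs[-1] + b[s]
        xs.append(rng.gauss(mean, c[s] ** 0.5))
        ss.append(s)
    return xs, ss


# Two hypothetical states: a slowly decaying one and a faster one with an offset.
features, states = sample_sldm(
    a=[0.9, 0.5], b=[0.0, 1.0], c=[0.01, 0.04],
    prior=[0.7, 0.3], x1=0.0, num_frames=50,
)
```

Because the states are drawn independently per frame, this sketch corresponds to the basic model and not to the extension in [36] with time dependencies between states.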

The EM algorithm can be used to learn the parameters of the SLDM, namely $A(s)$, $b(s)$, and $C(s)$. Setting the number of states to one turns the SLDM into an LDM, which yields the parameters $A$, $b$, and $C$ required for the noise-modelling LDM.
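For the single-state case, a rough illustration is possible without EM if one assumes that the training features are directly observed (for example, noise-only frames); this is an assumption made here, not a statement from the text. The parameters then follow from an ordinary least-squares regression of each frame on its predecessor. The sketch below is reduced to scalar features, and the name `fit_scalar_ldm` is hypothetical.

```python
def fit_scalar_ldm(x):
    """Least-squares fit of x[t] = a * x[t-1] + b + noise with variance c.

    Assumes the sequence x is directly observed and has a non-constant
    history (otherwise the variance below is zero and a is undefined).
    """
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr)) / n
    var = sum((p - mean_prev) ** 2 for p in prev) / n
    a = cov / var                      # slope: scalar counterpart of A
    b = mean_curr - a * mean_prev      # offset: scalar counterpart of b
    # Mean squared prediction residual: scalar counterpart of C.
    c = sum((q - (a * p + b)) ** 2 for p, q in zip(prev, curr)) / n
    return a, b, c


# A noise-free sequence generated with a = 0.5, b = 1 is recovered exactly
# up to floating-point error.
a_hat, b_hat, c_hat = fit_scalar_ldm([0.0, 1.0, 1.5, 1.75, 1.875])
```

With several states, the state assignments are hidden and the same regression would have to be weighted by state posteriors inside an EM loop, which this sketch does not attempt.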

9.2.2.3 Observation Model

The observation model describes the relationship between the noisy observation $y_t$ and the hidden audio and noise features. Fig. 9.5 shows the graphical model representation of such a model, the zero-variance observation model with SNR inference as in [42]. It is assumed that the audio of interest $x_t$ and the noise $n_t$ mix linearly in the time domain. In the cepstral domain, for example, this corresponds to a non-linear mixing.
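The non-linearity is easiest to see one step before the cepstrum, in the log power spectral domain. Assuming that the power spectra of audio and noise simply add (the phase-dependent cross term is neglected, which is an assumption introduced here), the noisy log power per bin is y = log(exp(x) + exp(n)) = x + log(1 + exp(n - x)). A minimal sketch of this relation follows; the function name `mix_log_spectra` is hypothetical.

```python
import math


def mix_log_spectra(x_log, n_log):
    """Combine clean and noise log-power features per frequency bin.

    Implements y = x + log(1 + exp(n - x)), which equals log(exp(x) + exp(n))
    under the assumption that the power spectra add.
    """
    return [x + math.log1p(math.exp(n - x)) for x, n in zip(x_log, n_log)]


# Equal audio and noise power (0 dB SNR) raises the log power by log(2);
# noise far below the audio leaves the audio almost unchanged.
y = mix_log_spectra([0.0, 5.0], [0.0, -20.0])
```

In the cepstral domain the same relation would additionally be wrapped in the DCT and its inverse, so the mixing stays non-linear there as well.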

9.2.2.4 Posterior Estimation and Enhancement

To reduce the computational complexity of the posterior estimation, the generalised pseudo-Bayesian (GPB) algorithm [43] approximates it by restricting the size of the search space. It neglects distinctions between state histories that differ more than r frames in the past. Thus, with T as the sequence length, the inference complexity reduces from $S^T$ to $S^r$, where $r \ll T$. In the GPB algorithm, one 'collapses', 'predicts', and 'observes' for each of the audio frames. Estimates of the moments of $x_t$ representing the de-noised audio features are computed based on the Gaussian posterior as calculated during the 'observation' in the GPB algorithm. In this process, the clean features are taken to be the Minimum Mean Square Error (MMSE) estimate $E[x_t \mid y_{1:t}]$. SLDM feature enhancement can lead to outstanding results, including in coloured Gaussian noise and at negative SNR. This comes at the cost of having to model the noise. The audio model's linear dynamics model the smooth time evolution of typical audio of interest such as speech, music, or certain sound types. The switching states express the piecewise stationarity typical in
