Fig. 9.6 SAR-HMM as DBN structure (switching states $s_{t-3}, \ldots, s_t$ and observed samples $v_{t-3}, \ldots, v_t$)
$$v_t = -\sum_{r=1}^{R} c_r(s_t)\, v_{t-r} + \eta(s_t), \qquad \text{with } \eta \sim \mathcal{N}\big(\eta;\, 0,\, \sigma^2(s_t)\big). \tag{9.26}$$
There, $\eta(s_t)$ models variations from pure autoregression rather than an independent additive noise process. The joint probability of a sequence of length $T$ is
$$p(s_{1:T}, v_{1:T}) = p(v_1 \mid s_1)\, p(s_1) \prod_{t=2}^{T} p(v_t \mid v_{t-R:t-1}, s_t)\, p(s_t \mid s_{t-1}). \tag{9.27}$$
Figure 9.6 visualises the SAR-HMM as a DBN structure. Switching between the different AR models is deliberately 'slowed down' by introducing a constant $K$: the model must then remain in a state for an integer multiple of $K$ time steps. This is needed since considerably more sample values usually exist on the sample level than features on the frame level.
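As an illustration, the following minimal Python sketch samples from a SAR-HMM according to Eq. (9.26), including the slowed-down switching via $K$. All parameter values and variable names (coeffs, sigma2, trans) are illustrative assumptions, not parameters of the original model:

import numpy as np

rng = np.random.default_rng(0)

S, R, K, T = 2, 2, 4, 40              # states, AR order, switching constant, length
coeffs = rng.normal(0, 0.3, (S, R))   # c_r(s_t): per-state AR coefficients
sigma2 = np.array([0.1, 0.5])         # sigma^2(s_t): per-state innovation variance
trans = np.array([[0.9, 0.1],         # p(s_t | s_{t-1})
                  [0.1, 0.9]])

v = np.zeros(T + R)                   # R leading zeros serve as initial AR context
s = 0
for t in range(T):
    if t > 0 and t % K == 0:          # the state may only switch every K time steps
        s = rng.choice(S, p=trans[s])
    past = v[t:t + R][::-1]           # [v_{t-1}, v_{t-2}, ..., v_{t-R}]
    v[t + R] = -coeffs[s] @ past + rng.normal(0.0, np.sqrt(sigma2[s]))  # Eq. (9.26)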
The EM algorithm can be used to learn the AR parameters. Based on the forward-backward algorithm (cf. Sect. 7.3.1), the distributions $p(s_t \mid v_{1:T})$ are learnt. The fact that an observation $v_t$ depends on $R$ predecessors makes the backward pass more complicated than in the case of an HMM. A 'correction smoother' [53] can thus be applied, such that the backward pass computes the posterior $p(s_t \mid v_{1:T})$ by 'correcting' the output of the forward pass.
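The recursion below sketches such a forward pass, reusing the toy parameterisation from the previous listing; the emission term conditions on up to $R$ previous samples, and per-step normalisation (a simple stand-in for log-domain arithmetic) yields the filtered posteriors $p(s_t \mid v_{1:t})$, which the correction smoother would subsequently turn into $p(s_t \mid v_{1:T})$. Function and argument names are hypothetical:

import numpy as np
from scipy.stats import norm

def forward(v, coeffs, sigma2, trans, prior):
    # Alpha recursion as for an HMM, except that the emission term
    # p(v_t | v_{t-R:t-1}, s_t) uses the per-state AR prediction (Eqs. 9.26/9.27).
    T = len(v)
    S, R = coeffs.shape
    alpha = np.zeros((T, S))
    for t in range(T):
        past = v[max(t - R, 0):t][::-1]               # available context v_{t-1}, ...
        mean = -coeffs[:, :len(past)] @ past          # per-state AR prediction
        emis = norm.pdf(v[t], mean, np.sqrt(sigma2))  # p(v_t | v_{t-R:t-1}, s_t)
        alpha[t] = emis * (prior if t == 0 else trans.T @ alpha[t - 1])
        alpha[t] /= alpha[t].sum()                    # normalise against underflow
    return alpha                                      # filtered p(s_t | v_{1:t})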
9.3.3.2 Autoregressive Switching Linear Dynamical Systems
With the extension of the SAR-HMM to an AR-SLDS, a noise process can be modelled explicitly [17]. The observed audio sample $v_t$ of interest is then modelled as a noisy version of a hidden clean sample, obtained from the projection of a hidden vector $h_t$ with the dynamic properties of an LDS:
$$h_t = A(s_t)\, h_{t-1} + \eta_t, \qquad \text{with } \eta_t \sim \mathcal{N}\big(\eta_t;\, 0,\, \Sigma_H(s_t)\big). \tag{9.28}$$
The transition matrix $A(s_t)$ describes the dynamics of the hidden variable and depends on the state $s_t$ at time step $t$. A Gaussian distributed hidden 'innovation' variable $\eta_t$ models variations from 'pure' linear state dynamics, just as $\eta(s_t)$ in Eq. (9.26) models variations from pure autoregression.
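For concreteness, the following sketch performs one AR-SLDS step according to Eq. (9.28), using a companion-matrix construction in which $h_t$ stacks the $R$ most recent clean samples and the observation projects out the first component plus noise. This construction and all names are illustrative assumptions, not necessarily the exact formulation of [17]:

import numpy as np

def companion(c):
    # Companion matrix A(s): the first row applies the AR recursion of
    # Eq. (9.26) to the stored clean samples, the remaining rows shift
    # those samples down by one position.
    R = len(c)
    A = np.zeros((R, R))
    A[0] = -c
    A[1:, :-1] = np.eye(R - 1)
    return A

def ar_slds_step(h, s, coeffs, Sigma_H, sigma_v, rng):
    # One time step: hidden linear dynamics (Eq. 9.28) followed by a noisy
    # observation of the clean sample, i.e. the projection of h_t.
    A = companion(coeffs[s])
    h_new = A @ h + rng.multivariate_normal(np.zeros(len(h)), Sigma_H[s])
    v = h_new[0] + rng.normal(0.0, sigma_v)   # observed noisy sample v_t
    return h_new, v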