Audio Enhancement and Robustness - Intelligent Audio Analysis


non-linguistic vocalisations [51] or the segmentation of meeting speech [52]. A particular strength is that arbitrary functions of the observations can be used without complicating parameter learning.

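The HCRF models the conditional probability of a class c, given the sequence of observations $X = \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T$:

$$P(c \mid X, \lambda) = \frac{1}{z(X, \lambda)} \sum_{Seq \in \mathcal{S}} e^{\lambda \cdot f(c, Seq, X)}, \tag{9.24}$$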

where $\lambda$ is the parameter vector, f the 'vector of sufficient statistics', and $Seq = s_1, s_2, \ldots, s_T$ the hidden state sequence traversed during the computation of this conditional probability; $\mathcal{S}$ denotes the set of all such sequences. The probability is normalised by the 'partition function' $z(X, \lambda)$ to ensure a properly normalised probability [15]:

$$z(X, \lambda) = \sum_{c} \sum_{Seq \in \mathcal{S}} e^{\lambda \cdot f(c, Seq, X)}. \tag{9.25}$$
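To make Eqs. (9.24) and (9.25) concrete, the following Python sketch evaluates the class posterior by brute force for a tiny toy problem. The feature layout in the hypothetical helper `toy_features` (per-class transition counts plus per-class, per-state sums of scalar observations) is an illustrative assumption only; the HCRF admits arbitrary feature functions. Because the sketch enumerates all $N^T$ hidden state sequences for N states, it is usable only for very short sequences.

```python
import itertools
import math


def toy_features(c, seq, X, n_classes, n_states):
    """Hypothetical f(c, Seq, X): for class c, counts of each transition i -> j
    and, per state j, the sum of the scalar observations emitted in j.
    Entries belonging to other classes stay zero."""
    n_trans = n_classes * n_states * n_states
    f = [0.0] * (n_trans + n_classes * n_states)
    for t, (s, x) in enumerate(zip(seq, X)):
        if t > 0:
            f[(c * n_states + seq[t - 1]) * n_states + s] += 1.0
        f[n_trans + c * n_states + s] += x
    return f


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hcrf_posterior(X, lam, n_classes, n_states):
    """P(c | X, lambda) for all classes by enumerating every hidden sequence Seq."""
    all_seqs = list(itertools.product(range(n_states), repeat=len(X)))
    log_num = []
    for c in range(n_classes):
        # lambda . f(c, Seq, X) for every hidden state sequence Seq
        scores = [sum(l * fi for l, fi in zip(lam, toy_features(c, seq, X, n_classes, n_states)))
                  for seq in all_seqs]
        log_num.append(log_sum_exp(scores))  # log of the sum in Eq. (9.24)
    log_z = log_sum_exp(log_num)             # log z(X, lambda), Eq. (9.25)
    return [math.exp(v - log_z) for v in log_num]


# Toy usage with made-up weights: 2 classes, 2 hidden states
n_classes, n_states = 2, 2
lam = [0.0] * (n_classes * n_states * n_states + n_classes * n_states)
lam[10] = 1.5  # class 1, state 0, observation-sum feature
# Class 1 gets the larger posterior: its only non-zero weight rewards
# positive observations emitted in state 0.
print(hcrf_posterior([0.5, 1.0, 2.0], lam, n_classes, n_states))
```

Working in the log domain with a log-sum-exp avoids overflow of the exponentials; the partition function then simply renormalises the per-class sums.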

The vector f determines which probability is modelled. With a suitable choice of f, a left-right HMM can be imitated [15]. Let us now restrict the HCRF to a Markov chain, but drop the requirements that the transition probabilities sum to one and that the observations be described by true probability densities. In analogy to an HMM, the parameters $\lambda$ can then be expressed as transition scores $a_{i,j}$ and observation scores $b_j(\mathbf{x}_t)$, where i and j are states of the model (cf. Sect. 7.3.2). The forward and backward recursions known from the HMM (cf. Sect. 7.3.1) can then be used as well.
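A minimal Python sketch of this restricted HCRF follows. It assumes one separate set of scores per class c, scalar observations, and scores stored in the log domain, so that $e^{\lambda \cdot f(c, Seq, X)} = \prod_t a_{s_{t-1}, s_t}\, b_{s_t}(\mathbf{x}_t)$; the function names `forward_log_score` and `hcrf_class_posterior`, the optional start scores `log_init`, and all numbers are hypothetical. The forward recursion computes the sum over all hidden state sequences in Eq. (9.24) in $O(T N^2)$ time for N states, instead of enumerating all $N^T$ sequences.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_score(X, log_a, log_b, log_init=None):
    """log of the sum over all state sequences of prod_t a_{s_{t-1},s_t} * b_{s_t}(x_t).

    log_a[i][j] : log transition score a_{i,j} (need not be normalised)
    log_b(j, x) : log observation score b_j(x) (need not be a density)
    log_init[j] : optional log score for starting in state j (default 0)
    """
    n_states = len(log_a)
    if log_init is None:
        log_init = [0.0] * n_states
    # alpha[j]: log of the summed scores of all partial paths ending in state j
    alpha = [log_init[j] + log_b(j, X[0]) for j in range(n_states)]
    for x in X[1:]:
        alpha = [log_sum_exp([alpha[i] + log_a[i][j] for i in range(n_states)]) + log_b(j, x)
                 for j in range(n_states)]
    return log_sum_exp(alpha)


def hcrf_class_posterior(X, class_models):
    """P(c | X) from per-class (log_a, log_b) score sets, normalised over classes."""
    log_num = [forward_log_score(X, log_a, log_b) for log_a, log_b in class_models]
    log_z = log_sum_exp(log_num)  # partition function: sum over classes and sequences
    return [math.exp(v - log_z) for v in log_num]


# Toy usage with made-up scores: 2 classes, 2 states, linear observation scores
models = [
    ([[0.5, -1.0], [0.0, 0.3]], lambda j, x: [0.2, -0.4][j] * x),
    ([[-0.2, 0.8], [1.0, -0.5]], lambda j, x: [-0.3, 0.6][j] * x),
]
print(hcrf_class_posterior([0.5, -1.0, 2.0], models))
```

Because the scores are unnormalised, the per-class forward sums are not probabilities themselves; only the division by the partition function turns them into a posterior over classes.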

9.3.3 Audio Modelling in the Time Domain

Modelling the raw signal in the time domain is a rarely pursued option, but it allows noise to be modelled explicitly and easily [16]. To this end, we first look at switching autoregressive HMMs (SAR-HMMs) and then at their extension to switching linear dynamical systems (SLDS).

9.3.3.1 Switching Autoregressive Hidden Markov Models

The SAR-HMM models the audio signal of interest as an autoregressive (AR) process. Non-stationarity is realised by switching between different AR parameter sets [17] by means of a discrete switch variable $s_t$, similar to the states of an HMM. At each time step t (here referring to the sample level), exactly one out of S states is occupied. The state at time step t depends exclusively on its predecessor via the transition probability $p(s_t \mid s_{t-1})$. The sample $v_t$ at this time step is assumed to be a linear combination of its R preceding samples, superposed by a Gaussian-distributed 'innovation' $\eta(s_t)$. The innovation $\eta(s_t)$ and the AR weights $c_r(s_t)$ form the parameter set selected by the state $s_t$:
