Voice Activity Detection - Digital Speech: Coding for Low Bit Rate Communication Systems


10.3.2 Noise Estimation Based on SLR

Depending on the characteristics of the noise source, the short-time spectral amplitudes of the noise signal can fluctuate strongly from frame to frame. In order to cope with time-varying noise signals, the variance of the noise spectrum is adapted to the current input signal by a soft decision-based method. The speech absence probability (SAP) of the kth spectral bin, $p(H_{0,k} \mid Y_k)$, can be calculated by Bayes' rule as:

$$
p(H_{0,k} \mid Y_k) = \frac{p(H_{0,k})\,p(Y_k \mid H_{0,k})}{p(H_{0,k})\,p(Y_k \mid H_{0,k}) + p(H_{1,k})\,p(Y_k \mid H_{1,k})} = \frac{1}{1 + \dfrac{p(H_{1,k})}{p(H_{0,k})}\,\Lambda_k} \tag{10.8}
$$

where $p(H_{1,k}) = 1 - p(H_{0,k})$, and the unknown a priori speech absence probability (PSAP), $p(H_{0,k})$, is estimated in an adaptive manner given by:

$$
\hat{p}(H_{0,k}^{(n)}) = \mathrm{MIN}\left\{ \mathrm{MAX}\left\{ \beta\,\hat{p}(H_{0,k}^{(n-1)}) + (1-\beta)\,p(H_{0,k}^{(n)} \mid Y_k^{(n)}),\; H_0^{(L)} \right\},\; H_0^{(U)} \right\} \tag{10.9}
$$

where β is a smoothing factor, e.g. 0.65. The lower and upper limits of the PSAP, $H_0^{(L)}$ and $H_0^{(U)}$, are determined through experiments, e.g. 0.2 and 0.8. Note that, for the SLR-based method, the smoothed likelihood ratio $\Psi_k$ is applied to the calculation of the SAP instead of the likelihood ratio (LR), $\Lambda_k$.
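As a rough illustration of equations (10.8) and (10.9), the following Python sketch computes the SAP for one spectral bin and then smooths and clamps the PSAP. The function names and the per-bin scalar interface are assumptions made for this example and are not taken from the text; the default values of β and the two limits are the example values quoted above. The likelihood ratio passed in may be either the LR or the SLR.

```python
def speech_absence_probability(likelihood_ratio, prior_absence):
    """SAP of one spectral bin by Bayes' rule, following equation (10.8).

    likelihood_ratio: LR (or SLR) of the bin, assumed non-negative.
    prior_absence:    a priori speech absence probability, in (0, 1].
    """
    prior_presence = 1.0 - prior_absence
    return 1.0 / (1.0 + (prior_presence / prior_absence) * likelihood_ratio)


def update_psap(prev_psap, sap, beta=0.65, lower=0.2, upper=0.8):
    """Recursive PSAP estimate with limits, following equation (10.9)."""
    smoothed = beta * prev_psap + (1.0 - beta) * sap
    # MAX{..., lower} then MIN{..., upper} keeps the estimate in [lower, upper].
    return min(max(smoothed, lower), upper)


if __name__ == "__main__":
    psap = 0.5  # hypothetical starting value for the illustration
    for ratio in (0.5, 1.0, 3.0, 10.0):  # hypothetical per-frame ratios
        sap = speech_absence_probability(ratio, psap)
        psap = update_psap(psap, sap)
        print(f"ratio={ratio:5.1f}  SAP={sap:.4f}  PSAP={psap:.4f}")
```

Because the PSAP is clamped to the interval between the two limits, the division by the prior in the SAP calculation never sees a zero denominator once the recursion is running.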

The variance of the noise spectrum of the kth spectral component in the nth frame, $\lambda_{N,k}^{(n)}$, is updated in a recursive way as:

$$
\lambda_{N,k}^{(n)} = \eta\,\lambda_{N,k}^{(n-1)} + (1-\eta)\,E\left(|N_k^{(n)}|^2 \mid Y_k^{(n)}\right) \tag{10.10}
$$

where η is a smoothing factor, e.g. 0.95. The expected noise power spectrum, $E(|N_k^{(n)}|^2 \mid Y_k^{(n)})$, is estimated by means of a soft-decision technique [18] as:

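$$
\begin{aligned}
E\left(|N_k^{(n)}|^2 \mid Y_k^{(n)}\right) &= E\left(|N_k^{(n)}|^2 \mid H_{0,k}\right) p(H_{0,k} \mid Y_k^{(n)}) + E\left(|N_k^{(n)}|^2 \mid H_{1,k}\right) p(H_{1,k} \mid Y_k^{(n)}) \\
&= |Y_k^{(n)}|^2\, p(H_{0,k} \mid Y_k^{(n)}) + \lambda_{N,k}^{(n-1)}\, p(H_{1,k} \mid Y_k^{(n)})
\end{aligned} \tag{10.11}
$$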

where $p(H_{1,k} \mid Y_k^{(n)}) = 1 - p(H_{0,k} \mid Y_k^{(n)})$. During some tests, it was observed that SLR-based adaptation is useful for estimating noise spectra with high variations, such as a babble noise source.
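The recursive update of equations (10.10) and (10.11) can be sketched for a single spectral bin as follows. This is a minimal illustration under stated assumptions: the function names and the scalar interface are invented for the example, the observed power stands for the squared magnitude of the noisy spectral component, and the default η is the example value given above.

```python
def expected_noise_power(obs_power, prev_noise_var, sap):
    """Soft-decision estimate of the noise power, following equation (10.11).

    obs_power:      squared magnitude of the noisy spectral component.
    prev_noise_var: noise variance of this bin from the previous frame.
    sap:            speech absence probability of this bin, in [0, 1].
    """
    # Speech absent: the observation is noise only, so use its power.
    # Speech present: fall back on the previous noise variance.
    return obs_power * sap + prev_noise_var * (1.0 - sap)


def update_noise_variance(prev_noise_var, obs_power, sap, eta=0.95):
    """Recursive noise variance update, following equation (10.10)."""
    estimate = expected_noise_power(obs_power, prev_noise_var, sap)
    return eta * prev_noise_var + (1.0 - eta) * estimate


if __name__ == "__main__":
    noise_var = 1.0  # hypothetical initial noise variance
    # Hypothetical (observed power, SAP) pairs for successive frames.
    for obs_power, sap in ((4.0, 1.0), (4.0, 0.5), (50.0, 0.0)):
        noise_var = update_noise_variance(noise_var, obs_power, sap)
        print(f"power={obs_power:5.1f}  SAP={sap:.1f}  variance={noise_var:.4f}")
```

When the SAP is zero the variance is left unchanged, so frames judged to contain speech do not leak into the noise estimate; when the SAP is one the update reduces to ordinary exponential smoothing of the observed power.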

10.3.3 Comparison

The effect of the smoothing factor κ in equation (10.6) is shown in Figure 10.16. Note that the case of κ = 0 reduces equation (10.6) to the LR-based method. It is obvious from the results that the detection accuracy at the offset regions increases as κ increases, without serious degradations in the performance
