Voice Activity Detection - Digital Speech: Coding for Low Bit Rate Communication Systems


to the adoption of Ephraim and Malah's noise suppression rules [13] for the

voice activity decision rules.

A voice activity decision can be considered as a test of two hypotheses, $H_0$ and $H_1$, which indicate speech absence and presence, respectively. Assuming that each spectral component of speech and noise has a complex Gaussian distribution [13], and that the noise is additive and uncorrelated with the speech, the conditional probability density functions (PDFs) of a noisy spectral component $Y_k$, given $H_{0,k}$ and $H_{1,k}$, are:

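$$p(Y_k \mid H_{0,k}) = \frac{1}{\pi\lambda_{N,k}} \exp\left(-\frac{|Y_k|^2}{\lambda_{N,k}}\right) \tag{10.1}$$

$$p(Y_k \mid H_{1,k}) = \frac{1}{\pi(\lambda_{N,k}+\lambda_{X,k})} \exp\left(-\frac{|Y_k|^2}{\lambda_{N,k}+\lambda_{X,k}}\right) \tag{10.2}$$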

where $k$ indicates the spectral bin index, and $\lambda_{N,k}$ and $\lambda_{X,k}$ denote the variances of the noise and speech spectra, respectively.
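
As a rough illustration of (10.1) and (10.2), the two conditional densities can be evaluated for a single bin. This is only a sketch: the function names `pdf_h0` and `pdf_h1` and the numeric values are assumptions chosen for the example, not taken from the text.

```python
import math


def pdf_h0(y: complex, lam_n: float) -> float:
    """Eq. (10.1): density of a noisy bin Y_k when only noise is present."""
    return math.exp(-abs(y) ** 2 / lam_n) / (math.pi * lam_n)


def pdf_h1(y: complex, lam_n: float, lam_x: float) -> float:
    """Eq. (10.2): density of Y_k when speech and noise are both present."""
    total = lam_n + lam_x  # variances add because speech and noise are uncorrelated
    return math.exp(-abs(y) ** 2 / total) / (math.pi * total)


# Illustrative values only (assumptions, not from the text).
lam_n, lam_x = 1.0, 4.0
weak, strong = 0.5 + 0.0j, 3.0 + 0.0j

# A low-energy bin is better explained by noise alone,
# a high-energy bin by speech plus noise.
print(pdf_h0(weak, lam_n) > pdf_h1(weak, lam_n, lam_x))
print(pdf_h1(strong, lam_n, lam_x) > pdf_h0(strong, lam_n))
```

With these values, the comparison flips between the weak and the strong bin, which is the behaviour the hypothesis test relies on.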

The likelihood ratio (LR) of the $k$th spectral bin, $\Lambda_k$, is defined from the above two PDFs as [12]:

$$\Lambda_k = \frac{p(Y_k \mid H_{1,k})}{p(Y_k \mid H_{0,k})} = \frac{1}{1+\xi_k} \exp\left(\frac{(1+\gamma_k)\,\xi_k}{1+\xi_k}\right) \tag{10.3}$$

where $\gamma_k$ and $\xi_k$ are the a posteriori and a priori SNRs, defined as $\gamma_k = |Y_k|^2/\lambda_{N,k} - 1$ and $\xi_k = \lambda_{X,k}/\lambda_{N,k}$. Note that this definition of the a posteriori SNR is slightly different from the original one, $\gamma_k = |Y_k|^2/\lambda_{N,k}$ [13].
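
The step from the two PDFs to the LR can be written out explicitly. The following is a short worked derivation under the definitions given here, with $\gamma_k = |Y_k|^2/\lambda_{N,k} - 1$ and $\xi_k = \lambda_{X,k}/\lambda_{N,k}$. Dividing (10.2) by (10.1):

$$
\begin{aligned}
\Lambda_k
&= \frac{\lambda_{N,k}}{\lambda_{N,k}+\lambda_{X,k}}
   \exp\left(\frac{|Y_k|^2}{\lambda_{N,k}} - \frac{|Y_k|^2}{\lambda_{N,k}+\lambda_{X,k}}\right) \\
&= \frac{1}{1+\xi_k}
   \exp\left(\frac{|Y_k|^2}{\lambda_{N,k}} \cdot \frac{\lambda_{X,k}}{\lambda_{N,k}+\lambda_{X,k}}\right) \\
&= \frac{1}{1+\xi_k}
   \exp\left(\frac{(1+\gamma_k)\,\xi_k}{1+\xi_k}\right)
\end{aligned}
$$

The factor $(1+\gamma_k)$ appears because $|Y_k|^2/\lambda_{N,k} = 1+\gamma_k$ under the shifted definition of the a posteriori SNR. With the original definition, the same factor would be written simply as $\gamma_k$.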

The noise variance is assumed to be known through noise adaptation (see Section 10.3.2). However, the variance of the speech is unknown, thus the a priori SNR of the $n$th frame, $\hat{\xi}_k^{(n)}$, is estimated using the decision-directed (DD) method [13] as:

$$\hat{\xi}_k^{(n)} = \alpha\,\frac{|\hat{X}_k^{(n-1)}|^2}{\lambda_{N,k}^{(n-1)}} + (1-\alpha)\,\mathrm{MAX}\{\gamma_k^{(n)},\,0\} \tag{10.4}$$

where $\alpha$ is a weighting term, e.g. 0.98, and the clean speech spectral amplitude, $|\hat{X}_k|$, is estimated using the minimum mean square error log-spectral amplitude estimator [14]. The decision about the voice activity is made from the geometric mean of the $\Lambda_k$ over all spectral bins:

$$\Lambda = \exp\left(\frac{1}{K}\sum_{k}\log\Lambda_k\right) \tag{10.5}$$

where K denotes the number of spectral bins.
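
The chain from (10.3) to (10.5) can be sketched for one frame as follows. This is a minimal illustration, not the book's implementation: all function names, the input values, and the threshold `eta` are assumptions made for the example, and the previous-frame amplitude estimates are simply supplied as numbers rather than computed with the log-spectral amplitude estimator of [14].

```python
import math


def a_posteriori_snr(y_power: float, lam_n: float) -> float:
    """gamma_k = |Y_k|^2 / lambda_N,k - 1, the definition used in this section."""
    return y_power / lam_n - 1.0


def dd_a_priori_snr(prev_amp_sq: float, prev_lam_n: float,
                    gamma: float, alpha: float = 0.98) -> float:
    """Eq. (10.4): decision-directed estimate of xi_k for the current frame."""
    return alpha * prev_amp_sq / prev_lam_n + (1.0 - alpha) * max(gamma, 0.0)


def log_likelihood_ratio(gamma: float, xi: float) -> float:
    """Logarithm of Eq. (10.3): (1 + gamma) * xi / (1 + xi) - log(1 + xi)."""
    return (1.0 + gamma) * xi / (1.0 + xi) - math.log1p(xi)


def geometric_mean_lr(log_lrs: list[float]) -> float:
    """Eq. (10.5): exponential of the average log LR over the K bins."""
    return math.exp(sum(log_lrs) / len(log_lrs))


# Made-up data for K = 4 bins (assumptions, not from the text).
lam_n = [1.0, 1.0, 1.0, 1.0]        # noise variances, taken as known and constant
prev_amp_sq = [4.0, 9.0, 0.0, 1.0]  # |X_k|^2 estimates from frame n-1
y_power = [6.0, 12.0, 0.8, 2.5]     # |Y_k|^2 observed in frame n

log_lrs = []
for yp, ln, xp in zip(y_power, lam_n, prev_amp_sq):
    gamma = a_posteriori_snr(yp, ln)
    xi = dd_a_priori_snr(xp, ln, gamma)
    log_lrs.append(log_likelihood_ratio(gamma, xi))

lr = geometric_mean_lr(log_lrs)
eta = 1.0  # hypothetical decision threshold, chosen only for this example
speech_present = lr > eta
print(speech_present)
```

Working in the log domain and exponentiating once at the end avoids multiplying many per-bin ratios together, which can overflow or underflow when K is large.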
