Digital Signal Processing Reference
to the adoption of Ephraim and Malah's noise suppression rules [13] for the
voice activity decision rules.
A voice activity decision can be considered as a test of two hypotheses, $H_0$ and
$H_1$, which indicate speech absence and presence, respectively. Assuming that
each spectral component of speech and noise has a complex Gaussian distribution
[13], in which the noise is additive and uncorrelated with speech, the
conditional probability density functions (PDFs) of a noisy spectral component
$Y_k$, given $H_{0,k}$ and $H_{1,k}$, are:
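$$p(Y_k \mid H_{0,k}) = \frac{1}{\pi\lambda_{N,k}} \exp\left\{-\frac{|Y_k|^2}{\lambda_{N,k}}\right\} \tag{10.1}$$

$$p(Y_k \mid H_{1,k}) = \frac{1}{\pi(\lambda_{N,k} + \lambda_{X,k})} \exp\left\{-\frac{|Y_k|^2}{\lambda_{N,k} + \lambda_{X,k}}\right\} \tag{10.2}$$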
where $k$ indicates the spectral bin index, and $\lambda_{N,k}$ and $\lambda_{X,k}$ denote the variances
of the noise and speech spectra, respectively.
The likelihood ratio (LR) of the $k$th spectral bin, $\Lambda_k$, is defined from the
above two PDFs as [12]:
$$\Lambda_k = \frac{p(Y_k \mid H_{1,k})}{p(Y_k \mid H_{0,k})} = \frac{1}{1 + \xi_k} \exp\left\{\frac{(1 + \gamma_k)\,\xi_k}{1 + \xi_k}\right\} \tag{10.3}$$
where $\gamma_k$ and $\xi_k$ are the a posteriori and a priori SNRs, defined as
$\gamma_k = |Y_k|^2/\lambda_{N,k} - 1$ and $\xi_k = \lambda_{X,k}/\lambda_{N,k}$. Note that this definition of the
a posteriori SNR is slightly different from the original one, $\gamma_k = |Y_k|^2/\lambda_{N,k}$ [13].
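As a quick numerical check, the closed form in (10.3) can be compared against the direct ratio of the two PDFs (10.1) and (10.2). The following is a minimal sketch in plain Python; the function names and the sample values are illustrative assumptions, not taken from the text.

```python
import math

def gaussian_pdf(y, var):
    """Zero-mean complex Gaussian density with variance `var`, as in (10.1)/(10.2)."""
    return math.exp(-abs(y) ** 2 / var) / (math.pi * var)

def likelihood_ratio(y, noise_var, speech_var):
    """Per-bin likelihood ratio, closed form of (10.3)."""
    # A posteriori SNR with the "-1" offset used in this chapter.
    gamma = abs(y) ** 2 / noise_var - 1.0
    # A priori SNR.
    xi = speech_var / noise_var
    return math.exp((1.0 + gamma) * xi / (1.0 + xi)) / (1.0 + xi)

# Illustrative values (assumptions): one noisy spectral component and two variances.
y = complex(1.2, -0.7)
noise_var, speech_var = 0.5, 2.0

lr_closed = likelihood_ratio(y, noise_var, speech_var)
lr_direct = gaussian_pdf(y, noise_var + speech_var) / gaussian_pdf(y, noise_var)
print(lr_closed, lr_direct)  # the two values should agree up to rounding
```

The agreement follows because $|Y_k|^2/\lambda_{N,k} = 1 + \gamma_k$ under the chapter's definition, so the difference of the two exponents reduces to $(1 + \gamma_k)\xi_k/(1 + \xi_k)$.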
The noise variance is assumed to be known through noise adaptation (see
Section 10.3.2). However, the variance of the speech is unknown, thus the a
priori SNR of the $n$th frame, $\xi_k^{(n)}$, is estimated using the decision-directed (DD)
method [13] as:
$$\xi_k^{(n)} = \alpha\,\frac{\left|X_k^{(n-1)}\right|^2}{\lambda_{N,k}^{(n-1)}} + (1 - \alpha)\,\mathrm{MAX}\left\{\gamma_k^{(n)},\, 0\right\} \tag{10.4}$$
where $\alpha$ is a weighting term, e.g. 0.98, and the clean speech spectral amplitude,
$|X_k|$, is estimated using the minimum mean square error log spectral
amplitude estimator [14]. The decision about the voice activity is made
using the geometric mean of the $\Lambda_k$ over all spectral bins:
$$\Lambda = \exp\left\{\frac{1}{K} \sum_{k=1}^{K} \log \Lambda_k\right\} \tag{10.5}$$
where K denotes the number of spectral bins.
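The decision-directed update (10.4) and the geometric-mean statistic (10.5) can be sketched together as follows. This is an illustrative outline only: the function names are hypothetical, the previous-frame amplitudes are simply passed in rather than produced by the log spectral amplitude estimator of [14], and the decision threshold of 1.0 is an assumption, since no threshold value is given here.

```python
import math

def dd_a_priori_snr(prev_clean_amp, prev_noise_var, gamma, alpha=0.98):
    """Decision-directed a priori SNR for one bin, following (10.4)."""
    return (alpha * prev_clean_amp ** 2 / prev_noise_var
            + (1.0 - alpha) * max(gamma, 0.0))

def log_likelihood_ratio(gamma, xi):
    """Logarithm of (10.3), kept in the log domain to avoid overflow."""
    return (1.0 + gamma) * xi / (1.0 + xi) - math.log1p(xi)

def geometric_mean_lr(gammas, xis):
    """Geometric mean of the per-bin likelihood ratios, following (10.5)."""
    logs = [log_likelihood_ratio(g, x) for g, x in zip(gammas, xis)]
    return math.exp(sum(logs) / len(logs))

def is_speech(gammas, xis, threshold=1.0):
    """Declare speech when the statistic exceeds a threshold (value assumed)."""
    return geometric_mean_lr(gammas, xis) > threshold

# Illustrative frame with three bins (all numbers are assumptions).
noisy_power = [4.0, 0.6, 2.5]      # |Y_k|^2 for the current frame
noise_var = [0.5, 0.5, 0.5]        # noise variance, reused for frames n-1 and n
prev_clean_amp = [1.5, 0.1, 1.0]   # |X_k| from the previous frame

gammas = [p / v - 1.0 for p, v in zip(noisy_power, noise_var)]
xis = [dd_a_priori_snr(a, v, g)
       for a, v, g in zip(prev_clean_amp, noise_var, gammas)]
print(is_speech(gammas, xis))
```

Averaging the log likelihood ratios before exponentiating is equivalent to taking the geometric mean directly, and it avoids the overflow that a product of large per-bin ratios could cause.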