Digital Signal Processing Reference
to the adoption of Ephraim and Malah's noise suppression rules [13] for the
voice activity decision rules.
A voice activity decision can be considered as a test of two hypotheses, $H_0$ and $H_1$, which indicate speech absence and presence, respectively. Assuming that each spectral component of speech and noise has a complex Gaussian distribution [13], in which the noise is additive and uncorrelated with speech, the conditional probability density functions (PDFs) of a noisy spectral component $Y_k$, given $H_{0,k}$ and $H_{1,k}$, are:
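$$p(Y_k \mid H_{0,k}) = \frac{1}{\pi\lambda_{N,k}} \exp\left\{-\frac{|Y_k|^2}{\lambda_{N,k}}\right\} \qquad (10.1)$$

$$p(Y_k \mid H_{1,k}) = \frac{1}{\pi(\lambda_{N,k}+\lambda_{X,k})} \exp\left\{-\frac{|Y_k|^2}{\lambda_{N,k}+\lambda_{X,k}}\right\} \qquad (10.2)$$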
where $k$ indicates the spectral bin index, and $\lambda_{N,k}$ and $\lambda_{X,k}$ denote the variances of the noise and speech spectra, respectively.
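As a minimal sketch of Eqs. (10.1) and (10.2), the two conditional densities can be evaluated for a single spectral bin as follows. The function names and the numeric values are hypothetical and chosen only for illustration; the variances are assumed to be known.

```python
import math


def pdf_speech_absent(y: complex, lam_n: float) -> float:
    """Eq. (10.1): density of the noisy component Y_k under H0 (noise only)."""
    return math.exp(-abs(y) ** 2 / lam_n) / (math.pi * lam_n)


def pdf_speech_present(y: complex, lam_n: float, lam_x: float) -> float:
    """Eq. (10.2): density of Y_k under H1 (speech plus noise)."""
    total = lam_n + lam_x  # variances add because speech and noise are uncorrelated
    return math.exp(-abs(y) ** 2 / total) / (math.pi * total)


# Hypothetical values: one noisy spectral component and assumed variances.
y_k = 2.0 + 1.0j
p0 = pdf_speech_absent(y_k, lam_n=1.0)
p1 = pdf_speech_present(y_k, lam_n=1.0, lam_x=4.0)
print(p0, p1)
```

For a component whose power is well above the noise variance, as in this example, the density under H1 comes out larger than the density under H0, which is what the likelihood ratio test exploits.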
The likelihood ratio (LR) of the $k$th spectral bin, $\Lambda_k$, is defined from the above two PDFs as [12]:
$$\Lambda_k = \frac{p(Y_k \mid H_{1,k})}{p(Y_k \mid H_{0,k})} = \frac{1}{1+\xi_k} \exp\left\{\frac{(1+\gamma_k)\,\xi_k}{1+\xi_k}\right\} \qquad (10.3)$$
where $\gamma_k$ and $\xi_k$ are the a posteriori and a priori SNRs, defined as $\gamma_k = |Y_k|^2/\lambda_{N,k} - 1$ and $\xi_k = \lambda_{X,k}/\lambda_{N,k}$. Note that this definition of the a posteriori SNR is slightly different from the original one, $\gamma_k = |Y_k|^2/\lambda_{N,k}$ [13].
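The relation between Eq. (10.3) and the two PDFs can be checked numerically. The sketch below works in the log domain, which is an implementation choice made here and not something the text prescribes; the function names and numeric values are hypothetical.

```python
import math


def snrs(y_power: float, lam_n: float, lam_x: float):
    """Return (gamma, xi) using the definitions given in this section."""
    gamma = y_power / lam_n - 1.0  # a posteriori SNR, with the "- 1" used here
    xi = lam_x / lam_n             # a priori SNR
    return gamma, xi


def log_likelihood_ratio(gamma: float, xi: float) -> float:
    """Logarithm of Eq. (10.3): (1 + gamma) * xi / (1 + xi) - log(1 + xi)."""
    return (1.0 + gamma) * xi / (1.0 + xi) - math.log1p(xi)


# Hypothetical values for one bin: |Y_k|^2 = 5, noise variance 1, speech variance 4.
y_power, lam_n, lam_x = 5.0, 1.0, 4.0
gamma, xi = snrs(y_power, lam_n, lam_x)
llr = log_likelihood_ratio(gamma, xi)

# Cross-check against the ratio of the densities in Eqs. (10.1) and (10.2).
p0 = math.exp(-y_power / lam_n) / (math.pi * lam_n)
p1 = math.exp(-y_power / (lam_n + lam_x)) / (math.pi * (lam_n + lam_x))
direct = math.log(p1 / p0)
print(llr, direct)
```

The two printed values agree up to rounding, which confirms that the closed form in terms of $\gamma_k$ and $\xi_k$ is the ratio of the two Gaussian densities when the a posteriori SNR includes the "minus one" term.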
The noise variance is assumed to be known through noise adaptation (see Section 10.3.2). However, the variance of the speech is unknown, so the a priori SNR of the $n$th frame, $\xi_k^{(n)}$, is estimated using the decision-directed (DD) method [13] as:
$$\xi_k^{(n)} = \alpha\,\frac{|X_k^{(n-1)}|^2}{\lambda_{N,k}^{(n-1)}} + (1-\alpha)\,\mathrm{MAX}\{\gamma_k^{(n)},\, 0\} \qquad (10.4)$$
where $\alpha$ is a weighting term, e.g. 0.98, and the clean speech spectral amplitude, $|X_k|$, is estimated using the minimum mean square error log-spectral amplitude estimator [14]. The decision about the voice activity is based on the geometric mean of the $\Lambda_k$ over all spectral bins:
$$\Lambda = \exp\left\{\frac{1}{K}\sum_{k=1}^{K}\log\Lambda_k\right\} \qquad (10.5)$$
where $K$ denotes the number of spectral bins.
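The decision-directed update of Eq. (10.4) and the frame-level statistic of Eq. (10.5) can be combined into a rough per-frame sketch. Several parts are assumptions made only for illustration: the noise variance is treated as constant between frames, the clean speech amplitude is estimated with a simple Wiener gain as a stand-in for the log spectral amplitude estimator of [14], and the decision threshold of 1.5 is a hypothetical value that does not come from the text.

```python
import math

ALPHA = 0.98  # weighting term; 0.98 is the example value given in the text


def dd_a_priori_snr(prev_amp_sq, prev_lam_n, gamma, alpha=ALPHA):
    """Decision-directed a priori SNR for one bin, following Eq. (10.4)."""
    return alpha * prev_amp_sq / prev_lam_n + (1.0 - alpha) * max(gamma, 0.0)


def log_lr(gamma, xi):
    """Logarithm of the per-bin likelihood ratio of Eq. (10.3)."""
    return (1.0 + gamma) * xi / (1.0 + xi) - math.log1p(xi)


def frame_lr(gammas, xis):
    """Geometric mean of the per-bin likelihood ratios, Eq. (10.5)."""
    mean_log = sum(log_lr(g, x) for g, x in zip(gammas, xis)) / len(gammas)
    return math.exp(mean_log)


def vad_frame(power, lam_n, prev_amp_sq, threshold=1.5):
    """Process one frame; returns (decision, frame LR, new amplitude estimates).

    power       : |Y_k|^2 for each bin of the current frame
    lam_n       : noise variance per bin (assumed constant across frames)
    prev_amp_sq : squared clean-speech amplitude estimates of the previous frame
    threshold   : hypothetical decision threshold, not taken from the text
    """
    gammas = [p / n - 1.0 for p, n in zip(power, lam_n)]
    xis = [dd_a_priori_snr(a, n, g) for a, n, g in zip(prev_amp_sq, lam_n, gammas)]
    lr = frame_lr(gammas, xis)
    # Stand-in amplitude estimate: Wiener gain xi / (1 + xi) applied to |Y_k|,
    # used here in place of the log spectral amplitude estimator of [14].
    amp_sq = [(x / (1.0 + x)) ** 2 * p for x, p in zip(xis, power)]
    return lr > threshold, lr, amp_sq


# Hypothetical four-bin example: a noise-like frame followed by a louder frame.
lam_n = [1.0] * 4
state = [0.0] * 4
noise_decision, noise_lr, state = vad_frame([1.0, 0.8, 1.1, 0.9], lam_n, state)
speech_decision, speech_lr, state = vad_frame([9.0] * 4, lam_n, state)
print(noise_decision, noise_lr)
print(speech_decision, speech_lr)
```

Averaging the log likelihood ratios and exponentiating once is numerically equivalent to the geometric mean and avoids multiplying many per-bin ratios together, which could overflow or underflow when the number of bins is large.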