Digital Signal Processing Reference
11.2 Review of STSA-based Speech Enhancement
Assuming that the noise $d(n)$ is additive to the speech signal $x(n)$, the noisy
speech $y(n)$ can be written as
$$y(n) = x(n) + d(n), \quad 0 \le n \le K-1 \tag{11.1}$$
where $n$ is the time index. The objective of speech enhancement is to find
the enhanced speech $\hat{x}(n)$ given $y(n)$, with the assumption that $d(n)$ is
uncorrelated with $x(n)$. The time-domain signals can be transformed to the
frequency domain as
$$Y_k = X_k + D_k, \quad 0 \le k \le K-1 \tag{11.2}$$
where $Y_k$, $X_k$, and $D_k$ denote the short-time DFT of $y(n)$, $x(n)$, and $d(n)$,
respectively. STSA-based speech enhancement filters out the noise by
modifying the spectral amplitudes of $Y_k$ in equation (11.2). Therefore, the
enhanced spectrum $\hat{X}_k$ can be written in terms of the modification factor
(gain) $G_k$ and the noisy spectrum $Y_k$ as
$$\hat{X}_k = G_k Y_k, \quad 0 \le G_k \le 1 \tag{11.3}$$
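Equations (11.1) to (11.3) can be sketched numerically as follows. This is a minimal illustration, not the method of the text: the frame length, the sinusoidal stand-in for speech, the noise level, and the threshold-based placeholder gain are all assumptions, and the helper names `dft` and `apply_gain` are hypothetical.

```python
import cmath
import math
import random

def dft(signal):
    """Naive K-point DFT, adequate for a short illustrative frame."""
    K = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
        for k in range(K)
    ]

def apply_gain(Y, G):
    """Equation (11.3): scale each noisy bin by a real gain clipped to [0, 1]."""
    return [min(max(g, 0.0), 1.0) * Yk for g, Yk in zip(G, Y)]

random.seed(0)
K = 64                                                       # frame length (assumed)
x = [math.sin(2 * math.pi * 4 * n / K) for n in range(K)]    # stand-in for speech
d = [0.3 * random.gauss(0.0, 1.0) for _ in range(K)]         # additive noise
y = [xn + dn for xn, dn in zip(x, d)]                        # equation (11.1)

X, D, Y = dft(x), dft(d), dft(y)

# The DFT is linear, so equation (11.2) holds in every bin up to rounding error.
max_additivity_error = max(abs(Yk - (Xk + Dk)) for Yk, Xk, Dk in zip(Y, X, D))

# Placeholder gain for illustration only: keep bins above half the peak magnitude.
peak = max(abs(Yk) for Yk in Y)
G = [1.0 if abs(Yk) > 0.5 * peak else 0.0 for Yk in Y]
X_hat = apply_gain(Y, G)                                     # enhanced spectrum
```

Because the gain is real and non-negative, only the amplitude of each bin changes; the phase of the noisy spectrum is carried over unchanged into the enhanced spectrum.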
The gain $G_k$ is a function of the a posteriori SNR,
$$\gamma_k = \frac{|Y_k|^2}{E(|D_k|^2)} \tag{11.4}$$
and the a priori SNR,
$$\xi_k = \frac{E(|X_k|^2)}{E(|D_k|^2)} \tag{11.5}$$
where $E(|D_k|^2)$ and $E(|X_k|^2)$ are the statistical variances of the $k$th spectral
components of the noise and the speech, respectively. The functional form
of the gain $G_k$ depends on the specific enhancement method. The a posteriori SNR
$\gamma_k$ in equation (11.4) can be obtained easily, since $Y_k$ is the input noisy spectrum
and $E(|D_k|^2)$ can be obtained through a noise adaptation procedure discussed
in Section 11.3. However, the speech variance $E(|X_k|^2)$ needed to estimate
$\xi_k$ in equation (11.5) is not available. As a solution, Ephraim and Malah [10]
proposed the decision-directed (DD) method given by
$$\xi_k^{(t)} = \alpha \frac{|\hat{X}_k^{(t-1)}|^2}{E(|D_k^{(t)}|^2)} + (1-\alpha)\,\mathrm{MAX}(\gamma_k^{(t)} - 1,\, 0) \tag{11.6}$$
where $0 \le \alpha < 1$ and $t$ is the frame index.
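The recursion in equation (11.6) can be sketched for a single frequency bin as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the noise variance is taken as known and constant across frames, the initial enhanced power is set to zero, $\alpha = 0.98$ is a commonly used value assumed here, and a Wiener-type gain $\xi/(1+\xi)$ stands in for the gain rule, which the text leaves method-dependent. All function names are hypothetical.

```python
def a_posteriori_snr(Y_power, noise_var):
    """Equation (11.4): gamma_k = |Y_k|^2 / E(|D_k|^2)."""
    return Y_power / noise_var

def decision_directed(prev_X_hat_power, noise_var, gamma, alpha=0.98):
    """Equation (11.6): a priori SNR estimate for one bin of the current frame."""
    return alpha * prev_X_hat_power / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)

def wiener_gain(xi):
    """One possible gain rule (an assumption here, not taken from the text)."""
    return xi / (1.0 + xi)

def track_bin(Y_powers, noise_var, alpha=0.98):
    """Run the decision-directed recursion over successive frames of one bin."""
    prev_X_hat_power = 0.0      # assumed initialisation: no speech before frame 0
    xis = []
    for Y_power in Y_powers:
        gamma = a_posteriori_snr(Y_power, noise_var)
        xi = decision_directed(prev_X_hat_power, noise_var, gamma, alpha)
        G = wiener_gain(xi)
        # From equation (11.3): |X_hat_k|^2 = G_k^2 |Y_k|^2, reused in the next frame.
        prev_X_hat_power = (G ** 2) * Y_power
        xis.append(xi)
    return xis

# Noise-only frames (gamma near 1) followed by frames with strong speech energy.
xis = track_bin([1.0, 0.8, 1.2, 20.0, 20.0, 20.0], noise_var=1.0)
```

With $\alpha$ close to 1, the estimate leans on the previous frame's enhanced spectrum, so in this example $\xi_k$ stays near zero during the noise-only frames and rises over several frames once speech energy appears, rather than jumping immediately to $\gamma_k - 1$.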