Digital Signal Processing Reference
11.2 Review of STSA-based Speech Enhancement
Assuming that the noise $d(n)$ is additive to the speech signal $x(n)$, the noisy speech $y(n)$ can be written as

$$y(n) = x(n) + d(n), \quad 0 \le n \le K-1 \tag{11.1}$$
where $n$ is the time index. The objective of speech enhancement is to find the enhanced speech $\hat{x}(n)$ given $y(n)$, under the assumption that $d(n)$ is uncorrelated with $x(n)$. The time-domain signals can be transformed to the frequency domain as

$$Y_k = X_k + D_k, \quad 0 \le k \le K-1 \tag{11.2}$$
where $Y_k$, $X_k$, and $D_k$ denote the short-time DFT of $y(n)$, $x(n)$, and $d(n)$, respectively. STSA-based speech enhancement filters out the noise by modifying the spectral amplitudes of $Y_k$ in equation (11.2). Therefore, the enhanced spectrum $\hat{X}_k$ can be written in terms of the modification factor (gain) $G_k$ and the noisy spectrum $Y_k$ as

$$\hat{X}_k = G_k Y_k, \quad 0 \le G_k \le 1 \tag{11.3}$$
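Equations (11.1) to (11.3) can be illustrated with a minimal sketch in plain Python. The helper names `dft` and `apply_gain`, the four-sample frame, and the gain values are all hypothetical choices made for illustration, not part of the method described here. A practical system would use windowed, overlapping frames and an FFT.

```python
import cmath


def dft(frame):
    """Naive K-point DFT of one frame (O(K^2), for illustration only)."""
    K = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
        for k in range(K)
    ]


def apply_gain(Y, G):
    """Equation (11.3): scale each noisy bin Y_k by a real gain G_k in [0, 1]."""
    if len(Y) != len(G):
        raise ValueError("Y and G must have the same number of bins")
    if any(g < 0.0 or g > 1.0 for g in G):
        raise ValueError("each gain must satisfy 0 <= G_k <= 1")
    return [g * y for g, y in zip(G, Y)]


# Toy frame (hypothetical values): clean speech x(n) plus additive noise d(n).
x = [1.0, 0.0, -1.0, 0.0]
d = [0.1, -0.2, 0.05, 0.0]
y = [xn + dn for xn, dn in zip(x, d)]  # equation (11.1)

X, D, Y = dft(x), dft(d), dft(y)

# The DFT is linear, so Y_k = X_k + D_k holds bin by bin (equation (11.2)).
linear = all(abs(Yk - (Xk + Dk)) < 1e-12 for Yk, Xk, Dk in zip(Y, X, D))

G = [0.5, 1.0, 0.0, 1.0]  # hypothetical gains, one per frequency bin
X_hat = apply_gain(Y, G)  # equation (11.3)
```

Because each $G_k$ is real and non-negative, the multiplication changes only the amplitude of $Y_k$. The noisy phase passes through unchanged, which is why the approach is described as modifying spectral amplitudes.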
The gain $G_k$ is a function of the a posteriori SNR,

$$\gamma_k \equiv \frac{|Y_k|^2}{E(|D_k|^2)} \tag{11.4}$$

and the a priori SNR,

$$\xi_k \equiv \frac{E(|X_k|^2)}{E(|D_k|^2)} \tag{11.5}$$

where $E(|D_k|^2)$ and $E(|X_k|^2)$ are the statistical variances of the $k$th spectral components of the noise and the speech, respectively. The functional form of the gain $G_k$ depends on the specific enhancement method. The a posteriori SNR $\gamma_k$ in equation (11.4) can be obtained easily, since $Y_k$ is the input noisy spectrum and $E(|D_k|^2)$ can be obtained through a noise adaptation procedure discussed in Section 11.3. However, the speech variance $E(|X_k|^2)$ needed to estimate $\xi_k$ in equation (11.5) is not available. As a solution, Ephraim and Malah [10] proposed the decision-directed (DD) method given by
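$$\xi_k(t) = \alpha \frac{|\hat{X}_k(t-1)|^2}{E(|D_k(t)|^2)} + (1-\alpha)\,\mathrm{MAX}(\gamma_k(t) - 1,\ 0) \tag{11.6}$$

where $0 \le \alpha < 1$, $t$ is the frame index, and $\hat{X}_k(t-1)$ is the enhanced spectrum of the previous frame.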
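The SNR estimates in equations (11.4) and (11.6) can be sketched per frequency bin as follows. This is a minimal illustration under stated assumptions: the function names, the argument layout, and the default `alpha=0.98` are hypothetical choices not given in the text, and the noise variance `noise_var` is assumed to come from a separate noise estimator.

```python
def a_posteriori_snr(Y, noise_var):
    """Equation (11.4): gamma_k = |Y_k|^2 / E(|D_k|^2) for every bin k."""
    return [abs(Yk) ** 2 / nv for Yk, nv in zip(Y, noise_var)]


def decision_directed_snr(X_hat_prev, Y, noise_var, alpha=0.98):
    """Equation (11.6): decision-directed estimate of the a priori SNR.

    X_hat_prev -- enhanced spectrum of the previous frame (t - 1)
    Y          -- noisy spectrum of the current frame t
    noise_var  -- noise variance estimate per bin (assumed given, all > 0)
    alpha      -- smoothing factor, 0 <= alpha < 1 (default is an assumption)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    gamma = a_posteriori_snr(Y, noise_var)
    return [
        alpha * abs(Xp) ** 2 / nv + (1.0 - alpha) * max(g - 1.0, 0.0)
        for Xp, g, nv in zip(X_hat_prev, gamma, noise_var)
    ]


# Hypothetical two-bin example with unit noise variance in both bins.
xi = decision_directed_snr(
    X_hat_prev=[1 + 0j, 1 + 0j],
    Y=[2 + 0j, 0.5 + 0j],
    noise_var=[1.0, 1.0],
    alpha=0.5,
)
```

The `max(g - 1.0, 0.0)` term plays the role of MAX in equation (11.6): when the noisy power in a bin falls below the noise variance, the instantaneous term is clipped to zero and the estimate relies entirely on the previous frame.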