Digital Signal Processing Reference
In-Depth Information
The resulting error signal e(n) is processed in two different branches. As shown
at the bottom of Fig. 7.1, it is stored in a circular buffer to be fed into the ASR
engine without further processing. In the upper branch of the TAP system, it is
analyzed by an additional integrated noise reduction and voice activity detection
(VAD) stage, as described in Sect. 7.4. The VAD output is a voice activity signal,
which is buffered and evaluated by a control unit. Upon receiving a PTS event, the
control unit locates the speech onset using the buffered voice activity signal from
both the past and the present. The control unit also initializes and triggers the ASR
engine, which is then supplied with the correct portion of the error signal from the
lower buffer, depending on the detected SOU.
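The interplay of the two buffers and the control unit can be illustrated with a minimal Python sketch. It is not the implementation used in the TAP system: the class name `TapBuffer`, the frame-wise storage, the backward search, and the `max_gap` tolerance are all assumptions made for illustration. It also covers only the case in which speech has already started when the PTS event arrives; a complete control unit would additionally keep watching incoming VAD frames.

```python
from collections import deque


class TapBuffer:
    """Hypothetical ring buffer for error-signal frames and VAD flags."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which gives circular-buffer behavior.
        self.frames = deque(maxlen=capacity)
        self.vad = deque(maxlen=capacity)

    def push(self, frame, is_speech):
        """Store one error-signal frame together with its VAD decision."""
        self.frames.append(frame)
        self.vad.append(bool(is_speech))

    def locate_onset(self, max_gap=2):
        """Return the buffer index of the start of the most recent speech run.

        Walks backwards from the newest frame. Up to `max_gap` consecutive
        non-speech frames inside the run are tolerated (an assumed
        heuristic). Returns None if no speech is buffered.
        """
        onset = None
        gap = 0
        for i in range(len(self.vad) - 1, -1, -1):
            if self.vad[i]:
                onset = i
                gap = 0
            elif onset is not None:
                gap += 1
                if gap > max_gap:
                    break
        return onset

    def frames_from_onset(self, max_gap=2):
        """Return the buffered frames from the detected onset to the present."""
        onset = self.locate_onset(max_gap)
        if onset is None:
            return []
        return list(self.frames)[onset:]
```

On a PTS event, a control unit built along these lines would call `frames_from_onset()` and hand the returned frames to the ASR engine before continuing with live frames.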
7.3 Acoustic Echo Cancelation and Postfilter
The AEC stage of our system employs the FDAF as described in [4], which unifies
AEC and a postfilter for residual echo and noise suppression in the frequency
domain. While most echo cancellers model the impulse response h(n) of the
LEM system (or its transfer function) deterministically, the FDAF is based on a
statistical model.
As proposed in [4], the impulse response h(n) is modeled as a random process
with the expectation $h_0(n)$ and covariance vector $\Phi_{hh}(n)$.
Actual estimation is performed in the frequency domain. Assuming that
variations of the LEM path over time are gradual, the LEM system transfer function
estimate $\hat{H}_\ell(k)$ is updated recursively according to

$$\hat{H}_{\ell+1}(k) = A \cdot \hat{H}_\ell(k) + \Delta\hat{H}_\ell(k), \tag{7.4}$$

where $\ell$ is the time frame index, $k$ is the frequency bin index, $A = 0.9995$ is the
transmission factor, and $\Delta\hat{H}_\ell(k)$ is the echo path update as computed
according to [4].
Multiplying the estimated LEM transfer function $\hat{H}_\ell(k)$ with a short-time Fourier
transform (STFT) $X_\ell(k)$ of the loudspeaker source signal yields the estimated echo
component $\hat{D}_\ell(k)$ in the short-time spectral domain. This estimate is then subtracted
from the STFT $Y_\ell(k)$ of the microphone signal, resulting in an error signal $E_\ell(k)$.
Note that before applying the STFT to the signals x(n) and y(n), they are subject to a
high-pass filter with a cutoff frequency $f_c = 200$ Hz to remove low-frequency noise.
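The per-frame processing described above (echo estimate, subtraction, and the recursive update of Eq. 7.4) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FDAF of [4]: the function names are hypothetical, the spectra are plain lists of complex numbers, and `placeholder_delta` is a simple NLMS-style stand-in for the echo path update, whose actual computation is defined in [4] and not reproduced here.

```python
def echo_cancel_frame(H, X, Y):
    """Return (D, E) with D(k) = H(k) * X(k) and E(k) = Y(k) - D(k) per bin."""
    D = [h * x for h, x in zip(H, X)]   # estimated echo spectrum
    E = [y - d for y, d in zip(Y, D)]   # error spectrum
    return D, E


def update_echo_path(H, delta_H, A=0.9995):
    """Recursive update of Eq. 7.4: H_next(k) = A * H(k) + delta_H(k)."""
    return [A * h + dh for h, dh in zip(H, delta_H)]


def placeholder_delta(X, E, mu=0.5, eps=1e-12):
    """ASSUMPTION: NLMS-style update used only as a stand-in.

    The update actually used by the FDAF is computed according to [4].
    """
    return [mu * x.conjugate() * e / (abs(x) ** 2 + eps)
            for x, e in zip(X, E)]


def run_frames(H, frames):
    """Process a sequence of (X, Y) spectra; return final H and all errors."""
    errors = []
    for X, Y in frames:
        _, E = echo_cancel_frame(H, X, Y)
        H = update_echo_path(H, placeholder_delta(X, E))
        errors.append(E)
    return H, errors
```

With a fixed echo path and no near-end speech, the estimate in this sketch settles close to the true transfer function. Because A is slightly below 1, the steady-state estimate is marginally smaller in magnitude than the true value.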
To reduce the noise component and to suppress the residual echo that is still
present in the error signal E ' ( k ), the FDAF includes an additional frequency-
domain postfilter. Its application to the error signal yields an improved estimate
of the desired speech signal as
$$\tilde{E}_\ell(k) = E_\ell(k) \cdot W_\ell(k), \tag{7.5}$$