Speech Enhancement - Digital Speech: Coding for Low Bit Rate Communication Systems

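Figure 11.25 Block diagram of cascaded echo cancellation and noise suppression

[Block diagram not reproduced. Labels recoverable from the figure: far end, near end, points A and B, the Echo Canceller and Noise Suppressor blocks, and the signals y(i), x(i), r̂(i), s(i), z(i) and x(i) + r(i) + n(i).]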

usually distorts the echo signal in a nonlinear manner, which may make echo cancellation more difficult. Placing the noise suppressor after the echo canceller, so that it removes the residual echo error as well as the noise, may therefore be more appropriate, as shown in Figure 11.25.

The performance of this set-up has been tested both subjectively and objectively. Subjective testing was carried out through informal listening tests, while objective testing examined the convergence behaviour of the filter coefficients. Two different echoes were used for this purpose. The first was a simple echo resulting from a single delay and attenuation of the far-end speech signal, and the second was the sum of three differently delayed and attenuated versions of the far-end speech. Each echo was mixed with the near-end speech signal and contaminated with vehicle noise, giving SNRs of 0, 5, 10, 15 and 20 dB.
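The construction of these test inputs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the book's experiments: white Gaussian noise stands in for the far-end speech, near-end speech and vehicle noise; the SNR is assumed to be measured as near-end speech power over noise power; and the three-path delays and gains are hypothetical, since the text does not give them. The single-path values (40 samples, gain 0.48) are the ones quoted for the simple echo case. The helper names make_echo, scale_noise_to_snr and measure_snr_db are invented for this sketch.

```python
import math
import random


def make_echo(far_end, paths):
    """Return the echo of far_end as a sum of delayed, attenuated copies.

    paths is a list of (delay_in_samples, gain) pairs.
    """
    echo = [0.0] * len(far_end)
    for delay, gain in paths:
        for i in range(delay, len(far_end)):
            echo[i] += gain * far_end[i - delay]
    return echo


def power(signal):
    """Mean squared value of a signal."""
    return sum(v * v for v in signal) / len(signal)


def measure_snr_db(reference, noise):
    """SNR in dB, taken here as reference power over noise power."""
    return 10.0 * math.log10(power(reference) / power(noise))


def scale_noise_to_snr(reference, noise, target_db):
    """Scale noise so that measure_snr_db(reference, scaled) equals target_db."""
    target_noise_power = power(reference) / (10.0 ** (target_db / 10.0))
    factor = math.sqrt(target_noise_power / power(noise))
    return [factor * v for v in noise]


rng = random.Random(0)
n = 8000
# Assumption: white Gaussian noise replaces the real recordings.
far_end = [rng.gauss(0.0, 1.0) for _ in range(n)]    # stand-in for far-end speech
near_end = [rng.gauss(0.0, 1.0) for _ in range(n)]   # stand-in for near-end speech
raw_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for vehicle noise

# Simple echo: one delay of 40 samples, attenuation factor 0.48.
simple_echo = make_echo(far_end, [(40, 0.48)])
# Three-path echo: delays and gains below are hypothetical.
multi_echo = make_echo(far_end, [(40, 0.48), (90, 0.30), (150, 0.15)])

# Near-end input x(i) + r(i) + n(i) at each target SNR.
mixtures = {}
for target in (0, 5, 10, 15, 20):
    noise = scale_noise_to_snr(near_end, raw_noise, target)
    mixtures[target] = [x + r + v for x, r, v in zip(near_end, simple_echo, noise)]
```

Scaling the noise rather than the speech keeps the echo-to-speech ratio the same across all five SNR conditions, so only the noise level varies between runs.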

Results obtained using the simple echo case are shown in Figures 11.26-11.31. The echo was generated by delaying the far-end speech by 40 samples and attenuating it by a factor of 0.48, so for this echo path the ideal coefficient values would be h_40 = 0.48 and h_0 = 0. Part (a) of Figures 11.26-11.31 shows the input to the cascaded system and the corresponding output signals, and part (b) shows the convergence track of filter coefficients h_40 and h_0. The robustness of the system under noisy conditions and the convergence of the filter coefficients (h_40 and h_0), even in the presence of near-end speech, are quite evident in Figures 11.26-11.31. Note that, as also highlighted above, neither a near-end speech detector nor a switch for filter coefficient adaptation is needed. All that is needed is an initial training period during which the w_k(i) are set to one. In this set-up, the initial period is 1 second, during which the near-end speech is assumed to be absent. The weighting function is switched on after that and is responsible for convergence of the filter coefficients both during near-end speech and during silences in the near-end signal. Based on the average track of each filter coefficient (i.e. h_k(i)) and the selection of the γβ value in the w_k(i) definition, only the step changes that follow the average
