Digital Signal Processing Reference
Increase in channel capacity by statistical multiplexing: A channel can be
granted only during talk-spurts and released during pauses. Once granted
a channel, a user occupies it until the end of the talk-spurt and releases
it immediately after the last active speech frame. To be allocated a channel
again, the user makes a new request at the start of the next talk-spurt.
Statistical multiplexing therefore uses the channel resources more
efficiently, allowing a larger number of users to communicate at the same
time over limited channel resources. Note that in
statistical multiplexing there may be no free channel
slot when a user makes a request. In this case, the request may be
rejected after a time-out, causing information loss and
some degradation in quality.
Reduction in packet losses when transmitting voice over packet-based
networks: A packet-based system can be overloaded with more packets
than it can handle. During voice communication, this congestion can be
reduced by producing packets only during
active speech regions and dropping the packets for the inactive speech
regions.
Bit-rate reduction: In addition to the bit-rate reduction achieved by speech
compression techniques, using a VAD together with silence compression
(cutting out the inactive speech regions) reduces the bit-rate further,
regardless of the speech coder used.
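The capacity and bit-rate arguments above can be made concrete with a little arithmetic. The following is a minimal sketch, not a method taken from the text: it assumes that each user is in a talk-spurt independently with the same probability (the "activity factor"), and the function names and the numbers in the usage lines are hypothetical.

```python
from math import comb


def average_bit_rate(active_rate_bps, activity_factor, silence_rate_bps=0.0):
    """Mean bit rate when inactive frames are dropped or sent at a lower rate.

    activity_factor is the assumed fraction of time the user is in a talk-spurt.
    """
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity_factor must lie in [0, 1]")
    return (activity_factor * active_rate_bps
            + (1.0 - activity_factor) * silence_rate_bps)


def overload_probability(num_users, num_channels, activity_factor):
    """Probability that more users are talking than there are channels.

    Assumes users are independent, so the number of simultaneous talk-spurts
    follows a binomial distribution (an assumption made for illustration).
    """
    p = activity_factor
    return sum(
        comb(num_users, k) * p ** k * (1.0 - p) ** (num_users - k)
        for k in range(num_channels + 1, num_users + 1)
    )


if __name__ == "__main__":
    # Hypothetical figures: an 8 kbit/s coder, 40% speech activity,
    # 30 users sharing 16 channels.
    print(average_bit_rate(8000, 0.4))
    print(overload_probability(30, 16, 0.4))
```

A nonzero overload probability corresponds to the case noted above, where a request finds no free channel slot and may be rejected after a time-out.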
A VAD usually produces a binary decision for each speech segment
(typically 10-20 ms long), indicating either speech presence or absence. This
decision is fairly easy when the background is clean: checking the energy
level of the input signal, for example, can give high speech/nonspeech
detection performance. In real environments, however, the input signal may
be mixed with noise whose characteristics are unknown and change
with time. When the background noise level is high,
the speech may be obscured by the noise. Unvoiced sounds in particular,
which are important for speech intelligibility, may be misdetected in such
noisy environments. Figure 10.1 shows an example of a noisy speech segment
with vehicle noise at a signal-to-noise ratio (SNR) of 5 dB. As the
figure shows, some low-energy speech parts are fully submerged in noise, making
it very difficult to discriminate these talk-spurts even by visual inspection.
Misclassifying these talk-spurts as silence can cause clipped sounds, which
may significantly degrade speech quality. On the other hand,
classifying more inactive regions as speech forfeits the potential benefits of silence
compression. VAD design therefore involves a trade-off: maximizing the
detection rate for active speech while minimizing the false detection rate in
inactive speech regions.
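The energy check and the trade-off described above can be sketched as follows. This is an illustrative sketch only, not the detector used in any particular coder: the fixed threshold of -40 dB, the 8 kHz sample rate, the 20 ms frame length, and all function names are assumptions, and a fixed energy threshold is exactly the kind of detector that fails when speech is submerged in noise.

```python
import math


def frame_energies_db(samples, frame_len):
    """Mean power of each non-overlapping frame, in dB (full-scale = 1.0)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, sample_rate=8000, frame_ms=20, threshold_db=-40.0):
    """Binary speech/nonspeech decision per frame from a fixed energy threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [e > threshold_db for e in frame_energies_db(samples, frame_len)]


def vad_rates(decisions, reference):
    """Return (detection rate on speech frames, false detection rate on silence)."""
    on_speech = [d for d, r in zip(decisions, reference) if r]
    on_silence = [d for d, r in zip(decisions, reference) if not r]
    detection_rate = sum(on_speech) / len(on_speech) if on_speech else 0.0
    false_rate = sum(on_silence) / len(on_silence) if on_silence else 0.0
    return detection_rate, false_rate


if __name__ == "__main__":
    # One silent frame followed by one frame of a 200 Hz tone (stand-in for speech).
    silence = [0.0] * 160
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
    decisions = energy_vad(silence + tone)
    print(decisions, vad_rates(decisions, [False, True]))
```

Lowering `threshold_db` raises the detection rate for weak speech but also raises the false detection rate in noisy silence, which is the trade-off in question.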