Digital Signal Processing Reference
In practice, the threshold of the LR is set between 0.2 dB and 0.8 dB, and
both the a posteriori and the a priori SNRs are bounded between −15 dB
and 15 dB.
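The bounding step can be sketched as follows. This is a minimal illustration only: it assumes the limits are symmetric about 0 dB (−15 dB to 15 dB), that the SNRs are power ratios (so 10·log10 applies), and the function names are hypothetical.

```python
def db_to_ratio(db):
    """Convert a power-ratio value in dB to a linear ratio (10*log10 convention)."""
    return 10.0 ** (db / 10.0)

def clamp_snr(snr, lo_db=-15.0, hi_db=15.0):
    """Bound a linear SNR to the range [lo_db, hi_db].

    The default limits are assumptions taken from the text above.
    """
    return min(max(snr, db_to_ratio(lo_db)), db_to_ratio(hi_db))
```

The same clamp would be applied to both the a posteriori and the a priori SNR of every bin before the LR is evaluated.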
Assuming that the noise characteristics change slowly, the delay in the
estimation of the noise variance $\lambda_{N,k}^{(n-1)}$ in equation (10.4)
does not seriously affect the a priori SNR $\xi_k^{(n)}$. However, the
spectral amplitude of the speech signal may change abruptly, particularly in
onset and offset regions, in which the power of the spectral bins can
increase and decrease rapidly, respectively. At the offset region,
$\gamma_k$ can be low but $\xi_k$ can be much higher than $\gamma_k$ due to
the delay in $|X_k^{(n-1)}|^2$, as given in equation (10.4). Thus
$\Lambda_k$ becomes too low, according to the second property above, and,
consequently, may become lower than the threshold of the VAD. On the other
hand, the delay rarely causes a problem at the onset regions, according to
the first property above, as $\gamma_k^{(n)}$ in equation (10.3) is usually
large enough.
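The offset and onset behaviour described above can be checked numerically. Equations (10.3) and (10.4) are not reproduced in this section, so the sketch below assumes the commonly used Gaussian-model likelihood ratio, LR = exp(γξ/(1+ξ)) / (1+ξ), and a decision-directed a priori SNR estimate with weighting factor α. The function names, α = 0.98, and the input values are illustrative assumptions, not values from the text.

```python
import math

def log_lr(gamma, xi):
    """Log likelihood ratio of one bin, assuming
    LR = exp(gamma * xi / (1 + xi)) / (1 + xi)."""
    return gamma * xi / (1.0 + xi) - math.log1p(xi)

def dd_xi(prev_speech_power, prev_noise_var, gamma, alpha=0.98):
    """Decision-directed a priori SNR (assumed form of equation (10.4)):
    a delayed term from the previous frame plus a current-frame term."""
    return (alpha * prev_speech_power / prev_noise_var
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))

# Offset: the previous frame held strong speech, the current frame is at noise level.
xi_offset = dd_xi(prev_speech_power=100.0, prev_noise_var=1.0, gamma=1.0)
offset_log_lr = log_lr(1.0, xi_offset)     # high xi, low gamma: negative log LR

# Onset: the previous frame held no speech, the current frame is strong.
xi_onset = dd_xi(prev_speech_power=0.0, prev_noise_var=1.0, gamma=100.0)
onset_log_lr = log_lr(100.0, xi_onset)     # large gamma: positive log LR
```

Under these assumptions the delayed term keeps the a priori SNR high at the offset while the a posteriori SNR has already dropped, which drives the log LR well below zero. At the onset the large a posteriori SNR dominates even though the a priori SNR is still small.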
It is possible to consider an adaptive weighting factor in the estimation
of the a priori SNR in equation (10.4). In other words, a lower α can be
assigned for the active region, and a higher α for the inactive region. When
a low α is assigned at the offset region, it reduces the effect of the delay
in equation (10.4), producing a lower $\xi_k$, and therefore may prevent the
abrupt decay of $\Lambda_k$. However, it is not easy to design a generalized
adaptive rule
that will result in good performance over various kinds of speech and noise
signals. Instead, Cho [16, 17] has suggested a smoothed likelihood ratio (SLR)
$\Psi_k^{(n)}$, which is defined as

$$\Psi_k^{(n)} = \exp\left\{ \kappa \log \Psi_k^{(n-1)} + (1 - \kappa) \log \Lambda_k^{(n)} \right\} \qquad (10.6)$$

where κ is a smoothing factor and $\Lambda_k^{(n)}$ is defined in equation (10.3) for the $n$th
frame. The decision of the voice activity is finally carried out by computing,

$$\Psi^{(n)} = \exp\left\{ \frac{1}{K} \sum_{k=1}^{K} \log \Psi_k^{(n)} \right\} \qquad (10.7)$$

and comparing it against a threshold. The $n$th input frame is classified
as voice-active if $\Psi^{(n)}$ is greater than the threshold and
voice-inactive otherwise.
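The smoothing recursion of equation (10.6) and the frame-level decision of equation (10.7) can be sketched in the log domain, where the recursion is a first-order recursive average of the per-bin log LRs and the frame statistic is the exponential of their mean (a geometric mean of the per-bin SLRs). The function names, κ = 0.9, the initial state (log SLR = 0), and the threshold are hypothetical choices for illustration only.

```python
import math

def update_slr(prev_log_slr, log_lr, kappa=0.9):
    """Per-bin recursion of equation (10.6), kept in the log domain:
    log SLR(n) = kappa * log SLR(n-1) + (1 - kappa) * log LR(n)."""
    return [kappa * p + (1.0 - kappa) * l
            for p, l in zip(prev_log_slr, log_lr)]

def frame_slr(log_slr):
    """Equation (10.7): exponential of the mean per-bin log SLR."""
    return math.exp(sum(log_slr) / len(log_slr))

def is_voice_active(log_slr, threshold):
    """Classify the frame by comparing the frame SLR with a linear threshold."""
    return frame_slr(log_slr) > threshold

# Two bins (K = 2); the state starts at log SLR = 0, i.e. SLR = 1 (an assumption).
state = [0.0, 0.0]
state = update_slr(state, [4.0, 4.0])      # speech frame with a high LR
state = update_slr(state, [-3.6, -3.6])    # offset frame where the LR collapses
smoothed = frame_slr(state)                # remains close to 1
unsmoothed = math.exp(-3.6)                # frame LR without any smoothing
```

In this toy sequence the unsmoothed LR of the offset frame falls far below the smoothed value, so a threshold placed between the two would mark the frame inactive with the LR but active with the SLR.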
Examples of the LR and the SLR over a segment of speech are shown in
Figure 10.15. The SLR seems to overcome the problem outlined for the LR. As
shown in Figure 10.15b, the SLR is relatively higher than the LR at the offset
regions. The comparison over inactive frames is also shown in Figure 10.15c,
which indicates that the SLR fluctuates less than the LR.