The normalized autocorrelation at the fractional pitch values is given by
\[
r(T+\Delta) = \frac{(1-\Delta)\,c_T(0,T) + \Delta\,c_T(0,T+1)}{\sqrt{c_T(0,0)\left[(1-\Delta)^2\,c_T(T,T) + 2\Delta(1-\Delta)\,c_T(T,T+1) + \Delta^2\,c_T(T+1,T+1)\right]}} \tag{27}
\]
The fractional estimate that gives the higher autocorrelation is selected as the refined pitch
value $P_2$.
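As a rough illustration, Equation (27) can be evaluated directly once the correlation terms $c_T(m,n)$ are available. The sketch below assumes that $c_T(m,n)$ is a plain windowed sum of products $s_{k+m}s_{k+n}$; the window position (`start`, `length`) and the function names are hypothetical choices made for this example, and the fractional offset `delta` is taken as an input rather than derived.

```python
import math

def cross_corr(s, m, n, start, length):
    # Assumed form of c_T(m, n): sum of s[k+m] * s[k+n] over a fixed window.
    # The window placement is an assumption of this sketch.
    return sum(s[k + m] * s[k + n] for k in range(start, start + length))

def fractional_autocorr(s, T, delta, start, length):
    # Normalized autocorrelation r(T + delta) following Equation (27).
    def c(m, n):
        return cross_corr(s, m, n, start, length)

    num = (1 - delta) * c(0, T) + delta * c(0, T + 1)
    inner = ((1 - delta) ** 2 * c(T, T)
             + 2 * delta * (1 - delta) * c(T, T + 1)
             + delta ** 2 * c(T + 1, T + 1))
    den = math.sqrt(c(0, 0) * inner)
    # Guard against an all-zero window.
    return num / den if den > 0 else 0.0
```

For a signal that is exactly periodic with period 40 samples, `fractional_autocorr(s, 40, 0.0, 0, 160)` is very close to 1, while a lag of half a period gives a value close to -1.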
The final refinements of the pitch value are obtained using the linear prediction residuals.
The residual sequence is generated by filtering the input speech signal with the filter obtained
using LPC analysis. For the purposes of pitch refinement the residual signal is filtered using
a low-pass filter with a cutoff of 1 kHz. The normalized autocorrelation function is computed
for this filtered residual signal for lags from five samples less to five samples more than the
candidate $P_2$ value, and a candidate value of $P_3$ is obtained. If $r(P_3) \ge 0.6$, we check to make
sure that $P_3$ is not a multiple of the actual pitch. If $r(P_3) < 0.6$, we do another fractional pitch
refinement around $P_3$ using the input speech signal. If in the end $r(P_3) < 0.55$, we replace $P_3$
with a long-term average value of the pitch. The final pitch value is quantized on a logarithmic
scale using a 99-level uniform quantizer.
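The last step, uniform quantization on a logarithmic scale, can be sketched as follows. The text gives only the number of levels (99); the pitch range of 20 to 160 samples, the use of base-10 logarithms, and the function name are assumptions made for this example.

```python
import math

def quantize_log_pitch(pitch, p_min=20.0, p_max=160.0, levels=99):
    # Uniform quantizer applied to log10(pitch).
    # p_min and p_max are assumed range limits, not values from the text.
    lo, hi = math.log10(p_min), math.log10(p_max)
    step = (hi - lo) / (levels - 1)
    # Clamp to the assumed range before quantizing.
    x = min(max(math.log10(pitch), lo), hi)
    index = int(round((x - lo) / step))
    # Return the level index and the reconstructed pitch value.
    return index, 10 ** (lo + index * step)
```

Because the levels are equally spaced in the log domain, the reconstruction levels are closer together for short pitch periods than for long ones, so the relative quantization error is roughly constant over the range.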
The input is also subjected to a multiband voicing analysis using five filters with passbands
0-500, 500-1000, 1000-2000, 2000-3000, and 3000-4000 Hz. The goal of the analysis is to
obtain the voicing strengths $Vbp_i$ for each band used in the shaping filters. Noting that $P_2$
was obtained using the output of the lowest band filter, $r(P_2)$ is assigned as the lowest band
voicing strength $Vbp_1$. For the other bands, $Vbp_i$ is the larger of $r(P_2)$ for that band and
the correlation of the envelope of the band-pass signal. If the value of $Vbp_1$ is small, this
indicates a lack of low-frequency structure, which in turn indicates an unvoiced or transition
input. Thus, if $Vbp_1 < 0.5$, the pulse component of the excitation signal is selected to be
aperiodic, and this decision is communicated to the decoder by setting the aperiodic flag to 1.
When $Vbp_1 > 0.6$, the values of the other voicing strengths are quantized to 1 if their value is
greater than 0.6, and to 0 otherwise. In this way signal energy in the different bands is turned
on or off depending on the voicing strength. There are several exceptions to this quantization
rule. If $Vbp_2$, $Vbp_3$, and $Vbp_4$ all have magnitudes less than 0.6 and $Vbp_5$ has a value greater
than 0.6, they are all (including $Vbp_5$) quantized to 0. Also, if the residual signal $d_n$
contains a few large values, indicating sudden transitions in the input
signal, the voicing strengths are adjusted. In particular, the peakiness is defined as
\[
\text{peakiness} = \frac{\sqrt{\dfrac{1}{160}\sum_{n=1}^{160} d_n^2}}{\dfrac{1}{160}\sum_{n=1}^{160} |d_n|} \tag{28}
\]
If this value exceeds 1.34, $Vbp_1$ is forced to 1. If the peakiness value exceeds 1.6, $Vbp_1$,
$Vbp_2$, and $Vbp_3$ are all set to 1.
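The voicing rules described above can be collected into a short sketch. The function names and the list layout (index 0 holding the lowest band) are hypothetical; the thresholds are the ones quoted in the text. Since the text describes quantization of the upper bands only for the case where the lowest-band strength exceeds 0.6, the sketch refuses other inputs rather than guessing a rule.

```python
import math

def peakiness(d):
    # Ratio of RMS value to mean absolute value of the residual, Equation (28).
    n = len(d)
    rms = math.sqrt(sum(x * x for x in d) / n)
    mean_abs = sum(abs(x) for x in d) / n
    return rms / mean_abs if mean_abs > 0 else 0.0

def adjust_for_peakiness(vbp, peak):
    # vbp holds the five band voicing strengths, lowest band first.
    out = list(vbp)
    if peak > 1.34:
        out[0] = 1.0
    if peak > 1.6:
        out[0] = out[1] = out[2] = 1.0
    return out

def aperiodic_flag(vbp1):
    # A weak lowest-band strength selects an aperiodic pulse component.
    return 1 if vbp1 < 0.5 else 0

def quantize_upper_bands(vbp, threshold=0.6):
    # Only the case vbp[0] > threshold is described in the text.
    if vbp[0] <= threshold:
        raise ValueError("case not covered by this sketch")
    bits = [1 if v > threshold else 0 for v in vbp[1:]]
    # Exception: bands 2 to 4 all weak but band 5 strong, so all are set to 0.
    if all(abs(v) < threshold for v in vbp[1:4]) and vbp[4] > threshold:
        bits = [0, 0, 0, 0]
    return bits
```

A residual consisting of a single impulse in 160 samples has a peakiness of $\sqrt{160} \approx 12.6$, well above both thresholds, while a constant-magnitude residual has a peakiness of 1. The order in which the peakiness adjustment and the quantization are applied is not fixed by this sketch.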
In order to generate the pulse input, the algorithm measures the magnitude of the discrete
Fourier transform coefficients corresponding to the first 10 harmonics of the pitch. The prediction
residual is generated using the quantized predictor coefficients. The algorithm searches
in a window of width $\lfloor 512/\hat{P}_3 \rfloor$ samples around the initial estimates of the pitch harmonics
for the actual harmonics, where $\hat{P}_3$ is the quantized value of $P_3$. The magnitudes of the harmonics
are quantized using a vector quantizer with a codebook size of 256. The codebook is
 