Analysis/Synthesis and Analysis by Synthesis Schemes - Introduction to Data Compression

The normalized autocorrelation at the fractional pitch values is given by

$$
r(T+\Delta) = \frac{(1-\Delta)\,c_T(0,T) + \Delta\,c_T(0,T+1)}{\sqrt{c_T(0,0)\left[(1-\Delta)^2\,c_T(T,T) + 2\Delta(1-\Delta)\,c_T(T,T+1) + \Delta^2\,c_T(T+1,T+1)\right]}} \qquad (27)
$$

The fractional estimate that gives the higher autocorrelation is selected as the refined pitch value $P_2$.
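As a rough illustration, Equation (27) can be evaluated directly once the correlation terms are available. The sketch below assumes the six values of $c_T(\cdot,\cdot)$ and the fractional offset $\Delta$ have already been computed as defined earlier in the text; the function name `fractional_autocorr` and its argument names are hypothetical and chosen only for this example.

```python
import math


def fractional_autocorr(delta, c_0T, c_0T1, c_00, c_TT, c_TT1, c_T1T1):
    """Evaluate r(T + delta) from Eq. (27).

    All c_* arguments are precomputed correlation values c_T(m, n);
    how they are computed is outside the scope of this sketch.
    """
    # Numerator: linear interpolation between lags T and T + 1.
    num = (1.0 - delta) * c_0T + delta * c_0T1
    # Energy of the interpolated, delayed signal (bracketed term in Eq. 27).
    energy = ((1.0 - delta) ** 2 * c_TT
              + 2.0 * delta * (1.0 - delta) * c_TT1
              + delta ** 2 * c_T1T1)
    return num / math.sqrt(c_00 * energy)
```

With $\Delta = 0$ the expression reduces to the ordinary normalized autocorrelation at lag $T$, and with $\Delta = 1$ to the one at lag $T+1$, which is a convenient sanity check.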

The final refinements of the pitch value are obtained using the linear prediction residuals. The residual sequence is generated by filtering the input speech signal with the filter obtained using LPC analysis. For the purposes of pitch refinement the residual signal is filtered using a low-pass filter with a cutoff of 1 kHz. The normalized autocorrelation function is computed for this filtered residual signal for lags from five samples less to five samples more than the candidate $P_2$ value, and a candidate value of $P_3$ is obtained. If $r(P_3) \ge 0.6$, we check to make sure that $P_3$ is not a multiple of the actual pitch. If $r(P_3) < 0.6$, we do another fractional pitch refinement around $P_3$ using the input speech signal. If in the end $r(P_3) < 0.55$, we replace $P_3$ with a long-term average value of the pitch. The final pitch value is quantized on a logarithmic scale using a 99-level uniform quantizer.
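The last two steps of this paragraph can be sketched in a few lines. The fallback threshold of 0.55 and the 99 levels come from the text; the pitch range of 20 to 160 samples used for the quantizer is an assumption made for this example, since the range is not stated here. The function names `finalize_pitch` and `quantize_log_pitch` are hypothetical.

```python
import math


def finalize_pitch(r_p3, p3, long_term_avg):
    """Replace P3 by the long-term average pitch when r(P3) < 0.55."""
    return long_term_avg if r_p3 < 0.55 else p3


def quantize_log_pitch(pitch, p_min=20.0, p_max=160.0, levels=99):
    """Uniformly quantize log10(pitch) to one of `levels` values.

    p_min and p_max are ASSUMED range limits, not taken from the text.
    Returns (index, reconstructed_pitch).
    """
    lo, hi = math.log10(p_min), math.log10(p_max)
    step = (hi - lo) / (levels - 1)
    # Clamp to the assumed range before quantizing.
    x = min(max(math.log10(pitch), lo), hi)
    index = int(round((x - lo) / step))
    return index, 10.0 ** (lo + index * step)
```

Because the quantizer is uniform in the logarithm of the pitch, the relative reconstruction error is roughly the same for short and long pitch periods, which is presumably the motivation for the logarithmic scale.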

The input is also subjected to a multiband voicing analysis using five filters with passbands 0-500, 500-1000, 1000-2000, 2000-3000, and 3000-4000 Hz. The goal of the analysis is to obtain the voicing strengths $Vbp_i$ for each band used in the shaping filters. Noting that $P_2$ was obtained using the output of the lowest band filter, $r(P_2)$ is assigned as the lowest band voicing strength $Vbp_1$. For the other bands, $Vbp_i$ is the larger of $r(P_2)$ for that band and the correlation of the envelope of the band-pass signal. If the value of $Vbp_1$ is small, this indicates a lack of low-frequency structure, which in turn indicates an unvoiced or transition input. Thus, if $Vbp_1 < 0.5$, the pulse component of the excitation signal is selected to be aperiodic, and this decision is communicated to the decoder by setting the aperiodic flag to 1. When $Vbp_1 > 0.6$, the values of the other voicing strengths are quantized to 1 if their value is greater than 0.6, and to 0 otherwise. In this way signal energy in the different bands is turned on or off depending on the voicing strength. There are several exceptions to this quantization rule. If $Vbp_2$, $Vbp_3$, and $Vbp_4$ all have magnitudes less than 0.6 and $Vbp_5$ has a value greater than 0.6, they are all (including $Vbp_5$) quantized to 0. Also, if the residual signal $d_n$ contains a few large values, indicating sudden transitions in the input signal, the voicing strengths are adjusted. In particular, the peakiness is defined as

$$
\text{peakiness} = \frac{\sqrt{\dfrac{1}{160}\sum_{n=1}^{160} d_n^2}}{\dfrac{1}{160}\sum_{n=1}^{160} |d_n|} \qquad (28)
$$

If this value exceeds 1.34, $Vbp_1$ is forced to 1. If the peakiness value exceeds 1.6, $Vbp_1$, $Vbp_2$, and $Vbp_3$ are all set to 1.
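The voicing rules above can be collected into a small sketch. The thresholds (0.6, 1.34, 1.6) and the frame length of 160 residual samples follow the text; the function names are hypothetical. Two points are assumptions of this example rather than statements from the text: the upper bands are set to 0 when $Vbp_1 \le 0.6$, and the peakiness adjustment and the quantization are kept as separate functions because the order in which they are applied is not specified here.

```python
import math


def peakiness(d):
    """Ratio of RMS value to mean absolute value of the residual (Eq. 28).

    Raises ZeroDivisionError for an all-zero frame; a real coder would
    need to handle that case explicitly.
    """
    n = len(d)
    rms = math.sqrt(sum(x * x for x in d) / n)
    mean_abs = sum(abs(x) for x in d) / n
    return rms / mean_abs


def adjust_for_peakiness(vbp, p):
    """Force low-band voicing strengths to 1 for a peaky residual."""
    out = list(vbp)
    if p > 1.34:
        out[0] = 1.0
    if p > 1.6:
        out[0] = out[1] = out[2] = 1.0
    return out


def quantize_upper_bands(vbp):
    """Quantize Vbp_2..Vbp_5 to 0 or 1; vbp holds Vbp_1..Vbp_5."""
    if vbp[0] <= 0.6:
        # ASSUMPTION: the text only covers Vbp_1 > 0.6.
        return [0, 0, 0, 0]
    upper = vbp[1:]
    # Exception: an isolated strong top band is switched off as well.
    if all(abs(v) < 0.6 for v in upper[:3]) and upper[3] > 0.6:
        return [0, 0, 0, 0]
    return [1 if v > 0.6 else 0 for v in upper]
```

A constant-magnitude residual gives a peakiness of 1, while a frame containing a single nonzero sample gives $\sqrt{160} \approx 12.6$, so the thresholds of 1.34 and 1.6 are crossed by residuals whose energy is concentrated in relatively few samples.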

In order to generate the pulse input, the algorithm measures the magnitude of the discrete Fourier transform coefficients corresponding to the first 10 harmonics of the pitch. The prediction residual is generated using the quantized predictor coefficients. The algorithm searches in a window of width $\lfloor 512/\hat{P}_3 \rfloor$ samples around the initial estimates of the pitch harmonics for the actual harmonics, where $\hat{P}_3$ is the quantized value of $P_3$. The magnitudes of the harmonics are quantized using a vector quantizer with a codebook size of 256. The codebook is
