usually carried out on the perceptually-weighted original speech to
obtain a good idea of the likely pitch period before a closed-loop search
is applied around this value. The pitch contribution is then subtracted
from the reference signal to update it for the next stage (the codebook
search). Since the pitch can change up to 1%/ms, the pitch delay is
updated more frequently than the LPC for accurate voice periodicity
generation in the synthesized speech.
(c) Once the parameters of the two synthesis filters are found, the excitation
is determined. Each codebook vector is passed through the memoryless
LPC and perceptual weighting filters, and the codebook vector that
gives the minimum squared difference between the output it produces
and the reference signal is selected; its corresponding scaling factor
is then computed. Note that if the delay D in the pitch filter is greater
than the subframe size, it will not affect the synthesized codebook
vector. In addition, the pitch filter is usually implemented as an
adaptive codebook operating in parallel with the stochastic codebook
and, hence, its response is eliminated from the stochastic codebook
search loop.
(d) Finally, the initial conditions (i.e. the memory) of the filters are restored,
and the synthetic speech is generated by filtering the scaled optimum
codebook sequence through the filters so as to update the filters for
processing the next subframe.
3. In the synthesizer (decoder), the initial conditions (i.e. the memory) of
the filters are restored and the synthetic speech is generated by filtering
the scaled optimum codebook sequence through the filters without any
perceptual weighting.
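The codebook search of step (c) can be sketched as follows. This is a minimal illustration, not the implementation of any particular standard. It assumes the common weighting filter W(z) = A(z)/A(z/gamma), so that the cascade of the LPC synthesis filter and the weighting filter reduces to 1/A(z/gamma); the value of gamma, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and all function names are assumptions made for the example. The reference signal passed in is assumed to have had the filter memory and pitch contributions already removed.

```python
def weighted_coefficients(lpc, gamma):
    """Bandwidth-expand a_1..a_p to a_k * gamma**k, giving A(z/gamma).

    Assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p (hypothetical convention).
    """
    return [a * gamma ** k for k, a in enumerate(lpc, start=1)]


def zero_state_synthesis(excitation, coeffs):
    """Filter `excitation` through 1/A(z) starting from zero memory."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def search_codebook(codebook, target, lpc, gamma=0.8):
    """Return (index, gain, error) of the vector minimising squared error.

    For each candidate the least-squares gain is <target, y> / <y, y>,
    where y is the zero-state filtered codebook vector.
    """
    coeffs = weighted_coefficients(lpc, gamma)
    target_energy = sum(t * t for t in target)
    best = None
    for index, vector in enumerate(codebook):
        y = zero_state_synthesis(vector, coeffs)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero response cannot match the target
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


if __name__ == "__main__":
    # Toy data, chosen only for illustration.
    lpc = [-0.9]
    codebook = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [1.0, -1.0, 1.0, -1.0]]
    target = [0.3, 0.1, 0.2, -0.1]
    print(search_codebook(codebook, target, lpc))
```

Because the gain is solved in closed form for each candidate, the search only needs the correlation between the target and each filtered vector and the energy of that filtered vector, which is why practical coders often precompute or structure these two quantities.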
From the above description it is clear that the computation can be broken
down into three blocks: LPC analysis to compute the LPC parameters; pitch
analysis to compute the long-term predictor parameters; and a codebook
search to determine the shape and gain of the excitation vector.
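For the pitch analysis block, the open-loop stage described earlier can be sketched as a normalised autocorrelation search, followed by a narrow range of lags handed to the closed-loop search. This is a rough sketch under stated assumptions: the lag limits (20 to 147 samples, typical for 8 kHz speech), the search radius, and the function names are illustrative choices rather than values taken from the text, and the input is assumed to be the perceptually weighted speech.

```python
import math


def open_loop_pitch(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest
    autocorrelation normalised by the energy of the delayed signal."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        span = range(lag, len(signal))
        corr = sum(signal[n] * signal[n - lag] for n in span)
        energy = sum(signal[n - lag] ** 2 for n in span)
        if energy <= 0.0:
            continue  # nothing to correlate against at this lag
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def closed_loop_candidates(open_loop_lag, radius, min_lag, max_lag):
    """Lags around the open-loop estimate for the closed-loop search."""
    lo = max(min_lag, open_loop_lag - radius)
    hi = min(max_lag, open_loop_lag + radius)
    return list(range(lo, hi + 1))


if __name__ == "__main__":
    # Synthetic pulse train with a 50-sample period (illustrative only).
    frame = [1.0 if n % 50 == 0 else 0.0 for n in range(240)]
    lag = open_loop_pitch(frame, 20, 147)
    print(lag, closed_loop_candidates(lag, 3, 20, 147))
```

Restricting the closed-loop search to a few lags around the open-loop estimate keeps the analysis-by-synthesis cost low, since each closed-loop candidate requires filtering through the synthesis filters.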
7.3.1 LPC Prediction
The role of LPC prediction is to represent the general shape of the speech
spectrum. Therefore, in the CELP synthesizer, the (ideally flat) excitation is
shaped by the spectral envelope of the LPC filter. The LPC parameters can
be computed by a number of methods as discussed in Chapter 4. However,
most CELP coders use a 10th-order LPC filter based on autocorrelation
estimation. The speech frame, which is usually 20 ms long, is passed through
a Hamming window, which is usually placed half a frame ahead so as to enable
accurate parameter interpolation for each subframe. However, many delay-
sensitive applications and standards use an asymmetric window to give more
weighting to the latest samples contained in the analysis frame. The delay