entry differs from the preceding entry in only two places. This structure helps reduce the
search complexity.
The adaptive codebook consists of the excitation vectors from the previous frame. Each
time a new excitation vector is obtained, it is added to the codebook. In this manner, the
codebook adapts to local statistics.
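As a rough illustration of how such a codebook adapts, the following minimal Python sketch keeps a bounded history of recent excitation vectors and searches it for the closest match. The class name, the `capacity` parameter, and the squared-error search are assumptions made for illustration; they are not taken from the FS 1016 specification.

```python
from collections import deque


class AdaptiveCodebook:
    """Toy adaptive codebook holding the most recent excitation vectors."""

    def __init__(self, capacity):
        # `capacity` is a hypothetical limit; the oldest entry is dropped
        # automatically once the deque is full.
        self._entries = deque(maxlen=capacity)

    def __len__(self):
        return len(self._entries)

    def add(self, excitation):
        # Each newly obtained excitation vector becomes a codebook entry.
        self._entries.append(tuple(excitation))

    def best_match(self, target):
        """Return (index, entry) with the smallest squared error to target."""
        if not self._entries:
            raise ValueError("codebook is empty")

        def squared_error(entry):
            return sum((a - b) ** 2 for a, b in zip(entry, target))

        index = min(range(len(self._entries)),
                    key=lambda i: squared_error(self._entries[i]))
        return index, self._entries[index]


codebook = AdaptiveCodebook(capacity=2)
for vector in ([1.0, 0.0], [0.0, 1.0], [2.0, 2.0]):
    codebook.add(vector)
match_index, match_entry = codebook.best_match([2.0, 1.9])
```

Because the codebook only ever contains recent excitations, its contents track the signal that was just coded, which is the sense in which it follows local statistics.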
The FS 1016 coder has been shown to provide excellent reproductions in both quiet and
noisy environments at rates of 4.8 kbps and above [239]. Because of the richness of the
excitation signals, the reproduction does not suffer from the problem of sounding artificial.
The lack of a voicing decision makes it more robust to background noise. The quality of the
reproduction of this coder at 4.8 kbps has been shown to be equivalent to a delta modulator
operating at 32 kbps [239]. The price for this quality is much higher complexity and a much
longer coding delay. We will address this last point in the next section.
CCITT G.728 Speech Standard
By their nature, the schemes described in this chapter have some coding delay built into them.
By “coding delay,” we mean the time between when a speech sample is encoded and when
it is decoded if the encoder and decoder are connected back-to-back (i.e., there are no
transmission delays). In the schemes we have studied, a segment of speech is first stored in a
buffer. We do not start extracting the various parameters until a complete segment of speech
is available to us. Once the segment is completely available, it is processed. If the processing
is real time, this means another segment's worth of delay. Finally, once the parameters have
been obtained, coded, and transmitted, the receiver has to wait until at least a significant part
of the information is available before it can start decoding the first sample. Therefore, if a
segment contains 20 milliseconds' worth of data, the coding delay would be approximately
40 to 60 milliseconds. This kind of delay may be acceptable for some
applications; however, there are other applications where such long delays are not acceptable.
For example, in some situations there are several intermediate tandem connections between the
initial transmitter and the final receiver. In such situations, the total delay would be a multiple
of the coding delay of a single connection. The size of the delay would depend on the number
of tandem connections and could rapidly become quite large.
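The delay estimate above can be reproduced with a small calculation. This is a minimal sketch under stated assumptions: buffering and real-time processing each cost one segment, and the receiver waits for some fraction of a segment (between 0 and 1) before decoding. The function names and the `receiver_wait_fraction` parameter are hypothetical.

```python
def coding_delay_ms(segment_ms, receiver_wait_fraction):
    """Approximate back-to-back coding delay for one encoder/decoder pair."""
    buffering = segment_ms       # wait until the whole segment is available
    processing = segment_ms      # real-time processing takes another segment
    receiver_wait = receiver_wait_fraction * segment_ms
    return buffering + processing + receiver_wait


def tandem_delay_ms(single_link_delay_ms, num_links):
    """Total delay when several coded links are connected in tandem."""
    return single_link_delay_ms * num_links


low = coding_delay_ms(20, 0.0)    # receiver starts decoding immediately
high = coding_delay_ms(20, 1.0)   # receiver waits for a full segment
worst_case_three_links = tandem_delay_ms(high, 3)
```

With a 20-millisecond segment, the two extremes give 40 and 60 milliseconds, and three tandem links at the upper figure already add up to 180 milliseconds.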
For such applications, CCITT approved recommendation G.728, a CELP coder with a
coding delay of 2 milliseconds operating at 16 kbps. As the input speech is sampled at 8000
samples per second, this rate corresponds to an average rate of 2 bits per sample.
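The bit budget follows directly from these two figures. The sketch below, with hypothetical function names, works out the bits per sample and, for the five-sample segment that the recommendation uses, the bits available per segment and the segment duration.

```python
def bits_per_sample(bit_rate_bps, sample_rate_hz):
    """Average number of bits available for each input sample."""
    return bit_rate_bps / sample_rate_hz


def bits_per_segment(bit_rate_bps, sample_rate_hz, samples_per_segment):
    """Bits available to describe one segment of speech."""
    return bit_rate_bps * samples_per_segment / sample_rate_hz


def segment_duration_ms(sample_rate_hz, samples_per_segment):
    """Length of one segment in milliseconds."""
    return 1000 * samples_per_segment / sample_rate_hz


rate = bits_per_sample(16000, 8000)          # 2 bits per sample
budget = bits_per_segment(16000, 8000, 5)    # 10 bits per five-sample segment
duration = segment_duration_ms(8000, 5)      # 0.625 milliseconds per segment
```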
In order to lower the coding delay, the size of each segment has to be reduced significantly
because the coding delay will be some multiple of the size of the segment. The G.728
recommendation uses a segment size of five samples. With five samples and a rate of 2 bits per
sample, we only have 10 bits available to us. Using only 10 bits, it would be impossible to
encode the parameters of the vocal tract filter as well as the excitation vector. Therefore, the
algorithm obtains the vocal tract filter parameters in a backward adaptive manner; that is, the
vocal tract filter coefficients used to synthesize the current segment are obtained by analyzing
previously decoded segments. The CCITT requirements for G.728 specified
that the algorithm operate under noisy channel conditions. It would be extremely difficult
to extract the pitch period from speech corrupted by channel errors. Therefore, the G.728
algorithm does away with the pitch filter. Instead, the algorithm uses a 50th-order vocal tract