Analysis/Synthesis and Analysis by Synthesis Schemes - Introduction to Data Compression

is close to the speech sequence $\{y_n\}$. Because this is a harmonic approximation, the approximate sequence $\{\hat{y}_n\}$ will be most different from the speech sequence $\{y_n\}$ when the segment of speech being encoded is unvoiced. Therefore, this difference can be used to decide whether the frame or some subset of it is unvoiced.

The two most popular sinusoidal coding techniques today are the sinusoidal transform coder (STC) [241] and the multiband excitation coder (MBE) [242]. While the STC and MBE are similar in many respects, they differ in how they handle unvoiced speech. In the MBE coder, the frequency range is divided into bands, each consisting of several harmonics of the fundamental frequency $\omega_0$. Each band is classified as voiced or unvoiced. The voiced bands are synthesized using a sum of sinusoids, while the unvoiced bands are obtained using a random number generator. The voiced and unvoiced bands are synthesized separately and then added together.
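The MBE synthesis described above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the MBE coder itself. The function name `mbe_synthesize`, its parameters, and the fixed number of harmonics per band are all hypothetical. The way noise is produced is also an assumption: each harmonic slot in an unvoiced band is replaced by a few sinusoids at random frequencies (within half a harmonic spacing of the harmonic) with random phases. This is a crude stand-in for noise from a random number generator; the actual MBE coder generates and shapes its noise differently.

```python
import math
import random


def mbe_synthesize(num_samples, w0, amps, phases, band_voiced,
                   harmonics_per_band, noise_components=8, seed=0):
    """Hypothetical MBE-style synthesis of one frame.

    w0: fundamental frequency in radians per sample
    amps[l], phases[l]: amplitude and phase of harmonic l + 1
    band_voiced[b]: True if band b is voiced; band b holds harmonics
                    b*harmonics_per_band + 1 ... (b+1)*harmonics_per_band
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    voiced = [0.0] * num_samples
    unvoiced = [0.0] * num_samples
    for b, is_voiced in enumerate(band_voiced):
        start = b * harmonics_per_band
        stop = min(start + harmonics_per_band, len(amps))
        for l in range(start, stop):
            centre = (l + 1) * w0  # frequency of harmonic l + 1
            if is_voiced:
                # Voiced band: one sinusoid at the harmonic frequency.
                for n in range(num_samples):
                    voiced[n] += amps[l] * math.cos(n * centre + phases[l])
            else:
                # Unvoiced band (assumption): several random-frequency,
                # random-phase sinusoids whose average total power equals
                # that of a sinusoid of amplitude amps[l].
                scale = amps[l] / math.sqrt(noise_components)
                for _ in range(noise_components):
                    freq = centre + rng.uniform(-w0 / 2, w0 / 2)
                    phase = rng.uniform(0.0, 2.0 * math.pi)
                    for n in range(num_samples):
                        unvoiced[n] += scale * math.cos(n * freq + phase)
    # The voiced and unvoiced parts are synthesized separately, then added.
    return [v + u for v, u in zip(voiced, unvoiced)]
```

For example, `mbe_synthesize(160, 2 * math.pi * 100 / 8000, [1.0] * 6, [0.0] * 6, [True, True, False], 2)` builds a 160-sample frame from three bands of two harmonics each. The two lower bands are harmonic and the top band is noise-like.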

In the STC, the proportion of the frame that contains a voiced signal is measured using a “voicing probability” $P_v$. The voicing probability is a function of how well the harmonic model matches the speech segment. Where the harmonic model is close to the speech signal, the voicing probability is taken to be unity. The sine wave frequencies are then generated by

$$
w_k =
\begin{cases}
k\,w_0, & \text{for } k\,w_0 \le w_c P_v \\
k^{*} w_0 + (k - k^{*})\,w_u, & \text{for } k\,w_0 > w_c P_v
\end{cases}
\tag{24}
$$

where $w_c$ corresponds to the cutoff frequency (4 kHz), $w_u$ is the unvoiced pitch corresponding to 100 Hz, and $k^{*}$ is the largest value of $k$ for which $k^{*} w_0 \le w_c P_v$. The speech is then synthesized as

$$
\hat{y}_n = \sum_{k=1}^{M} A(w_k)\cos(n\,w_k + \phi_k)
\tag{25}
$$

where $A(w_k)$ and $\phi_k$ are the amplitude and phase of the sine wave at frequency $w_k$, and $M$ is the number of sine waves.
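Equations (24) and (25) can be sketched in Python as below. This is a minimal illustration, and the function names are hypothetical. It rests on these assumptions:

* Frequencies are in hertz.
* A sampling rate `fs` of 8 kHz is used to convert them to radians per sample.
* Sine waves are generated until $w_k$ passes the cutoff $w_c$, which is an assumption about where the sum stops.
* The caller supplies the amplitude function `amp` and the list of phases.

In the actual STC the amplitudes and phases come from the coded spectral information, which is not modeled here.

```python
import math


def stc_frequencies(w0, pv, wc=4000.0, wu=100.0):
    """Sine-wave frequencies w_k of Eq. (24), in Hz, up to the cutoff wc.

    w0: pitch frequency, pv: voicing probability P_v,
    wc: cutoff frequency, wu: unvoiced pitch. Requires w0 > 0 and wu > 0.
    """
    limit = wc * pv
    # k* is the largest k for which k * w0 <= wc * P_v.
    k_star = 0
    while (k_star + 1) * w0 <= limit:
        k_star += 1
    freqs = []
    k = 1
    while True:
        if k * w0 <= limit:
            wk = k * w0                           # harmonic region
        else:
            wk = k_star * w0 + (k - k_star) * wu  # spacing wu above wc * P_v
        if wk > wc:
            break
        freqs.append(wk)
        k += 1
    return freqs


def stc_synthesize(freqs, amp, phases, num_samples, fs=8000.0):
    """Eq. (25): sum of amp(w_k) * cos(n * w_k + phi_k), with each w_k
    converted from Hz to radians per sample. `amp` is a caller-supplied A(w)."""
    out = []
    for n in range(num_samples):
        total = 0.0
        for f, phi in zip(freqs, phases):
            total += amp(f) * math.cos(n * 2.0 * math.pi * f / fs + phi)
        out.append(total)
    return out
```

For instance, with $w_0 = 200$ Hz and $P_v = 0.5$, the harmonics run from 200 Hz to 2000 Hz. Above that point, the frequencies continue in 100 Hz steps (2100 Hz, 2200 Hz, and so on) up to 4000 Hz.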

Both the STC and the MBE coders have been shown to perform well at low rates. A version of the MBE coder known as the improved MBE (IMBE) coder was approved by the Association of Public-Safety Communications Officials (APCO) as the standard for law enforcement.

18.3.5 Mixed Excitation Linear Prediction (MELP)

The mixed excitation linear prediction (MELP) coder was selected to be the federal standard for speech coding at 2.4 kbps by the Defense Department Voice Processing Consortium (DDVPC). The MELP algorithm uses the same LPC filter to model the vocal tract. However, it uses a much more complex approach to the generation of the excitation signal.

A block diagram of the decoder for the MELP system is shown in Figure 18.9. As is evident from the figure, the excitation signal for the synthesis filter is no longer simply noise or a periodic pulse but a multiband mixed excitation. The mixed excitation contains both a filtered signal from a noise generator and a contribution that depends directly on the input signal.
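The idea of a multiband mixed excitation can be illustrated with the Python sketch below. It is not the MELP algorithm. The following are all assumptions for illustration:

* The band edges: 0 to 500, 500 to 1000, 1000 to 2000, 2000 to 3000 and 3000 to 4000 Hz at an 8 kHz sampling rate, passed as fractions of the sampling rate.
* The Hamming-windowed FIR band-pass filters.
* The per-band voicing strengths.
* All function names.

In each band, a periodic pulse train and a noise sequence are mixed in proportion to that band's voicing strength and filtered to the band, and the bands are then summed. The periodic part here is a plain unit pulse train at a given pitch period. In MELP, the contribution that depends on the input signal is more elaborate, and that is not modeled.

```python
import math
import random


def lowpass_taps(cutoff, num_taps):
    """Hamming-windowed sinc low-pass FIR; cutoff is a fraction of the
    sampling rate (0 to 0.5). num_taps should be odd and at least 3."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def bandpass_taps(lo, hi, num_taps):
    # The difference of two low-pass filters passes roughly lo..hi.
    return [a - b for a, b in zip(lowpass_taps(hi, num_taps),
                                  lowpass_taps(lo, num_taps))]


def fir(taps, x):
    # Direct-form convolution; the output has the same length as x.
    return [sum(taps[j] * x[n - j] for j in range(len(taps)) if n >= j)
            for n in range(len(x))]


def mixed_excitation(num_samples, pitch, strengths, edges, num_taps=31, seed=0):
    """Hypothetical multiband mixed excitation.

    pitch: pitch period in samples (pulse spacing)
    strengths[b]: voicing strength in [0, 1] for band edges[b]..edges[b+1]
    edges: band edges as fractions of the sampling rate
    """
    rng = random.Random(seed)
    pulses = [1.0 if n % pitch == 0 else 0.0 for n in range(num_samples)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    out = [0.0] * num_samples
    for b, v in enumerate(strengths):
        # Mix pulse and noise by this band's voicing strength, then keep only
        # this band (filtering is linear, so one filter pass per band suffices).
        mix = [v * p + (1.0 - v) * z for p, z in zip(pulses, noise)]
        band = fir(bandpass_taps(edges[b], edges[b + 1], num_taps), mix)
        for n in range(num_samples):
            out[n] += band[n]
    return out
```

Because the band-pass filters over bands that tile 0 to 0.5 sum to a delayed impulse, setting every strength to 1 gives essentially the pulse train delayed by `(num_taps - 1) / 2` samples. Setting every strength to 0 gives a delayed copy of the noise. Intermediate strengths give a mixture that differs from band to band.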

The first step in constructing the excitation signal is pitch extraction. The MELP algorithm obtains the pitch period using a multistep approach. In the first step, an integer pitch value $P_1$ is obtained in the following manner:
