is close to the speech sequence $\{y_n\}$. Because this is a harmonic approximation, the approximate
sequence $\{\hat{y}_n\}$ will be most different from the speech sequence $\{y_n\}$ when the segment of
speech being encoded is unvoiced. Therefore, this difference can be used to decide whether
the frame or some subset of it is unvoiced.
The two most popular sinusoidal coding techniques today are represented by the sinusoidal
transform coder (STC) [241] and the multiband excitation coder (MBE) [242]. While the STC
and MBE are similar in many respects, they differ in how they handle unvoiced speech. In the
MBE coder, the frequency range is divided into bands, each consisting of several harmonics
of the fundamental frequency $\omega_0$. Each band is checked to see if it is unvoiced or voiced. The
voiced bands are synthesized using a sum of sinusoids, while the unvoiced bands are obtained
using a random number generator. The voiced and unvoiced bands are synthesized separately
and then added together.
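The band-wise synthesis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MBE coder's actual procedure: the function name `synthesize_mbe_frame`, the fixed band width `harmonics_per_band`, and the use of a handful of random-frequency, random-phase sinusoids as band-limited "noise" are all hypothetical choices made for this example. The voicing decisions are taken as given inputs.

```python
import math
import random


def synthesize_mbe_frame(w0, amplitudes, band_voiced, harmonics_per_band,
                         n_samples, noise_components=8, seed=0):
    """Toy MBE-style synthesis of one frame (illustrative assumptions only).

    w0                 fundamental frequency in radians per sample
    amplitudes         amplitudes[k - 1] is the amplitude of harmonic k
    band_voiced        one boolean per band (True means voiced)
    harmonics_per_band hypothetical fixed number of harmonics in each band
    """
    rng = random.Random(seed)
    voiced = [0.0] * n_samples
    unvoiced = [0.0] * n_samples
    for band, is_voiced in enumerate(band_voiced):
        k_lo = band * harmonics_per_band + 1
        k_hi = min(k_lo + harmonics_per_band - 1, len(amplitudes))
        for k in range(k_lo, k_hi + 1):
            amp = amplitudes[k - 1]
            if is_voiced:
                # Voiced band: a sinusoid at the harmonic frequency k * w0.
                for n in range(n_samples):
                    voiced[n] += amp * math.cos(n * k * w0)
            else:
                # Unvoiced band: random number generator output, here shaped
                # into noise confined to the neighborhood of harmonic k.
                scale = amp / math.sqrt(noise_components)
                for _ in range(noise_components):
                    w = rng.uniform((k - 0.5) * w0, (k + 0.5) * w0)
                    phase = rng.uniform(0.0, 2.0 * math.pi)
                    for n in range(n_samples):
                        unvoiced[n] += scale * math.cos(n * w + phase)
    # The two parts are synthesized separately and then added together.
    total = [v + u for v, u in zip(voiced, unvoiced)]
    return total, voiced, unvoiced
```

Returning the voiced and unvoiced parts alongside their sum makes it easy to inspect how much of the frame each band type contributes.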
In the STC, the proportion of the frame that contains a voiced signal is measured using
a “voicing probability” $P_v$. The voicing probability is a function of how well the harmonic
model matches the speech segment. Where the harmonic model is close to the speech signal,
the voicing probability is taken to be unity. The sine wave frequencies are then generated by
$$
\omega_k = \begin{cases}
k\omega_0 & \text{for } k\omega_0 \le \omega_c P_v \\
k^*\omega_0 + (k - k^*)\,\omega_u & \text{for } k\omega_0 > \omega_c P_v
\end{cases} \tag{24}
$$
where $\omega_c$ corresponds to the cutoff frequency (4 kHz), $\omega_u$ is the unvoiced pitch corresponding
to 100 Hz, and $k^*$ is the largest value of $k$ for which $k\omega_0 \le \omega_c P_v$. The speech is then
synthesized as
$$
\hat{y}_n = \sum_{k=1}^{K} \hat{A}(\omega_k)\cos(n\omega_k + \phi_k) \tag{25}
$$
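As a rough illustration of Equations (24) and (25), the following Python sketch generates the sine wave frequencies from the fundamental frequency and the voicing probability, then sums the sinusoids. It assumes an 8 kHz sampling rate, so that the 4 kHz cutoff maps to π radians per sample and the 100 Hz unvoiced pitch to 2π·100/8000; the sampling rate, the function names `stc_frequencies` and `stc_synthesize`, and the way amplitudes and phases are passed in are assumptions made for this example rather than details taken from the STC itself.

```python
import math


def stc_frequencies(w0, p_v, num_sines,
                    w_c=math.pi, w_u=2.0 * math.pi * 100.0 / 8000.0):
    """Sine wave frequencies in radians per sample, following Eq. (24).

    Defaults assume an 8 kHz sampling rate (an assumption of this sketch).
    """
    if w0 <= 0.0:
        raise ValueError("w0 must be positive")
    limit = w_c * p_v
    # k_star is the largest k for which k * w0 <= w_c * P_v.
    k_star = 0
    while (k_star + 1) * w0 <= limit:
        k_star += 1
    freqs = []
    for k in range(1, num_sines + 1):
        if k <= k_star:
            # Below the voicing cutoff: harmonics of the fundamental.
            freqs.append(k * w0)
        else:
            # Above the cutoff: spaced by the unvoiced pitch instead.
            freqs.append(k_star * w0 + (k - k_star) * w_u)
    return freqs


def stc_synthesize(freqs, amplitudes, phases, n_samples):
    """Sum of sinusoids, following Eq. (25)."""
    return [
        sum(a * math.cos(n * w + phi)
            for w, a, phi in zip(freqs, amplitudes, phases))
        for n in range(n_samples)
    ]
```

With a voicing probability of one, every frequency below the cutoff is a harmonic of the fundamental; with a voicing probability of zero, the frequencies are spaced uniformly at the unvoiced pitch.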
Both the STC and the MBE coders have been shown to perform well at low rates. A version of
the MBE coder known as the improved MBE (IMBE) coder was approved by the Association
of Public-Safety Communications Officials (APCO) as the standard for law enforcement.
18.3.5 Mixed Excitation Linear Prediction (MELP)
The mixed excitation linear prediction (MELP) coder was selected to be the federal standard for
speech coding at 2.4 kbps by the Defense Department Voice Processing Consortium (DDVPC).
As in earlier linear predictive coders, the MELP algorithm uses an LPC filter to model the
vocal tract. However, it uses a much more complex approach to the generation of the excitation signal.
A block diagram of the decoder for the MELP system is shown in Figure 18.9. As is evident
from the figure, the excitation signal for the synthesis filter is no longer simply noise or a
periodic pulse but a multiband mixed excitation. The mixed excitation contains both a filtered
signal from a noise generator and a contribution that depends directly on the input signal.
The first step in constructing the excitation signal is pitch extraction. The MELP algorithm
obtains the pitch period using a multistep approach. In the first step, an integer pitch value $P_1$
is obtained in the following manner: