Harmonic Speech Coding - Digital Speech: Coding for Low Bit Rate Communication Systems


band excitation (MBE) coders used a constant threshold for all the bands. However, the most recent versions use several heuristic rules to obtain better performance [7]: for example, the threshold function is decreased as the frequency increases, if the same band of the previous frame was unvoiced, if the high-frequency energy exceeds the low-frequency energy, and if the speech energy approaches the energy of the background noise.
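The heuristic rules listed above can be illustrated with a minimal Python sketch. The source states only that the threshold is decreased under each condition; the function name `adaptive_voicing_threshold`, the linear frequency slope of 0.5, the scaling factor 0.8, and the "speech energy below twice the noise energy" test are all hypothetical values chosen for illustration, not those of any particular MBE coder.

```python
def adaptive_voicing_threshold(base_threshold, band_index, num_bands,
                               prev_band_unvoiced, low_band_energy,
                               high_band_energy, speech_energy, noise_energy):
    """Lower a per-band voicing threshold using four heuristic rules.

    All numeric constants below are hypothetical placeholders.
    """
    # Rule 1: the threshold decreases as the band frequency increases
    # (here linearly, down to half the base value at the top band).
    position = band_index / max(num_bands - 1, 1)
    threshold = base_threshold * (1.0 - 0.5 * position)

    # Rule 2: the same band of the previous frame was unvoiced.
    if prev_band_unvoiced:
        threshold *= 0.8

    # Rule 3: the high-frequency energy exceeds the low-frequency energy.
    if high_band_energy > low_band_energy:
        threshold *= 0.8

    # Rule 4: the speech energy approaches the background noise energy
    # (assumed here to mean "less than twice the noise energy").
    if speech_energy < 2.0 * noise_energy:
        threshold *= 0.8

    return threshold


# Lowest band, no rule triggered: the base threshold is returned unchanged.
print(adaptive_voicing_threshold(0.2, 0, 10, False, 1.0, 0.5, 10.0, 1.0))
# Highest band with every rule triggered: the threshold is much lower.
print(adaptive_voicing_threshold(0.2, 9, 10, True, 0.5, 1.0, 1.5, 1.0))
```

Multiplicative factors are used so that the rules compose in any order; an additive scheme would serve equally well for the purpose of illustration.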

Sinusoidal Model Approach

McAulay et al. proposed a different voicing determination technique for their sinusoidal transform coder (STC) [2]. The speech spectrum is divided into two bands, determined by a voicing transition frequency above which the spectrum is declared unvoiced. This method estimates the similarity between the harmonically synthesized signal, $\hat{s}(n, \omega_0)$, and the original speech signal, $s(n)$. The signal-to-noise ratio (SNR), $\delta$, between $s(n)$ and $\hat{s}(n, \omega_0)$ is given by,

$$\delta = \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left| s(n) - \hat{s}(n, \omega_0) \right|^2} \tag{8.2}$$

where $N$ is the analysis frame length and $\hat{s}(n, \omega_0)$ is given by

$$\hat{s}(n, \omega_0) = \sum_{l=1}^{K(\omega_0)} A_l \exp\left( j n l \omega_0 + j\theta_l \right) \tag{8.3}$$

where the harmonic amplitudes, $A_l$, are obtained from the spectral envelope and $\theta_l$ are the harmonic phases. McAulay simplified equation (8.2) for reduced computational complexity, and the simplified $\delta$ is given by,

$$\delta = \frac{\sum_{l=1}^{L} A_l}{\sum_{l=1}^{L} A_l - 2N\rho(\omega_0)} \tag{8.4}$$

where $A_l$ are the harmonic-frequency spectral amplitudes of the original signal, as shown below,

$$s(n) = \sum_{l=1}^{L} A_l \exp\left( j n \omega_l + j\phi_l \right) \tag{8.5}$$
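As an illustration of the SNR measure in equation (8.2), the following Python sketch synthesizes a harmonic signal in the manner of equation (8.3) and compares it with two test frames. It is a sketch under stated assumptions rather than the STC implementation: the signals are taken as real-valued (the cosine form is the real part of the complex exponential in (8.3)), the amplitudes, phases, and 100 Hz fundamental at 8 kHz sampling are arbitrary, the function names are hypothetical, and a small `eps` is added to the denominator to avoid division by zero.

```python
import numpy as np


def harmonic_synthesis(amps, phases, w0, num_samples):
    """Real part of (8.3): sum over l of A_l * cos(n*l*w0 + theta_l)."""
    amps = np.asarray(amps, dtype=float)
    phases = np.asarray(phases, dtype=float)
    n = np.arange(num_samples)               # sample index n = 0..N-1
    l = np.arange(1, len(amps) + 1)          # harmonic index l = 1..K
    arg = np.outer(l, n) * w0 + phases[:, None]
    return np.sum(amps[:, None] * np.cos(arg), axis=0)


def harmonic_snr(s, s_hat, eps=1e-12):
    """SNR of (8.2): signal energy over the energy of the synthesis error."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return np.sum(s ** 2) / (np.sum((s - s_hat) ** 2) + eps)


# Illustrative frame: 100 Hz fundamental at 8 kHz sampling (assumed values).
rng = np.random.default_rng(0)
N = 256
w0 = 2 * np.pi * 100 / 8000
s_hat = harmonic_synthesis([1.0, 0.5, 0.25], [0.0, 0.3, -0.7], w0, N)

voiced_like = s_hat + 0.01 * rng.standard_normal(N)  # nearly harmonic frame
noise_like = rng.standard_normal(N)                  # no harmonic structure

print(harmonic_snr(voiced_like, s_hat))  # large: the harmonic model fits well
print(harmonic_snr(noise_like, s_hat))   # small: the harmonic model fits poorly
```

A frame that is well described by the harmonic model yields a large $\delta$, whereas a noise-like frame yields a $\delta$ below one, which is what makes the measure usable as an indicator of the degree of voicing.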
