FIGURE 18.12
Steps in the encoding of speech for the SILK coder. [Block diagram. Blocks shown: Voice activity detector, High-pass filter, Pitch analysis, Noise shaping analysis, Prediction analysis, LSF Quant., LTP scaling control, Gains proc., Prefilter, Noise shaping quantizer.]
The SILK encoder can operate in one of four modes: a narrowband mode that accepts
inputs sampled at 8 kHz, a medium-band mode that supports inputs at 8 kHz and 12 kHz, a
wideband mode that also supports inputs sampled at 16 kHz, and a super wideband mode that
can support inputs sampled at 8 kHz, 12 kHz, 16 kHz, and 24 kHz. The encoder can accept
inputs at any of these sampling rates and, if necessary, resample the input to a lower sampling rate. This
ability to dynamically change the sampling rate allows the encoder to adapt to
changing channel conditions. The input is divided into 20 millisecond frames, and the encoder
can package one to five frames into a single packet. Including more frames in a packet lowers
the per-packet overhead at the cost of increased latency. This is one of the many ways the SILK coder
responds to network conditions.
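The trade-off between overhead and latency can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the SILK specification: the function name `packet_stats`, the 20 kbit/s payload rate, and the 40-byte header (IPv4 plus UDP plus RTP) are assumptions chosen only for illustration.

```python
FRAME_MS = 20        # frame duration stated in the text
HEADER_BYTES = 40    # assumption: IPv4 (20) + UDP (8) + RTP (12) bytes per packet


def packet_stats(frames_per_packet, bitrate_bps=20000, header_bytes=HEADER_BYTES):
    """Return (buffered audio in ms, fraction of packet bytes spent on headers).

    bitrate_bps is a hypothetical payload bit rate, not a SILK constant.
    """
    if not 1 <= frames_per_packet <= 5:
        raise ValueError("the text describes one to five frames per packet")
    duration_ms = frames_per_packet * FRAME_MS
    payload_bytes = bitrate_bps * duration_ms / 1000 / 8
    overhead = header_bytes / (header_bytes + payload_bytes)
    return duration_ms, overhead


if __name__ == "__main__":
    for n in range(1, 6):
        delay_ms, overhead = packet_stats(n)
        print(f"{n} frame(s): {delay_ms} ms buffered, {overhead:.0%} header bytes")
```

Under these assumed numbers the header share of each packet falls from about 44% with one frame to about 14% with five, while the audio that must be buffered before sending grows from 20 ms to 100 ms.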
The voice activity detector decomposes the input into four equal-bandwidth subbands, with
additional differentiation in the lowest subband, to estimate the level of speech activity. The
coder has the option of drastically reducing transmission during intervals of silence and of increased
background noise.
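To illustrate what a decomposition into four equal-bandwidth subbands measures, the sketch below sums the energy of one 20 millisecond frame in each quarter of the band from 0 to half the sampling rate. It uses a direct DFT as a stand-in for the filter bank an actual coder would use, and it does not attempt the extra processing of the lowest subband or the speech activity decision itself. The names `subband_energies` and `tone` are hypothetical.

```python
import cmath
import math


def subband_energies(frame, num_bands=4):
    """Sum DFT bin energies into equal-bandwidth bands covering 0 to fs/2."""
    n = len(frame)
    half = n // 2
    energies = [0.0] * num_bands
    for k in range(half):
        # Direct DFT of bin k; O(n^2) overall, acceptable for one short frame.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        band = k * num_bands // half  # index of the band that bin k falls in
        energies[band] += abs(coeff) ** 2
    return energies


def tone(freq_hz, fs=8000, n=160):
    """One 20 ms frame (160 samples at 8 kHz) of a sine wave, for testing."""
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]


if __name__ == "__main__":
    for f in (500, 1500, 2500, 3500):
        e = subband_energies(tone(f))
        print(f, "Hz -> strongest band", e.index(max(e)))
```

At an 8 kHz sampling rate each band is 1 kHz wide, so a tone at 500 Hz contributes to the lowest band and a tone at 3500 Hz to the highest. A detector could compare such band energies against a running noise estimate, but that step is left out here.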
The signal is also filtered using a high-pass filter with a 70 Hz cutoff to get rid of any DC
bias and 50 Hz or 60 Hz hum. The filtered signal is then provided to the pitch analysis block,
which generates pitch lags every five milliseconds. The pitch analysis block performs an LPC
analysis of the input and uses the coefficients in a whitening filter. The number of coefficients
in the whitening filter can be 16, 12, or 8, depending on the complexity setting. Correlations of
downsampled versions of the whitened sequence are used to estimate the speech type and the
pitch.
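The two ideas in this step, removing DC with a high-pass filter and finding the pitch lag from a correlation peak, can be sketched in a few lines. This is a simplified illustration under stated assumptions and not the SILK algorithm: it uses a first-order high-pass filter, omits the LPC whitening and downsampling described above, and searches lags between 2 ms and 18 ms, a range assumed here for the example. The names `high_pass` and `estimate_pitch_lag` are hypothetical.

```python
import math


def high_pass(x, fs, cutoff_hz=70.0):
    """First-order RC high-pass filter (a stand-in, not SILK's actual filter)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / fs)
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(alpha * (y[-1] + x[n] - x[n - 1]))
    return y


def estimate_pitch_lag(x, fs, min_ms=2.0, max_ms=18.0):
    """Return (lag in samples, score) maximizing the normalized correlation."""
    min_lag = int(fs * min_ms / 1000)
    max_lag = int(fs * max_ms / 1000)
    window = len(x) - max_lag
    if window <= 0:
        raise ValueError("signal too short for the requested lag range")
    ref = x[:window]
    ref_energy = sum(v * v for v in ref)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        seg = x[lag:lag + window]
        cross = sum(a * b for a, b in zip(ref, seg))
        denom = math.sqrt(ref_energy * sum(v * v for v in seg))
        score = cross / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


if __name__ == "__main__":
    fs = 8000
    # Synthetic "voiced" signal: 100 Hz fundamental, two harmonics, DC offset.
    sig = [0.5 + math.sin(2 * math.pi * 100 * i / fs)
           + 0.5 * math.sin(2 * math.pi * 200 * i / fs)
           + 0.3 * math.sin(2 * math.pi * 300 * i / fs) for i in range(800)]
    filtered = high_pass(sig, fs)
    # Skip the first 200 samples so the filter's start-up transient has decayed.
    lag, score = estimate_pitch_lag(filtered[200:], fs)
    print(f"lag = {lag} samples, pitch = {fs / lag:.1f} Hz, score = {score:.3f}")
```

For the synthetic input, a 100 Hz fundamental sampled at 8 kHz has a period of 80 samples, which is the lag the search should recover. Real speech is far less regular, which is one reason a coder whitens the signal before correlating.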
Noise shaping analysis is used to generate the coefficients for the prefilter. The idea behind
noise shaping is to shape the quantization noise spectrum in such a way that the highest amount