DSP Applications and Student Projects - Digital Signal Processing and Applications with the C6713 and C6416 DSK

[Block diagram: input speech (analog) → sampling → digital signal → framing/blocking → windowing → FFT (conversion to frequency domain) → computing mel-frequency coefficients → computing code vector using VQ → code word]

FIGURE 10.53. Steps for speaker recognition implementation.

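(c) Use the FFT to convert each frame from the time domain to the frequency domain.

(d) Convert the resulting spectrum to the Mel-frequency scale.

(e) Convert the Mel spectrum back to the time domain; the resulting mel-frequency cepstral coefficients form the feature vector for that frame.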

2. Classification consists of models for each speaker and the decision logic necessary to render a decision. This module classifies the extracted features according to the individual speakers whose voices have been stored. The recorded voice patterns of the speakers are used to derive a classification algorithm; here, vector quantization (VQ) is used. VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword. The collection of all clusters is a codebook. In the training phase, a speaker-specific VQ codebook is generated for each known speaker by clustering that speaker's training acoustic vectors. The distance from a vector to the closest codeword of a codebook is called the VQ distortion. In the recognition phase, an input utterance of an unknown voice is vector-quantized using each trained codebook, and the total VQ distortion is computed. The speaker corresponding to the VQ codebook with the smallest total distortion is identified as the speaker of the utterance.
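Feature-extraction steps (c) through (e) can be sketched for a single frame as follows, in plain Python so that it runs anywhere. This is a minimal sketch, not the project's DSK code: it assumes the frame has already been blocked and windowed and that its length is a power of 2, and it makes choices the text does not specify, namely the common mel mapping m = 2595 log10(1 + f/700), 20 triangular filters, 12 output coefficients, and, for step (e), the usual route of taking the log of the filter energies and applying a DCT. All function names and defaults are illustrative.

```python
import cmath
import math

# Illustrative sketch of steps (c)-(e) for ONE already-framed, already-windowed frame.
# Assumptions (not from the source): mel formula 2595*log10(1 + f/700),
# 20 triangular filters, log + DCT-II for step (e), 12 coefficients kept.


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("FFT length must be a power of 2")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor * odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with edges spaced uniformly on the mel scale, 0 .. fs/2."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / fs) for f in edges]  # Hz -> FFT bin index
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):   # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(frame, fs, n_filters=20, n_coeffs=12):
    """Mel-frequency cepstral coefficients of one windowed frame."""
    n_fft = len(frame)
    spectrum = fft(frame)                                   # (c) to frequency domain
    power = [abs(spectrum[k]) ** 2 for k in range(n_fft // 2 + 1)]
    bank = mel_filterbank(n_filters, n_fft, fs)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]  # (d) mel spectrum
    log_e = [math.log(max(e, 1e-12)) for e in energies]    # floor avoids log(0)
    # (e) back to the time (cepstral) domain: unnormalised DCT-II, skipping c0
    return [
        sum(log_e[j] * math.cos(math.pi * n * (j + 0.5) / n_filters) for j in range(n_filters))
        for n in range(1, n_coeffs + 1)
    ]
```

For example, a 256-sample Hamming-windowed frame sampled at 8 kHz gives 12 coefficients from `mfcc(frame, 8000)`. The recursive FFT is only for readability; a real-time version on the DSK would more likely use an optimized FFT routine.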
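The training phase and the VQ distortion described in item 2 can be illustrated with the sketch below. The text says only that each codebook is generated "by clustering"; this sketch swaps in plain k-means (Lloyd iterations) with evenly spaced initial codewords, and it assumes Euclidean distance. The LBG binary-splitting algorithm is a common alternative for building VQ codebooks. Function names and the iteration count are hypothetical.

```python
import math

# Illustrative sketch: training one speaker's VQ codebook and measuring VQ distortion.
# Assumptions (not from the source): Euclidean distance, plain k-means clustering,
# evenly spaced initial codewords, a fixed number of iterations.


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(v, codebook):
    """Return (index of the closest codeword, VQ distortion of v)."""
    best = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
    return best, math.sqrt(sq_dist(v, codebook[best]))


def train_codebook(vectors, k, iterations=20):
    """Cluster one speaker's training vectors into k codewords (k <= len(vectors))."""
    step = len(vectors) / k
    codebook = [list(vectors[int(i * step)]) for i in range(k)]  # spread-out start
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:                       # assign each vector to its nearest codeword
            clusters[nearest_codeword(v, codebook)[0]].append(v)
        for i, members in enumerate(clusters):  # move each codeword to its cluster center
            if members:                         # an empty cluster keeps its old codeword
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook
```

Training runs once per enrolled speaker on that speaker's feature vectors; the codebook size k (for instance 16) is a design parameter that trades memory and computation against how finely each speaker's feature space is described.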

Speaker recognition can be divided into identification and verification. Speaker identification is the process of determining which registered speaker provided a given utterance. Speaker verification is the process of accepting or rejecting the identity claim of a speaker. This project implements only the speaker identification (ID) process. Speaker ID can be further subdivided into closed-set and open-set problems. In the closed-set case, the speaker is known a priori to belong to a set of M speakers. In the open-set case, the speaker may be outside that set, so a “none of the above” category is necessary. This project uses only the simpler closed-set speaker ID.
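In the closed-set case, the decision rule from item 2 reduces to a minimum over the M enrolled codebooks, so it always names one of the registered speakers. The sketch below shows that rule under the assumptions that distortion is Euclidean and that codebooks are stored in a dictionary keyed by speaker name; both choices and the function names are illustrative.

```python
import math

# Illustrative sketch of the closed-set decision: every enrolled speaker has a
# trained codebook, and the utterance is assigned to the speaker whose codebook
# gives the smallest total VQ distortion. Euclidean distance is an assumption.


def vq_distortion(v, codebook):
    """Distance from feature vector v to its closest codeword."""
    return min(math.dist(v, c) for c in codebook)


def identify_speaker(test_vectors, codebooks):
    """Return (speaker name, total distortion) for the best-matching codebook.

    codebooks maps each enrolled speaker's name to that speaker's codebook.
    """
    totals = {
        name: sum(vq_distortion(v, cb) for v in test_vectors)
        for name, cb in codebooks.items()
    }
    best = min(totals, key=totals.get)
    return best, totals[best]
```

Because the rule never rejects, an open-set system would additionally need a rejection test (for example, a threshold on the smallest total distortion) to produce the “none of the above” outcome; this sketch omits it, matching the closed-set scope of the project.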

Speaker ID systems can be either text-independent or text-dependent. In the text-independent case, there is no restriction on the sentence or phrase to be spoken, whereas in the text-dependent case, the input sentence or phrase is indexed for each
