Design and Implementation of Voice Conversion System Based on GMM and ANN - Multimedia and Signal Processing

information. However, little research has been done on rhythm conversion so far,
and existing work generally focuses on the transformation of pitch.
This article trains the spectral envelope and its residual using a GMM and
models the prosodic features of the target speech using a BP network. The
spectral characteristics and prosodic features of the source speaker's speech
are then transformed according to the mapping rules, so that the synthesized
speech has the target speaker's characteristics. The algorithm is simulated and
the performance of the voice conversion system is evaluated. The theoretical
analysis and computer simulation results indicate that the method and the
system are effective.

2 System Design of Voice Conversion

In general, voice conversion consists of a training stage and a conversion
stage. In the training stage, the source speaker's speech and the target
speaker's speech are each analyzed in order to estimate the mapping rules. The
relationship between the source and target model parameters is obtained as
well. In the conversion stage, the source speech is transformed according to
the mapping rules.

Based on the LPC algorithm, the converted voice is brought closer to the target
voice by also adjusting suprasegmental features. The paper divides the
algorithm into three modules. The first and second modules transform the
spectral envelope and its residual, and Module 3 converts the suprasegmental
features. For the spectral envelope and its residual signal, the GMM-based
method is, to some extent, more successful than other methods at preventing
over-smoothing of the spectral envelope. The residual conversion fully
preserves the target speaker's spectral excitation information. The BP
algorithm is used to transform the pitch frequencies. In this way, most of the
source speaker's individual characteristics can be converted to those of the
target speaker. Because unvoiced sounds carry less speaker information, the
voice conversion system copies unvoiced frames directly and transforms only the
voiced frames.

Figure 1 and Figure 2 show the block diagrams of training and conversion,
respectively, combining GMM and ANN. To test the validity of the algorithms,
experiments were carried out on speech from Chinese speakers. The experiments
are divided into four groups: male to male, female to female, male to female,
and female to male. The LPC order is 12 and the number of GMM mixture
components is 64.
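The treatment of voiced and unvoiced frames described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `convert_utterance`, `voiced_flags`, and `convert_voiced` are hypothetical, and the voiced/unvoiced decision for each frame is assumed to be supplied by some other part of the system.

```python
def convert_utterance(frames, voiced_flags, convert_voiced):
    """Route each frame: transform voiced frames, copy unvoiced frames unchanged.

    frames         -- list of frames, each a list of samples or parameters
    voiced_flags   -- list of booleans, True where the frame is voiced (assumed given)
    convert_voiced -- hypothetical callback that maps a source frame to a converted frame
    """
    if len(frames) != len(voiced_flags):
        raise ValueError("frames and voiced_flags must have the same length")
    converted = []
    for frame, voiced in zip(frames, voiced_flags):
        if voiced:
            converted.append(convert_voiced(frame))   # spectral and prosodic mapping goes here
        else:
            converted.append(list(frame))             # unvoiced: copied directly
    return converted
```

In a full system, `convert_voiced` would apply the GMM-based spectral mapping and the BP-network pitch mapping; here it is left as a placeholder so that the routing logic stays visible.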

3 Principle and Implementation of Voice Conversion Algorithm

3.1 Spectral Envelope Conversion

LSF has better interpolation properties than LPC, and the two representations
can be converted into each other. Therefore, the speech spectral envelope can
be converted by transforming the LSF parameters.
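To make the relationship between LPC and LSF concrete, the following Python sketch converts LPC coefficients into line spectral frequencies. It uses the standard construction of a symmetric polynomial P(z) and an antisymmetric polynomial Q(z) from A(z), and locates their unit-circle roots by a grid search followed by bisection. The function name `lpc_to_lsf` and the `grid` and `iters` parameters are assumptions made for this example; the source does not say which conversion routine was used.

```python
import math

def lpc_to_lsf(a, grid=512, iters=60):
    """Convert LPC coefficients a = [1, a1, ..., ap] to p LSFs in radians (0, pi).

    Assumes A(z) is minimum phase, so the roots of P and Q lie on the unit circle.
    """
    p = len(a) - 1
    ext = list(a) + [0.0]                      # A(z) padded to degree p + 1
    rev = ext[::-1]                            # z^-(p+1) * A(1/z)
    P = [x + y for x, y in zip(ext, rev)]      # symmetric polynomial
    Q = [x - y for x, y in zip(ext, rev)]      # antisymmetric polynomial
    half = (p + 1) / 2.0

    # On the unit circle, P reduces to a real cosine sum and Q to a real sine sum.
    def p_val(w):
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(P))

    def q_val(w):
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(Q))

    def roots(f):
        found = []
        ws = [math.pi * i / grid for i in range(1, grid)]   # open interval (0, pi)
        for lo, hi in zip(ws, ws[1:]):
            flo, fhi = f(lo), f(hi)
            if flo == 0.0:
                found.append(lo)
            elif flo * fhi < 0.0:
                for _ in range(iters):                      # plain bisection
                    mid = 0.5 * (lo + hi)
                    fmid = f(mid)
                    if flo * fmid <= 0.0:
                        hi = mid
                    else:
                        lo, flo = mid, fmid
                found.append(0.5 * (lo + hi))
        return found

    # The trivial roots at z = 1 and z = -1 fall outside the open interval searched.
    return sorted(roots(p_val) + roots(q_val))
```

The grid search can miss two roots that are closer together than one grid step, so a finer `grid` may be needed for sharply resonant frames. For the order-12 analysis mentioned in the text, the function would return 12 frequencies per frame.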

In the training stage, the conversion functions are estimated from processed
source and target speech. The processing consists of pre-emphasis, background
noise removal, framing, and LPC analysis by the autocorrelation method. The LPC
coefficients are then transformed into LSF parameters. Finally, the LSF
parameters are used to train the GMM.
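The front end of this training pipeline (pre-emphasis, framing, and LPC analysis by the autocorrelation method) might look roughly like the Python sketch below. It is a simplified illustration, not the authors' code: the pre-emphasis coefficient 0.97, the frame size of 256 samples, the hop of 128 samples, and all function names are assumptions, while the default order of 12 follows the text. Background noise removal, windowing, the LSF conversion, and GMM training are left out.

```python
def preemphasize(x, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return list(x[:1]) + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def split_frames(x, size, hop):
    """Cut the signal into overlapping frames of `size` samples, `hop` samples apart."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]

def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: returns ([1, a1, ..., ap], prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                 # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lpc_per_frame(signal, order=12, size=256, hop=128):
    """Pre-emphasize, frame, and compute LPC coefficients for every frame."""
    emphasized = preemphasize(signal)
    return [levinson(autocorr(f, order), order)[0]
            for f in split_frames(emphasized, size, hop)]
```

The coefficients follow the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p. Each frame's coefficient list would then be converted to LSF parameters before being passed to GMM training.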
