Digital Signal Processing Reference
information. However, there is little research on rhythm conversion at present,
and existing work focuses mainly on the transformation of pitch.
This article trains the spectral envelope and its residual using a Gaussian mixture
model (GMM) and models the prosodic features of the target speech using a
back-propagation (BP) network. The spectral characteristics and prosodic features of
the source speaker's speech are then transformed according to the mapping rules, so
that the synthesized speech has the target speaker's characteristics.
The algorithm is simulated and the performance of the voice conversion system is
evaluated. The theoretical analysis and computer simulation results indicate that the
method and the system of voice conversion are effective.
2 System Design of Voice Conversion
Generally, voice conversion consists of a training stage and a conversion stage.
In the training stage, the source speaker's speech and the target speaker's speech
are each analyzed in order to estimate the mapping rules. The relationship between
the source and target model parameters can also be obtained. In the conversion
stage, the source speech is transformed by the mapping rules.
Building on linear predictive coding (LPC) analysis, the converted voice is moved
toward the target voice by adjusting suprasegmental features. The paper divides the
algorithm into three modules. The first and second modules transform the spectral
envelope and its residual, and the third module converts the suprasegmental features.
For the spectral envelope and its residual signal, the GMM-based method is, to some
extent, more successful than other methods at preventing over-smoothing of the
spectral envelope. The residual conversion fully preserves the target speaker's
spectral excitation information. The BP algorithm is used to transform pitch
frequencies. In this way, most of the source speaker's individual characteristics can
be converted to those of the target speaker. Because unvoiced sounds carry less
speaker information, the voice conversion system copies unvoiced frames directly and
transforms only voiced frames.
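The frame routing described above can be sketched as follows. This is a minimal illustration rather than the article's implementation: the function name `convert_utterance`, the per-frame voiced flags, and the `convert_voiced` callback are hypothetical, and the voiced/unvoiced decision itself is assumed to be made elsewhere.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def convert_utterance(
    frames: Sequence[Frame],
    voiced: Sequence[bool],
    convert_voiced: Callable[[Frame], Frame],
) -> List[Frame]:
    """Copy unvoiced frames unchanged and transform only voiced frames.

    `convert_voiced` stands in for the whole voiced-frame conversion
    (spectral envelope, residual, pitch); it is a placeholder here.
    """
    if len(frames) != len(voiced):
        raise ValueError("frames and voiced flags must have the same length")
    output: List[Frame] = []
    for frame, is_voiced in zip(frames, voiced):
        if is_voiced:
            output.append(convert_voiced(frame))
        else:
            output.append(list(frame))  # unvoiced: copied directly
    return output
```

Keeping the conversion behind a callback leaves the routing logic independent of how the voiced frames are actually transformed.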
Figures 1 and 2 show the block diagrams of training and conversion, respectively,
combining the GMM and the ANN. To test the validity of the algorithms, experiments on
Chinese speakers' speech are carried out. The experiments are divided into four groups:
male to male, female to female, male to female, and female to male. The LPC order
is 12 and the number of GMM components is 64.
3 Principle and Implementation of Voice Conversion Algorithm
3.1 Spectral Envelope Conversion
Line spectral frequencies (LSF) have better interpolation properties than LPC
coefficients, and the two representations can be converted into each other. Therefore,
the speech spectral envelope can be converted by transforming the LSF.
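To make the LPC-to-LSF step concrete, here is a small sketch in plain Python. It assumes the predictor is given as `[1, a1, ..., ap]` for A(z) = 1 + a1 z^-1 + ... + ap z^-p, forms the usual sum and difference polynomials P(z) and Q(z), and locates their zeros on the upper unit circle by a grid search followed by bisection. The function names and the `grid` resolution are illustrative choices, not taken from the article, and two LSFs closer together than one grid step would be missed.

```python
import cmath
import math
from typing import List, Sequence


def _amplitude(coeffs: Sequence[float], w: float, symmetric: bool) -> float:
    """Evaluate the polynomial at z = e^{jw} with its linear phase removed.

    A symmetric polynomial then becomes purely real, an antisymmetric one
    purely imaginary, so a single real number describes it.
    """
    n = len(coeffs) - 1
    value = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))
    value *= cmath.exp(1j * w * n / 2.0)
    return value.real if symmetric else value.imag


def _zero_crossings(coeffs: Sequence[float], symmetric: bool, grid: int) -> List[float]:
    """Find zeros strictly inside (0, pi); the trivial zeros at 0 and pi are skipped."""
    zeros: List[float] = []
    prev_w = math.pi / grid
    prev_v = _amplitude(coeffs, prev_w, symmetric)
    for i in range(2, grid):
        w = math.pi * i / grid
        v = _amplitude(coeffs, w, symmetric)
        if v == 0.0:
            zeros.append(w)
        elif prev_v * v < 0.0:
            lo, hi, f_lo = prev_w, w, prev_v
            for _ in range(50):  # bisection refinement
                mid = 0.5 * (lo + hi)
                f_mid = _amplitude(coeffs, mid, symmetric)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        prev_w, prev_v = w, v
    return zeros


def lpc_to_lsf(a: Sequence[float], grid: int = 1024) -> List[float]:
    """Return the LSFs in radians, sorted ascending, for a = [1, a1, ..., ap]."""
    extended = list(a) + [0.0]
    reversed_shifted = [0.0] + list(reversed(a))
    p_poly = [x + y for x, y in zip(extended, reversed_shifted)]  # symmetric
    q_poly = [x - y for x, y in zip(extended, reversed_shifted)]  # antisymmetric
    return sorted(_zero_crossings(p_poly, True, grid) + _zero_crossings(q_poly, False, grid))
```

For a stable predictor the zeros of P(z) and Q(z) interleave on the unit circle, which is why a sorted list of angles is a convenient parameter vector for interpolation and for GMM training.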
In the training stage, the conversion functions are estimated from the source and
target speech. Both are first processed by pre-emphasis, background-noise removal, and
framing, and the LPC coefficients of each frame are obtained by the autocorrelation
method. The LPC coefficients are then transformed into LSF. Finally, the LSF
parameters are trained with the GMM.
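The analysis front end described above can be sketched in plain Python as follows. Only the LPC order of 12 comes from the text; the pre-emphasis coefficient of 0.97, the Hamming window, the frame length and hop, and all function names are assumptions made for illustration. Background-noise removal and the GMM training itself are left out.

```python
import math
from typing import List, Sequence, Tuple


def preemphasize(x: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def split_frames(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [list(x[s:s + frame_len]) for s in range(0, len(x) - frame_len + 1, hop)]


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the autocorrelation normal equations.

    Returns ([1, a1, ..., ap], residual_energy) for
    A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_of_frame(frame: Sequence[float], order: int = 12) -> Tuple[List[float], float]:
    """Window one frame (Hamming, an assumption) and estimate its LPC coefficients.

    The frame is expected to be longer than `order` samples.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

A typical call chain would be `preemphasize`, then `split_frames`, then `lpc_of_frame` on each frame; the resulting coefficient vectors are what would be converted to LSF before GMM training.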