Digital Signal Processing Reference
A GMM with Q mixture components assumes that the probability distribution of the
observed parameters takes the following parametric form [2][3]:
$$
P(x) = \sum_{q=1}^{Q} \alpha_q N(x; \mu_q, \Sigma_q), \qquad \sum_{q=1}^{Q} \alpha_q = 1, \quad \alpha_q \ge 0 \tag{1}
$$
where $\alpha_q$ is the weight of the $q$-th component and $N(x; \mu_q, \Sigma_q)$ denotes the
$p$-dimensional normal distribution with mean vector $\mu_q$ and covariance matrix $\Sigma_q$,
defined by:
$$
N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} \, |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right) \tag{2}
$$
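As a minimal sketch of equations (1) and (2), the mixture density can be evaluated in Python with NumPy. The function names and the two-component example values are illustrative assumptions and do not come from the original text.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Equation (2): p-dimensional normal density N(x; mu, Sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Solve Sigma * y = diff rather than forming the explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(sigma))
    return float(np.exp(-0.5 * maha) / norm)

def gmm_pdf(x, alphas, mus, sigmas):
    """Equation (1): weighted sum of the Q component densities."""
    return sum(a * gaussian_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas))

# Hypothetical mixture with Q = 2 components in p = 2 dimensions.
alphas = [0.4, 0.6]                        # weights, must sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0])]  # mean vectors
sigmas = [np.eye(2), 2.0 * np.eye(2)]      # covariance matrices
density_at_origin = gmm_pdf(np.zeros(2), alphas, mus, sigmas)
```

For high-dimensional features it is usually safer to work with log densities, since the product of the normalizing terms can underflow.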
The parameters of the GMM, $(\alpha_q, \mu_q, \Sigma_q)$, can be estimated with the
expectation-maximization (EM) algorithm.
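The EM estimation can be sketched as follows. This is a bare-bones illustration with full covariance matrices, not the procedure used by the authors: the function name `fit_gmm_em`, the random initialization, the fixed iteration count, and the small regularization term `reg` are all assumptions made for the example.

```python
import numpy as np

def fit_gmm_em(data, Q, n_iter=50, reg=1e-6, seed=0):
    """Estimate (alpha_q, mu_q, Sigma_q) for q = 1..Q from the rows of `data`."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Assumed initialization: Q random samples as means, global covariance.
    mus = data[rng.choice(n, size=Q, replace=False)].copy()
    sigmas = np.array([np.cov(data.T) + reg * np.eye(p) for _ in range(Q)])
    alphas = np.full(Q, 1.0 / Q)
    for _ in range(n_iter):
        # E-step: log of alpha_q * N(x; mu_q, Sigma_q) for every sample.
        log_r = np.empty((n, Q))
        for q in range(Q):
            diff = data - mus[q]
            sol = np.linalg.solve(sigmas[q], diff.T).T
            maha = np.sum(diff * sol, axis=1)
            _, logdet = np.linalg.slogdet(sigmas[q])
            log_r[:, q] = np.log(alphas[q]) - 0.5 * (maha + logdet + p * np.log(2.0 * np.pi))
        # Normalize in the log domain to obtain responsibilities.
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        nq = r.sum(axis=0)
        alphas = nq / n
        mus = (r.T @ data) / nq[:, None]
        for q in range(Q):
            diff = data - mus[q]
            sigmas[q] = (r[:, q, None] * diff).T @ diff / nq[q] + reg * np.eye(p)
    return alphas, mus, sigmas

# Synthetic two-cluster data, used only to exercise the function.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(6.0, 1.0, (200, 2))])
alphas, mus, sigmas = fit_gmm_em(data, Q=2)
```

A practical implementation would normally add a convergence test on the log-likelihood and a more careful initialization such as k-means.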
In the conversion stage, the parameters of the conversion function are estimated from
the joint density of the source and target features. A joint vector $Z = [X^{T} \; Y^{T}]^{T}$,
where $X$ and $Y$ are the aligned source and target features, is used to estimate the
GMM parameters. The following form is assumed for the conversion function:
$$
F(x) = \sum_{q=1}^{Q} \left( \mu_q^{Y} + \Sigma_q^{YX} \left( \Sigma_q^{XX} \right)^{-1} (x - \mu_q^{X}) \right) p(c_q \mid x) \tag{3}
$$
where $p(c_q \mid x)$ is the conditional probability that a given observation vector $x$
belongs to the acoustic class $c_q$ of the GMM:
$$
p(c_q \mid x) = \frac{\alpha_q N(x; \mu_q^{X}, \Sigma_q^{XX})}{\sum_{j=1}^{Q} \alpha_j N(x; \mu_j^{X}, \Sigma_j^{XX})} \tag{4}
$$
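$$
\mu_q = \begin{bmatrix} \mu_q^{X} \\ \mu_q^{Y} \end{bmatrix}; \qquad \Sigma_q = \begin{bmatrix} \Sigma_q^{XX} & \Sigma_q^{XY} \\ \Sigma_q^{YX} & \Sigma_q^{YY} \end{bmatrix} \tag{5}
$$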
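The conversion function of equations (3) to (5) can be sketched in Python with NumPy. The sketch assumes that each joint mean vector stores the source part first and the target part second, matching the joint vector Z; the function names and the one-dimensional example values are hypothetical.

```python
import numpy as np

def _gauss(x, mu, sigma):
    """Normal density N(x; mu, Sigma), as in equation (2)."""
    diff = x - mu
    maha = diff @ np.linalg.solve(sigma, diff)
    return np.exp(-0.5 * maha) / np.sqrt((2.0 * np.pi) ** len(mu) * np.linalg.det(sigma))

def posteriors(x, alphas, mus, sigmas, d):
    """Equation (4): p(c_q | x), using the source (X) block of each joint component."""
    lik = np.array([a * _gauss(x, m[:d], s[:d, :d])
                    for a, m, s in zip(alphas, mus, sigmas)])
    return lik / lik.sum()

def convert(x, alphas, mus, sigmas):
    """Equation (3): posterior-weighted sum of per-component linear regressions."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]                       # source feature dimension
    post = posteriors(x, alphas, mus, sigmas, d)
    y = np.zeros(mus[0].shape[0] - d)    # target feature dimension
    for w, m, s in zip(post, mus, sigmas):
        mu_x, mu_y = m[:d], m[d:]        # equation (5): partitioned mean
        s_xx, s_yx = s[:d, :d], s[d:, :d]  # equation (5): covariance blocks
        y += w * (mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    return y

# Hypothetical single-component joint model with 1-D source and 1-D target.
mus_1 = [np.array([0.0, 1.0])]
sigmas_1 = [np.array([[1.0, 0.5], [0.5, 1.0]])]
y = convert([2.0], [1.0], mus_1, sigmas_1)  # 1 + 0.5 * (2 - 0) = 2
```

With a single component the posterior is 1 and the function reduces to an ordinary linear regression from source to target; with several components the posteriors blend the local regressions smoothly.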
The algorithms described above were tested in simulation. One group of male voices and
one group of female voices were chosen as experimental data for illustration and analysis.
Figure 3 shows the envelope converted by the GMM.
As Figure 3 shows, the converted LPC is close to the target LPC, which indicates
that spectral envelope conversion with a GMM is feasible.
3.2 Residual Conversion
The excitation source of speech carries a large amount of speaker-identity
information and therefore reflects speaker characteristics. Transforming the excitation
source can improve the quality of the converted voice. Residual prediction relies mainly
on the residual signal of the target speaker. The conversion system adopts the LPC
synthesis model [3][4].
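To illustrate what the residual is in an LPC model, the following sketch extracts the residual of one frame by inverse filtering and rebuilds the frame with the all-pole synthesis filter. It is a generic illustration under stated assumptions (autocorrelation-method LPC, zero initial filter state, a synthetic test frame) and not the conversion system described in the text; all function names are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: a[k] such that x[n] ~ sum_k a[k] * x[n-k]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lpc_residual(frame, a):
    """Inverse filter A(z) = 1 - sum_k a[k] z^-k applied to the frame."""
    inv = np.concatenate(([1.0], -a))
    return np.convolve(frame, inv)[: len(frame)]

def lpc_synthesize(residual, a):
    """All-pole synthesis filter 1/A(z) driven by the residual."""
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * out[n - k]
        out[n] = acc
    return out

# Synthetic frame: two sinusoids plus a little noise (illustration only).
rng = np.random.default_rng(0)
t = np.arange(240)
frame = np.sin(0.1 * t) + 0.5 * np.sin(0.23 * t) + 0.05 * rng.standard_normal(240)
a = lpc_coefficients(frame, order=6)
residual = lpc_residual(frame, a)
rebuilt = lpc_synthesize(residual, a)
```

Driving the synthesis filter with the frame's own residual reproduces the frame, which is why replacing or predicting the residual changes the excitation while the LPC coefficients keep describing the spectral envelope.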
 