Fig. 4 Illustration of proposed segmentation approach
to increase the number of training samples, but it is not desirable in our case since
the large increase in the number of training samples caused by splitting an utterance
into segments is already a great computational burden. However, a more efficient
way is to find more informative segments by minimizing the amount of mutual
information between the two feature vectors. In this study, fixed-length segments
are constructed at selected positions on the basis of designed indexes. More precise
labels of segments can be defined when taking into consideration a much smaller
number of selected segments. Thus, not all parts of the utterance are used in the
analysis. A sliding window with no overlap is adopted to process the utterance
signal for calculating the ranking of each fixed-length segment. A correlation
coefficient is adopted for getting a smaller fixed number of segments from an utterance.
The correlation coefficient [18], which is also known as the Pearson product-moment
correlation coefficient, is a measure of the linear dependence between two
feature vectors. It is defined as
$$
c = \frac{\sum_{x \in X,\, y \in Y} (x - \bar{X})(y - \bar{Y})}{\sqrt{\sum_{x \in X} (x - \bar{X})^2}\,\sqrt{\sum_{y \in Y} (y - \bar{Y})^2}} \tag{1}
$$

$$
\bar{X} = \frac{1}{n} \sum_{x \in X} x \tag{2}
$$

$$
\bar{Y} = \frac{1}{n} \sum_{y \in Y} y, \tag{3}
$$
where n is the number of features.
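As a minimal sketch, the Pearson correlation coefficient defined above can be computed directly from two equal-length feature vectors. The function name pearson and the error handling for constant vectors are illustrative choices, not taken from the chapter.

```python
import math

def pearson(x, y):
    """Correlation coefficient c for two equal-length feature vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("feature vectors must have the same length (>= 2)")
    n = len(x)                      # number of features
    mean_x = sum(x) / n             # mean of X
    mean_y = sum(y) / n             # mean of Y
    # Numerator: sum of products of paired deviations from the means.
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the two root sums of squared deviations.
    den = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
           * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    if den == 0.0:
        # Assumption: a constant vector has no defined correlation.
        raise ValueError("correlation is undefined for a constant vector")
    return num / den
```

A value close to 1 or -1 indicates strong linear dependence between the two vectors, and a value near 0 indicates little linear dependence.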
The concept of the proposed segmentation methods is illustrated in Fig. 5 .
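The selection step can be sketched roughly as follows. The text does not spell out the designed index used for ranking, so the score below (mean absolute Pearson correlation of a window's feature vector with every other window, keeping the k least-correlated windows) is a hypothetical stand-in. The names split_fixed, select_segments, seg_len, and k are likewise assumptions made for illustration.

```python
import math

def split_fixed(signal, seg_len):
    """Cut a signal into consecutive, non-overlapping windows of seg_len samples."""
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    # Assumption: treat a constant vector as uncorrelated with everything.
    return num / den if den else 0.0

def select_segments(features, k):
    """Return the indices of the k windows least correlated with the others.

    Hypothetical ranking index: mean absolute correlation with all other windows.
    """
    scores = []
    for i, fi in enumerate(features):
        others = [abs(pearson(fi, fj)) for j, fj in enumerate(features) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    ranked = sorted(range(len(features)), key=lambda i: scores[i])
    return sorted(ranked[:k])  # keep the chosen windows in temporal order
```

In practice each window would first be mapped to a feature vector, and select_segments would operate on those vectors rather than on raw samples.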
3.2 Feature Extraction
We focused on a set of 162 acoustic features obtained from speech segments,
including 50 mel-frequency cepstral coefficients (MFCC), 50 linear predictive
coefficients (LPC), 10 statistical features (mode, median, mean, range,
interquartile range, standard deviation, variation, absolute deviation, skewness,
and kurtosis) calculated from each of the five levels of detail wavelet coefficients
obtained by discrete wavelet decomposition (DWT), and pitch, energy, zero-crossing
rate (ZCR), the first seven formants, centroid, and 95% roll-off point from the
FFT spectrum.
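To make part of the feature set concrete, the sketch below computes the ten statistics for one level of wavelet detail coefficients (ten statistics at each of five levels gives 50 features), plus a zero-crossing rate. Several definitions are assumptions, since the text does not fix them: "variation" is read as variance, "absolute deviation" as mean absolute deviation, population (divide-by-n) moments are used, and kurtosis is the non-excess form. The function names are illustrative.

```python
import math
import statistics

def summary_statistics(coeffs):
    """Ten summary statistics for one list of coefficients (needs >= 2 values)."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    # Central moments (population form, an assumption).
    m2 = sum((c - mean) ** 2 for c in coeffs) / n
    m3 = sum((c - mean) ** 3 for c in coeffs) / n
    m4 = sum((c - mean) ** 4 for c in coeffs) / n
    q1, _, q3 = statistics.quantiles(coeffs, n=4)  # quartile cut points
    return {
        "mode": statistics.mode(coeffs),
        "median": statistics.median(coeffs),
        "mean": mean,
        "range": max(coeffs) - min(coeffs),
        "iqr": q3 - q1,
        "std": math.sqrt(m2),
        "variance": m2,                                   # "variation" read as variance
        "abs_dev": sum(abs(c - mean) for c in coeffs) / n,
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,          # non-excess kurtosis
    }

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(samples, samples[1:]))
    return crossings / (len(samples) - 1)
```

The remaining features (MFCC, LPC, pitch, formants, and the spectral centroid and roll-off point) would normally come from a signal-processing library and are left out of this sketch.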
 