25-dimensional feature vectors is slightly better than that of the feature vectors with other
dimensions. The choice of 25 dimensions for the pitch and energy contours is not critical, however:
the reduced size of the contours only has to be chosen so that the dynamics of the original
contours are retained in their resampled versions. The dimensionality of the original pitch and
energy contours is reduced for two basic reasons: (1) fixed-dimensional input feature vectors are
needed for developing the SVM models, and (2) to avoid the curse of dimensionality, the number of
feature vectors available for training the classifier has to keep pace with the size of the feature
vector. The number of feature vectors needed grows exponentially as the dimensionality of the
feature vector increases, so the number of available feature vectors should always be commensurate
with their dimensionality. The local duration pattern is represented by the sequence of normalized
syllable durations, where each syllable duration is determined from the time interval between
successive VOPs [2]. Because the length of the duration contour is proportional to the number of
syllables in the sentence, the raw duration features have unequal lengths. To obtain feature
vectors of equal length, the length of the duration vector is fixed at 18 (the number of syllables
in the longest utterance of IITKGP-SESC), and shorter utterances are zero-padded to this length.
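A minimal Python/NumPy sketch of the two fixed-length representations described above follows. The text does not say how the contours are resampled or how the syllable durations are normalized, so linear interpolation with `np.interp` and division by the mean syllable duration are assumptions here, as are the function names `resample_contour` and `duration_vector` and the `vop_times` input. Note also that k VOPs yield only k - 1 intervals; how the duration of the final syllable is obtained is not specified in this section.

```python
import numpy as np

PROSODY_DIM = 25    # resampled length of pitch/energy contours (value from the text)
MAX_SYLLABLES = 18  # syllables in the longest IITKGP-SESC utterance (value from the text)


def resample_contour(contour, dim=PROSODY_DIM):
    """Map a variable-length pitch or energy contour onto `dim` evenly spaced points.

    ASSUMPTION: linear interpolation; the text only requires that the
    dynamics of the original contour survive the resampling.
    """
    contour = np.asarray(contour, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(contour))  # normalized time of each input frame
    dst = np.linspace(0.0, 1.0, num=dim)           # normalized time of each output point
    return np.interp(dst, src, contour)


def duration_vector(vop_times, max_len=MAX_SYLLABLES):
    """Normalized syllable durations from successive VOP times, zero-padded to max_len.

    ASSUMPTIONS: `vop_times` is a hypothetical sorted list of VOP instants in
    seconds, and durations are normalized by their mean.
    """
    durations = np.diff(np.asarray(vop_times, dtype=float))  # intervals between successive VOPs
    durations = durations / durations.mean()                # assumed normalization
    padded = np.zeros(max_len)
    n = min(len(durations), max_len)
    padded[:n] = durations[:n]                              # zero padding for shorter utterances
    return padded
```

For example, `resample_contour([120, 135, 150, 142, 130, 118])` returns 25 interpolated pitch values, and `duration_vector([0.00, 0.21, 0.47, 0.62])` returns three normalized durations followed by 15 zeros. Interpolating over normalized time keeps the start and end values of each contour fixed, which is one simple way to preserve its overall shape.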
3.4.2 Word and Syllable Level Features
The global and local prosodic features extracted from words and syllables help to
analyze the contribution of different segments (sentences, words, and syllables) and
of their positions (initial, middle, and final) in the utterance toward emotion recognition.
Word and syllable boundaries are determined automatically using vowel onset points
[3, 4]. Before the features are extracted, the words in all the utterances of the database
are divided into three groups, namely initial, middle, and final words. Similarly, the
syllables within each group of words are classified as initial, middle, and final
syllables. When categorizing the words, both the length of the words and the number of
words in an utterance are taken into consideration, where the length of a word is measured
as its number of syllables. If there are more than 3 words in the utterance and the first
word is monosyllabic, then the first 2 words are grouped as initial words, because a
monosyllabic word is often too short for the speaker to clearly express, or for the
features to capture, emotion-specific information.
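The word-grouping rule stated above can be sketched in Python as follows. This is a hypothetical simplification, not the authors' procedure: the function name `group_words` and its input (a list of syllable counts, one per word) are assumptions, and it encodes only the monosyllabic-first-word rule together with a single final word. Cases that Table 3.5 resolves differently, such as the 6-word sentences, are not reproduced by this sketch.

```python
def group_words(syllables_per_word):
    """Split word indices into initial, middle, and final groups.

    HYPOTHETICAL simplification of the rule in the text:
    - the last word is the final word;
    - the first word is the initial word, except that when the utterance has
      more than 3 words and the first word is monosyllabic, the first 2 words
      are initial;
    - all remaining words are middle words.
    """
    n = len(syllables_per_word)
    if n < 3:
        raise ValueError("need at least 3 words to form three groups")
    n_initial = 2 if n > 3 and syllables_per_word[0] == 1 else 1
    return {
        "initial": list(range(n_initial)),
        "middle": list(range(n_initial, n - 1)),
        "final": [n - 1],
    }
```

For a five-word sentence whose first word is monosyllabic and whose last word has 4 syllables, as in S2, `group_words([1, 2, 3, 2, 4])` places words 0 and 1 in the initial group, words 2 and 3 in the middle group, and word 4 in the final group (the middle syllable counts here are made up for illustration).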
The scheme for grouping words and syllables into the three groups mentioned above is
given in Table 3.5, which lists the word and syllable grouping details for the 15
sentences of IITKGP-SESC. For instance, grouping the words of sentences (S1, S8, S9) and
(S5, S11) is straightforward, as these sentences contain either 3 or 6 words. S2 has 5
words: the first two are grouped as initial words because the first word is monosyllabic,
the last word, which contains 4 syllables, is treated as the final word, and the remaining
two words are considered middle words. Similarly, in S3 the first word is considered the
initial word, as it contains 4 syllables. On the basis of production