all further analysis if some should be deleted. As for insertions and substitutions,
these are only critical if they change the 'tone' of the content.
For the alternative processing of written text, some text pre-processing will usually
be needed. First, delimiters such as punctuation can be used for segmentation. Then,
capital letters are often de-capitalised to avoid double entries for the same word. Finally,
it may be reasonable to allow for some word replacement rules or the calculation of the edit
distance between written words and their counterparts in the vocabulary. This may
cover misspellings of words or varieties such as British, American,
or Australian English (e.g., [68]).
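To make these steps concrete, the following minimal Python sketch segments at punctuation, de-capitalises, applies a replacement rule, and maps unknown words to their nearest vocabulary entry by edit distance. The vocabulary, the replacement rule, and the distance threshold are hypothetical placeholders, not values from the text.

```python
import re

# Hypothetical example vocabulary and spelling-variant replacement rules
VOCABULARY = {"colour", "analysis", "great", "movie"}
REPLACEMENTS = {"color": "colour"}  # e.g., American -> British spelling

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def preprocess(text: str, max_dist: int = 1) -> list[str]:
    # 1) Segment at punctuation and whitespace, 2) de-capitalise
    tokens = [t.lower() for t in re.split(r"[\s.,;:!?]+", text) if t]
    words = []
    for t in tokens:
        t = REPLACEMENTS.get(t, t)           # 3a) word replacement rules
        if t not in VOCABULARY:              # 3b) nearest vocabulary entry
            best = min(VOCABULARY, key=lambda v: edit_distance(t, v))
            if edit_distance(t, best) <= max_dist:
                t = best
        words.append(t)
    return words

print(preprocess("Color analysis: a grat movie!"))
# -> ['colour', 'analysis', 'a', 'great', 'movie']
```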
We will next look at different methods for generating linguistic features.
6.3.1 Bag of Words
The basic idea behind Bag of Words (BoW) is the representation of symbolic information
in a numeric feature space. Each feature thereby represents the occurrence of a
specific 'word', i.e., a symbolic entity, in the string under analysis. BoW, originally
developed for document retrieval [69], was successfully applied to the fields of
emotion [57] and interest (cf. [70, 71]) recognition from text and speech, and became a
popular approach in these fields [62, 72]. The recognition is often based on speech
turns or larger segments, such as paragraphs or the entire lyrics of a song. Every
such sequence $\mathcal{S}$ can be described by the set of its contained word entities $w_i$, i.e.,
$\mathcal{S} = \{w_1, \ldots, w_S\}$, where $S = |\mathcal{S}|$ is the sequence length. The BoW method considers
these words $w_i$ as units of interest. For a given training set $\mathcal{L}$, all different words
build the word inventory, the 'vocabulary' $\mathcal{V} = \{w_1, \ldots, w_V\}$, with $V = |\mathcal{V}|$ being
the size of this vocabulary. Particularly in spoken or sung language analysis,
non-linguistic vocalisations like sighs and yawns [73], laughs [74, 75], cries [76],
and coughs [77] can also be integrated into such a vocabulary [62, 70] for the decoding
of speech [78] or singing.
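As an illustration of the vocabulary construction, here is a minimal sketch. The two training sequences are hypothetical, and the `*laugh*` token merely indicates how a non-linguistic vocalisation could enter the vocabulary as an ordinary entry.

```python
# Hypothetical pre-processed training sequences; non-linguistic
# vocalisations such as *laugh* can simply be added as tokens.
training_set = [
    ["this", "movie", "was", "great", "*laugh*"],
    ["what", "a", "great", "great", "song"],
]

# The vocabulary V collects all different words of the training set;
# a fixed index i is assigned to each word w_i.
vocabulary = sorted({w for sequence in training_set for w in sequence})
index = {w: i for i, w in enumerate(vocabulary)}
V = len(vocabulary)  # V = |V|, the size of the vocabulary
print(V, vocabulary)
```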
For each word $w_i$, $i \in \{1, \ldots, V\}$, in the vocabulary, a corresponding feature
$x_i$ is created. This may easily lead to a high-dimensional feature vector space. Each
sequence $\mathcal{S}_j$ can then be mapped to a vector $x_j$ in this feature space. Ways to determine
the value of each feature $x_i$ include, first, counting the number of occurrences of the
word $w_i$ in the sentence $\mathcal{S}_j$, resulting in the word frequency $f_{i,j}$. As a simplification,
the binary occurrence (or non-occurrence) of a word can be used. The 'term frequency'
can also be transformed in other ways (cf. [69]), for example by application of the
logarithm, giving the term frequency transformation (TF):
$$\mathrm{TF}_{i,j} = \log\left(c + f_{i,j}\right), \qquad (6.80)$$
where the offset parameter $c$ prevents definition problems in case of $f_{i,j} = 0$. It is
often set to $c = 1$.
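Continuing the sketch, each sequence $\mathcal{S}_j$ can be mapped to a $V$-dimensional vector. The three variants below are the raw word frequency $f_{i,j}$, its binary simplification, and the logarithmic TF of Eq. (6.80) with the default offset $c = 1$:

```python
import math

def word_frequencies(sequence, index):
    """f_{i,j}: occurrence count of each vocabulary word w_i in sequence j."""
    f = [0] * len(index)
    for w in sequence:
        if w in index:                   # out-of-vocabulary words are ignored
            f[index[w]] += 1
    return f

def bow_vector(sequence, index, mode="tf", c=1.0):
    f = word_frequencies(sequence, index)
    if mode == "binary":                 # binary occurrence / non-occurrence
        return [1.0 if fi > 0 else 0.0 for fi in f]
    if mode == "tf":                     # Eq. (6.80): TF = log(c + f_{i,j})
        return [math.log(c + fi) for fi in f]
    return [float(fi) for fi in f]       # raw word frequency f_{i,j}
```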
Another measure is the inverse document frequency transformation (IDF). For $|\mathcal{L}|$
as the number of sequences in the training set $\mathcal{L}$, and $L_i$ as the number
of sentences in which the word $w_i$ appears, the IDF transformation is given by:

$$\mathrm{IDF}_{i,j} = f_{i,j}\,\log\frac{|\mathcal{L}|}{L_i}. \qquad (6.81)$$
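A sketch of this weighting, continuing the TF example above; note that several IDF variants exist in the literature, and returning a zero weight for words never seen in training is an assumption of this sketch rather than part of the text:

```python
import math

def idf_weights(training_set, index):
    """IDF weight log(|L| / L_i), with L_i = number of sequences containing w_i."""
    L = len(training_set)                # |L|: number of training sequences
    L_i = [0] * len(index)
    for sequence in training_set:
        for w in set(sequence):          # count each sequence at most once
            if w in index:
                L_i[index[w]] += 1
    # Assumption: words with L_i = 0 receive weight 0 to avoid division by zero.
    return [math.log(L / li) if li > 0 else 0.0 for li in L_i]

def idf_vector(sequence, index, idf):
    # IDF-weighted term frequency: f_{i,j} * log(|L| / L_i)
    f = word_frequencies(sequence, index)  # from the TF sketch above
    return [fi * wi for fi, wi in zip(f, idf)]
```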
 