4 Hierarchical Approach and Downsampling Schemes
A simple and successful phoneme recognizer in a hierarchical ANN framework
is proposed in [Pinto 08b]. In Section 3.4 we observed that this method
compares favorably with previous approaches. In this scheme, phoneme
posteriors are estimated by a two-level hierarchical structure. In the first level, an
MLP estimates intermediate phoneme posteriors based on a temporal
window of cepstral features. In the second level, another MLP estimates final
phoneme posteriors based on a temporal window of intermediate posterior
features. The final phoneme posteriors are then input to a Viterbi decoder.
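The two-level structure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of [Pinto 08b]: each MLP is assumed to have one sigmoid hidden layer and a softmax output, the weights are random rather than trained, and the feature dimension (39), number of classes (39), hidden size and window widths (9 and 21 frames) are hypothetical values chosen only to make the shapes concrete.

```python
import numpy as np

def stack_context(frames: np.ndarray, half_width: int) -> np.ndarray:
    """Concatenate each frame with its +/- half_width neighbours (edge frames repeated)."""
    T, _ = frames.shape
    padded = np.pad(frames, ((half_width, half_width), (0, 0)), mode="edge")
    # Row t of the output covers original frames t-half_width .. t+half_width.
    return np.stack([padded[t:t + 2 * half_width + 1].reshape(-1) for t in range(T)])

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class MLP:
    """Hypothetical one-hidden-layer MLP with random weights (illustration only, untrained)."""
    def __init__(self, n_in: int, n_hidden: int, n_out: int, rng: np.random.Generator):
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = 1.0 / (1.0 + np.exp(-(x @ self.W1 + self.b1)))  # sigmoid hidden layer
        return softmax(h @ self.W2 + self.b2)                # posteriors over classes

def hierarchical_posteriors(cepstra, mlp1, mlp2, ctx1, ctx2):
    intermediate = mlp1(stack_context(cepstra, ctx1))  # first level: feature level
    final = mlp2(stack_context(intermediate, ctx2))    # second level: posterior level
    return intermediate, final

# Hypothetical sizes: 100 frames of 39-dim cepstra, 39 phonetic classes.
rng = np.random.default_rng(0)
T, d, n, hidden = 100, 39, 39, 500
ctx1, ctx2 = 4, 10  # 9-frame and 21-frame windows
cepstra = rng.normal(size=(T, d))
mlp1 = MLP((2 * ctx1 + 1) * d, hidden, n, rng)
mlp2 = MLP((2 * ctx2 + 1) * n, hidden, n, rng)
intermediate, final = hierarchical_posteriors(cepstra, mlp1, mlp2, ctx1, ctx2)
```

Both levels output one posterior vector per frame, so `final` has the same frame rate as the input and can be passed directly to a Viterbi decoder.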
In [Pinto 08b, Pinto 09] it is shown that the hierarchical scheme considerably
outperforms the non-hierarchical approach in phoneme and word recognition
tasks. In addition, compared with the first MLP, the second MLP
is able to process a larger temporal context, which improves performance.
However, using a second MLP in tandem greatly increases computational time
and memory requirements. Therefore, the main goal of this chapter is to
optimize this scheme with respect to these costs while preserving system
accuracy. In the next chapter, an extension of the hierarchical scheme is presented
whose goal is to improve recognition accuracy.
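To see why the second MLP is expensive, it helps to count weights and per-frame multiply-accumulates (MACs) for a one-hidden-layer MLP. The sketch below assumes hypothetical sizes (39-dim cepstra, 39 phonetic classes, 1000 hidden units, a 9-frame window at the first level and a 21-frame window at the second); these are not figures reported in the source.

```python
def mlp_cost(n_in: int, n_hidden: int, n_out: int) -> tuple[int, int]:
    """Return (parameters, per-frame MACs) of a one-hidden-layer MLP."""
    params = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out  # weights + biases
    macs = n_in * n_hidden + n_hidden * n_out                       # matrix-vector products
    return params, macs

# Hypothetical configuration, for illustration only.
d, n, hidden = 39, 39, 1000
mlp1_params, mlp1_macs = mlp_cost(9 * d, hidden, n)   # 9-frame window of cepstra
mlp2_params, mlp2_macs = mlp_cost(21 * n, hidden, n)  # 21-frame window of posteriors

print(mlp1_params, mlp2_params)  # 391039 859039
```

With these assumed sizes the second MLP alone has more than twice the parameters and per-frame cost of the first, because its input grows linearly with the width of its temporal context.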
In order to reduce the computational time and/or the number of parameters of the
hierarchical scheme, several downsampling schemes are investigated. These
schemes remove redundant information contained in the intermediate
posteriors, so that system performance is not affected. In addition,
the large decrease in computational time and number of parameters makes the
system suitable for real-time applications or embedded systems [Vasquez 09d].
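Two generic ways of exploiting redundancy in the intermediate posteriors can be sketched as follows. This is a hypothetical illustration, not necessarily one of the schemes defined later in the chapter: `strided_context` keeps the same temporal span for the second MLP but uses only every `step`-th frame inside the window (fewer inputs, hence fewer parameters), and `downsample_frames` evaluates the second MLP only on every `factor`-th frame (fewer evaluations, hence less computation). The posterior matrix here is random, and all sizes are assumptions.

```python
import numpy as np

def strided_context(post: np.ndarray, half_width: int, step: int) -> np.ndarray:
    """Window around frame t built from frames t + k*step, k = -half_width..half_width."""
    T, _ = post.shape
    pad = half_width * step
    padded = np.pad(post, ((pad, pad), (0, 0)), mode="edge")  # repeat edge frames
    offsets = np.arange(-half_width, half_width + 1) * step
    return np.stack([padded[t + pad + offsets].reshape(-1) for t in range(T)])

def downsample_frames(x: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th frame (rows 0, factor, 2*factor, ...)."""
    return x[::factor]

# Hypothetical intermediate posteriors: 100 frames, 39 classes, rows sum to 1.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(39), size=100)

# Full 21-frame window would give 21 * 39 = 819 inputs; a 5-frame window with
# step 5 spans the same t-10..t+10 range with only 5 * 39 = 195 inputs.
windowed = strided_context(post, half_width=2, step=5)
reduced = downsample_frames(windowed, factor=3)  # evaluate MLP 2 on 34 frames instead of 100
```

If the second MLP is evaluated at a reduced frame rate, the decoder must either run at that rate or the final posteriors must be mapped back to the original rate; which option is appropriate depends on the scheme and is left open here.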
4.1 Hierarchical Approach
This work is based on the hierarchical structure described in [Pinto 08b]. As
mentioned above, it consists of two levels that estimate posteriors, as shown
in Fig. 4.1. In the first level (feature level), an MLP (MLP 1) estimates
intermediate posterior probabilities $x_{k,t}$ of each of the $n$ phonetic classes