Digital Signal Processing Reference
- HTM is an extended version of [Deng 06] that uses joint cepstra and
delta-cepstra as acoustic features.
[Fernandez 08] Fernandez (2008): PA = 75.4*
- Bi-directional Long Short-Term Memory (BLSTM) RNN.
- In contrast to a typical RNN, which uses information from the
beginning of the utterance up to the current frame, a bi-directional RNN
uses information from the entire utterance.
- LSTM RNNs are able to handle long time delays [Gers 02].
[Fosler-Lussier 08] Fosler-Lussier and Morris (2008): PA = 71.8*
- TANDEM and Conditional Random Fields (CRFs).
- Evaluation of CRF, TANDEM and the combination of both (CRANDEM)
(see Section 3.5.2).
- Best reported result obtained with CRANDEM, trained with phoneme
posteriors, articulatory posteriors and PLP features.
[Pinto 08b] Pinto et al. (2008): PA = 73.4**
- Hierarchical MLP structure in a hybrid HMM/ANN framework.
- Two-level hierarchy in which each level corresponds to an MLP.
- The MLP at the second level uses a large temporal context of
posterior features. This method is described in more detail in Chapter 4.
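The two-level idea in the last entry can be illustrated with a minimal
sketch. It is not the configuration of [Pinto 08b]: the layer sizes, the
context widths, the random (untrained) weights and the helper names
stack_context, mlp and init are all assumptions made for illustration. The
sketch only shows the data flow, in which the second MLP receives a wide
window of the first MLP's posteriors rather than the acoustic features.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax with the usual max subtraction for stability.
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp(x, w1, b1, w2, b2):
    # One hidden tanh layer followed by a softmax output layer.
    h = np.tanh(x @ w1 + b1)
    return softmax(h @ w2 + b2)

def stack_context(frames, context):
    # Concatenate each frame with `context` neighbours on each side.
    # Frames beyond the utterance edges are filled by repeating the edge.
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def init(rng, n_in, n_hidden, n_out):
    # Random weights stand in for trained parameters (assumption).
    return (rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(scale=0.1, size=(n_hidden, n_out)), np.zeros(n_out))

rng = np.random.default_rng(0)
n_frames, n_feat, n_phones = 50, 39, 40   # hypothetical sizes
ctx1, ctx2 = 4, 10                        # hypothetical context widths

features = rng.normal(size=(n_frames, n_feat))

# Level 1: acoustic features with a short context -> phoneme posteriors.
mlp1 = init(rng, n_feat * (2 * ctx1 + 1), 64, n_phones)
post1 = mlp(stack_context(features, ctx1), *mlp1)

# Level 2: a long window of level-1 posteriors -> refined posteriors.
mlp2 = init(rng, n_phones * (2 * ctx2 + 1), 64, n_phones)
post2 = mlp(stack_context(post1, ctx2), *mlp2)
```

With these assumed sizes, each second-level input vector has
40 * 21 = 840 components, one block of 40 posteriors per frame in the window.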
3.5 State-of-the-Art Hierarchical Schemes
As Section 3.4 shows, high phoneme accuracies have already been achieved
with hierarchical structures. In fact, as we will see in the following
chapters, our work is based on hierarchical schemes. The main advantage of
these approaches is that they offer an effective way to combine several
classifiers built on different criteria. The combination is most fruitful
when the criteria involved are complementary.
A hierarchical scheme is usually implemented by combining several
classifiers sequentially. However, more complex structures can be found in
which a hierarchical level consists of different classifiers working in
parallel, or in which the input of a high-level classifier is the output of
a non-adjacent sequential classifier. Additionally, some low hierarchical
levels can be fed with the output of higher-level classifiers, resulting in
recursive hierarchical structures.
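The first three topologies described above (sequential, parallel, and a
non-adjacent connection) can be sketched as follows. This is only an
illustration of how the outputs are wired together: make_classifier is a
hypothetical stand-in that returns an untrained random linear layer with a
softmax, and all sizes are assumptions. The recursive case is omitted.

```python
import numpy as np

def make_classifier(rng, n_in, n_out):
    # Toy stand-in for a trained classifier (assumption): a random linear
    # layer followed by a row-wise softmax.
    w = rng.normal(scale=0.1, size=(n_in, n_out))

    def classify(x):
        z = x @ w
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    return classify

def parallel(classifiers, x):
    # One hierarchical level made of classifiers working side by side;
    # their outputs are concatenated frame by frame.
    return np.hstack([c(x) for c in classifiers])

rng = np.random.default_rng(0)
n_frames, n_feat, n_classes = 20, 13, 5   # hypothetical sizes
x = rng.normal(size=(n_frames, n_feat))

# Sequential: the output of level 1 is the input of level 2.
level1 = make_classifier(rng, n_feat, n_classes)
level2 = make_classifier(rng, n_classes, n_classes)
seq_out = level2(level1(x))

# Parallel: two classifiers form one level, and a merger combines them.
branch_a = make_classifier(rng, n_feat, n_classes)
branch_b = make_classifier(rng, n_feat, n_classes)
merger = make_classifier(rng, 2 * n_classes, n_classes)
par_out = merger(parallel([branch_a, branch_b], x))

# Non-adjacent: level 3 sees the output of level 2 and also of level 1.
level3 = make_classifier(rng, 2 * n_classes, n_classes)
out1 = level1(x)
out2 = level2(out1)
skip_out = level3(np.hstack([out2, out1]))
```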
This section presents a survey of hierarchical schemes used in ASR
systems, with emphasis on techniques involving neural networks. A simple
and successful approach based on a sequential concatenation of two MLPs
([Pinto 08b]) is described in detail in the next chapter, since it forms
the basis of this work.
3.5.1 Tandem Approach
In [Hermansky 00], the authors proposed a method for combining a
discriminative and a generative model in tandem. This method can be
classified as