Phoneme Recognition Task - Hierarchical Neural Network Structures for Phoneme Recognition

- Classifier combination based on hierarchical and/or committee-based approaches.

- Use of a probabilistic segmental decoder with anti-phone modeling and a bigram.
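The entry does not say which combination rule the committee uses. A minimal sketch of committee-based combination, assuming every classifier outputs frame-level posteriors over the same phoneme set, averages them either arithmetically or geometrically (log domain, renormalized per frame). The function name combine_posteriors and both rules are illustrative assumptions, not the cited system's method.

```python
import math


def combine_posteriors(posteriors, rule="mean"):
    """Combine per-frame phoneme posteriors from several classifiers.

    posteriors: one entry per classifier; each entry is a list of frames,
    each frame a list of class probabilities (same frames and classes for all).
    rule: "mean" (arithmetic average) or "log_mean" (geometric average,
    renormalized so each frame sums to one).
    """
    n_clf = len(posteriors)
    n_frames = len(posteriors[0])
    n_classes = len(posteriors[0][0])
    eps = 1e-12  # guards against log(0) in the geometric rule
    combined = []
    for t in range(n_frames):
        if rule == "mean":
            frame = [sum(p[t][k] for p in posteriors) / n_clf
                     for k in range(n_classes)]
        elif rule == "log_mean":
            logs = [sum(math.log(p[t][k] + eps) for p in posteriors) / n_clf
                    for k in range(n_classes)]
            top = max(logs)  # subtract the max for numerical stability
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            frame = [e / total for e in exps]
        else:
            raise ValueError(f"unknown rule: {rule}")
        combined.append(frame)
    return combined
```

The geometric rule penalizes classes that any single committee member considers unlikely, whereas the arithmetic rule is more tolerant of one dissenting classifier.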

• [Ming 98] Ming and Smith (1998): PA = 75.6**, 74.4*

- Bayesian triphone models.

- Triphone models built from models with less context dependency, based on the Bayesian principle.

- Main motivation: dealing with sparse training data.
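Ming and Smith's exact formulation is not given here. To illustrate how the Bayesian principle lets a sparsely observed triphone borrow strength from a less context-dependent model, a minimal sketch assumes Gaussian output densities and a conjugate prior on the mean centered on the mean of a biphone or monophone model. The function map_mean and the prior weight tau are hypothetical.

```python
def map_mean(samples, prior_mean, tau):
    """MAP estimate of a triphone Gaussian mean under a conjugate prior.

    samples: feature vectors observed for this triphone (may be empty).
    prior_mean: mean of a less context-dependent model (e.g. a monophone).
    tau: prior weight (assumed hyperparameter); larger values trust the
    prior more. With no samples the estimate equals prior_mean; with many
    samples it approaches the sample mean.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(samples)
    dim = len(prior_mean)
    sums = [sum(x[d] for x in samples) for d in range(dim)]
    return [(sums[d] + tau * prior_mean[d]) / (n + tau) for d in range(dim)]


# Usage: two observations pull the estimate part of the way toward their mean.
estimate = map_mean([[2.0], [4.0]], prior_mean=[0.0], tau=2.0)
```

Here the sample mean is 3.0 and the prior mean is 0.0; with n = 2 and tau = 2 the estimate lands halfway, at 1.5.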

• [Antoniou 01] Antoniou (2001): PA = 75.8*

- Modular neural networks.

- Two-level hierarchical MLP approach.

- The first level consists of two blocks in parallel: phone detector and broad class posteriors.

- In the phone detector, several MLPs are trained, each specialized in a particular phoneme.

- The broad class posteriors represent a kind of articulatory attribute, calculated from the outputs of the phone detectors.

- In the second level, an MLP for each phoneme is trained on the information delivered by the first level.
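The data flow of the two-level structure can be sketched as a forward pass. This is a minimal sketch under stated assumptions: the MLPs are tiny, untrained, one-hidden-layer networks with random weights; the broad class posteriors are obtained by pooling detector outputs over a hypothetical phoneme-to-class mapping and normalizing; and the second-level outputs are normalized into a distribution. None of these choices is taken from Antoniou (2001); only the overall topology follows the description above.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def make_mlp(n_in, n_hidden, rng):
    """Random one-hidden-layer MLP with a single sigmoid output (untrained)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0


def mlp_forward(params, x):
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)


def broad_class_posteriors(detector_out, classes):
    """Pool detector outputs per broad class and normalize (assumed rule)."""
    pooled = [sum(detector_out[p] for p in members) for members in classes]
    total = sum(pooled)
    return [v / total for v in pooled]


def hierarchical_forward(x, detectors, classes, second_level):
    d = [mlp_forward(m, x) for m in detectors]          # level 1a: one detector per phoneme
    b = broad_class_posteriors(d, classes)              # level 1b: broad class posteriors
    z = d + b                                           # level 2 input: both blocks concatenated
    scores = [mlp_forward(m, z) for m in second_level]  # level 2: one MLP per phoneme
    total = sum(scores)
    return [s / total for s in scores]


# Usage with a hypothetical 4-phoneme inventory and 2 broad classes.
rng = random.Random(0)
phonemes = ["a", "i", "s", "t"]
classes = [[0, 1], [2, 3]]  # hypothetical: vowels vs. consonants (indices)
n_feat = 3
detectors = [make_mlp(n_feat, 4, rng) for _ in phonemes]
second = [make_mlp(len(phonemes) + len(classes), 4, rng) for _ in phonemes]
post = hierarchical_forward([0.1, -0.2, 0.3], detectors, classes, second)
```

In a real system each detector and each second-level MLP would be trained separately; the sketch only shows how the outputs of the first level become the input of the second.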

• [Chen 01] Chen (2001): PA = 73.5**

- TRAPs-like classifiers in hybrid HMM/ANN framework.

- Evaluation of an MLP based on PLP features, TRAPs, HATS, and TMLP. See Section 3.5.4.

- Best result obtained by combining the PLP-based MLP and HATS.
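What distinguishes TRAPs-like classifiers from a frame-based PLP MLP is their input: a long temporal trajectory (often on the order of a second) of a single critical-band energy, classified band by band. A minimal sketch of that input extraction follows. The function name trap_vectors, the edge padding by repetition, and the per-trajectory mean subtraction are assumptions for illustration; window length and normalization in the evaluated systems may differ.

```python
def trap_vectors(energies, t, half_width):
    """Extract TRAP-style temporal trajectories centered on frame t.

    energies: list of frames, each a list of critical-band (log) energies.
    half_width: frames on each side; trajectory length is 2*half_width + 1.
    Frames outside the utterance repeat the nearest edge frame (assumption).
    Returns one trajectory per band; in a TRAPs-like system each would feed
    that band's own classifier, and a merger would combine the band outputs.
    """
    n_frames = len(energies)
    n_bands = len(energies[0])
    trajectories = []
    for b in range(n_bands):
        traj = []
        for k in range(t - half_width, t + half_width + 1):
            k_clamped = min(max(k, 0), n_frames - 1)
            traj.append(energies[k_clamped][b])
        mean = sum(traj) / len(traj)  # assumed normalization: remove the mean
        trajectories.append([v - mean for v in traj])
    return trajectories
```

In HATS, as the name (hidden activation TRAPs) suggests, the hidden-layer activations of these per-band networks, rather than their outputs, are passed to the merger.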

• [Schwarz 06] Schwarz et al. (2006): PA = 78.5**

- Split Temporal Context (STC).

- Simplified and extended version of TRAPs.

- Long temporal context is processed by several TRAPs-like classifiers, each specialized in a different part of the context.

- Best reported result obtained by splitting the context into five blocks, using a bigram, and appending the cross-validation data set to the training data set.
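Splitting a long context window into blocks, each handled by its own classifier, can be sketched as below. This minimal sketch assumes non-overlapping, nearly equal, contiguous blocks; the exact block boundaries used by Schwarz et al. are not given here, and the 31-frame window in the usage line is a hypothetical value.

```python
def split_context(window, n_blocks):
    """Split a temporal context window (list of frames) into contiguous blocks.

    Each block would be processed by its own TRAPs-like classifier, and the
    block outputs merged by a further classifier. When len(window) is not
    divisible by n_blocks, the first blocks get one extra frame each.
    """
    if not 1 <= n_blocks <= len(window):
        raise ValueError("n_blocks must be between 1 and len(window)")
    base, extra = divmod(len(window), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(window[start:start + size])
        start += size
    return blocks


# Usage: a hypothetical 31-frame window split into five blocks.
blocks = split_context(list(range(31)), 5)
```

Because each classifier sees only a short part of the context, it needs fewer parameters than one classifier covering the whole window, which is one motivation for splitting.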

• [Sha 06] Sha and Saul (2006): PA = 69.9*

- Large margin classification by GMMs.

- Margin maximization criterion, as in Support Vector Machines (SVMs).

- Margin selected as the Mahalanobis distance from labeled samples to the decision boundaries defined by competing classes.

- Advantage over SVMs: nonlinear decision boundaries are modeled directly, without resorting to the kernel trick.

- Features: MFCC + delta + double-delta.
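The large margin idea can be made concrete with a hinge-style loss. This minimal sketch assumes one Gaussian per class (mean, inverse covariance, and a scalar offset theta) instead of full mixtures, and scores each class by the squared Mahalanobis distance plus theta, where lower means a better match. Each labeled sample is asked to score at least one unit better on its own class than on every competing class. Only the loss is shown; Sha and Saul's actual parameterization, regularization, and optimization are not reproduced here, and all names are hypothetical.

```python
def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance (x - mean)^T inv_cov (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    n = len(diff)
    return sum(diff[i] * sum(inv_cov[i][j] * diff[j] for j in range(n))
               for i in range(n))


def class_score(x, model):
    """Distance-like class score; lower is better. theta is a class offset."""
    mean, inv_cov, theta = model
    return mahalanobis_sq(x, mean, inv_cov) + theta


def margin_loss(samples, labels, models):
    """Sum of hinge penalties max(0, 1 + d_true - d_competing)."""
    loss = 0.0
    for x, y in zip(samples, labels):
        d_true = class_score(x, models[y])
        for c, model in enumerate(models):
            if c != y:
                loss += max(0.0, 1.0 + d_true - class_score(x, model))
    return loss


# Usage: two classes with identity covariance, centered at (0, 0) and (3, 0).
identity = [[1.0, 0.0], [0.0, 1.0]]
models = [([0.0, 0.0], identity, 0.0), ([3.0, 0.0], identity, 0.0)]
```

A sample at (0, 0) labeled class 0 satisfies the margin and costs nothing; a sample at (1.5, 0), exactly on the decision boundary, costs one unit. Since each class score is quadratic in x, the resulting decision boundaries are nonlinear without any kernel.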

• [Deng 07] Deng and Yu (2007): PA = 75.2*

- Hidden Trajectory Model (HTM).

- Use of the dynamic structure in the hidden Vocal Tract Resonances (VTR) for modeling contextual influences among phonemes.
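In HTM-style models, the hidden VTR trajectory is often described as a sequence of phone-dependent targets smoothed by a noncausal (bidirectional) FIR filter, so that neighboring phonemes on both sides shape the trajectory. A simplified sketch follows, assuming a single scalar resonance, one fixed decay factor gamma (the model may make it phone-dependent), a filter half-length D, and edge padding by repetition; these are assumptions, not the parameters of Deng and Yu (2007).

```python
def bidirectional_filter(targets, gamma, D):
    """Smooth a frame-level target sequence with a noncausal FIR filter.

    Impulse response h(k) = c * gamma**abs(k) for -D <= k <= D, with c
    chosen so the taps sum to one. Frames beyond either end repeat the
    first or last target (assumption).
    """
    taps = [gamma ** abs(k) for k in range(-D, D + 1)]
    c = 1.0 / sum(taps)
    n = len(targets)
    out = []
    for t in range(n):
        acc = 0.0
        for k, h in zip(range(-D, D + 1), taps):
            idx = min(max(t - k, 0), n - 1)
            acc += c * h * targets[idx]
        out.append(acc)
    return out


# Usage: a step between two phone targets becomes a smooth transition.
trajectory = bidirectional_filter([0.0] * 5 + [1.0] * 5, gamma=0.5, D=2)
```

With gamma = 0.5 and D = 2 the taps are 0.25, 0.5, 1, 0.5, 0.25, so the frames on either side of the step take the values 0.3 and 0.7 instead of jumping from 0 to 1: the upcoming phoneme already pulls the trajectory before its boundary (anticipatory coarticulation), and the previous one still does after it.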
