Digital Signal Processing Reference
- Classifier combination based on hierarchical and/or committee-based
approaches.
- Use of a probabilistic segmental decoder with anti-phone modeling and a
bigram.
[Ming 98] Ming and Smith (1998): PA = 75.6**, 74.4*
- Bayesian triphone models.
- Triphone models are built from less context-dependent models, following
the Bayesian principle.
- Main motivation: dealing with sparse training data.
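Ming and Smith's exact formulation is not reproduced here. The following minimal sketch shows the general Bayesian idea under an assumption: a triphone Gaussian mean has a conjugate prior centered on the mean of a less context-dependent model, such as a biphone or monophone. The MAP estimate then shrinks a sparsely observed triphone toward that backoff model. The function name `map_triphone_mean` and the prior weight `tau` are hypothetical.

```python
import numpy as np

def map_triphone_mean(frames, prior_mean, tau):
    """MAP estimate of a triphone Gaussian mean under a conjugate Gaussian prior.

    frames: (n, d) feature vectors aligned to the triphone (n may be 0).
    prior_mean: (d,) mean of a less context-dependent model (assumed backoff).
    tau: prior weight in "pseudo-frames" (hypothetical tuning parameter).
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float).reshape(-1, prior_mean.size)
    n = frames.shape[0]
    # Few frames: result stays near the prior. Many frames: it approaches the sample mean.
    return (tau * prior_mean + frames.sum(axis=0)) / (tau + n)
```

For example, `map_triphone_mean([[2, 2], [4, 4]], [0, 0], tau=2)` gives equal weight to prior and data and returns `[1.5, 1.5]`. With no frames at all, it falls back to the prior mean.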
[Antoniou 01] Antoniou (2001): PA = 75.8*
- Modular neural networks.
- Two-level hierarchical MLP approach.
- The first level consists of two blocks in parallel: a phone detector and
broad-class posteriors.
- In the phone detector, several MLPs are trained, each specialized in a
particular phoneme.
- The broad-class posteriors represent a kind of articulatory attribute and
are calculated from the phone detector outputs.
- In the second level, an MLP is trained for each phoneme on the
information delivered by the first level.
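The two-level data flow above can be sketched as follows. This is a minimal, untrained illustration of the structure only, not Antoniou's implementation. The tiny phone inventory, the broad classes, the layer sizes, and the rule that turns detector scores into broad-class posteriors (normalize, then sum within each class) are all hypothetical. The second level here also sees only first-level outputs, which is another assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Random weights for a one-hidden-layer MLP (training is omitted)."""
    return (rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(scale=0.1, size=(n_hidden, n_out)), np.zeros(n_out))

def mlp(x, w1, b1, w2, b2):
    """tanh hidden layer followed by a sigmoid output layer."""
    h = np.tanh(x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

PHONES = ["aa", "iy", "p", "t", "s", "z"]                      # hypothetical inventory
BROAD = {"vowel": ["aa", "iy"], "stop": ["p", "t"], "fricative": ["s", "z"]}  # hypothetical
N_FEAT = 13                                                    # assumed feature size

# Level 1a: one detector MLP per phoneme, each with a single output unit.
detectors = {p: init_mlp(N_FEAT, 8, 1) for p in PHONES}

def level1(x):
    """Return detector scores (frames, phones) and broad-class posteriors (frames, classes)."""
    scores = np.concatenate([mlp(x, *detectors[p]) for p in PHONES], axis=-1)
    # Level 1b (assumed rule): normalize scores, then sum them within each broad class.
    norm = scores / scores.sum(axis=-1, keepdims=True)
    broad = np.stack([norm[..., [PHONES.index(p) for p in members]].sum(axis=-1)
                      for members in BROAD.values()], axis=-1)
    return scores, broad

# Level 2: one MLP per phoneme, fed with all first-level outputs.
second = {p: init_mlp(len(PHONES) + len(BROAD), 8, 1) for p in PHONES}

def level2(x):
    """Per-phoneme second-level scores, shape (frames, phones)."""
    scores, broad = level1(x)
    z = np.concatenate([scores, broad], axis=-1)
    return np.concatenate([mlp(z, *second[p]) for p in PHONES], axis=-1)
```

In a real system, each detector would be trained on a one-vs-rest target for its own phoneme before the second level is trained on their outputs.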
[Chen 01] Chen (2001): PA = 73.5**
- TRAPs-like classifiers in a hybrid HMM/ANN framework.
- Evaluation of an MLP based on PLP features, TRAPs, HATS and TMLP. See
Section 3.5.4.
- Best result obtained by combining the PLP-based MLP with HATS.
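TRAPs-like classifiers work on long temporal trajectories of individual frequency bands instead of short spectral frames. The following minimal sketch shows the input side only, assuming log critical-band energies at a 10 ms frame rate and a 101-frame (about 1 s) context. It also assumes per-band mean normalization, a Hamming taper, and edge padding at utterance boundaries. The function name `trap_patterns` is hypothetical, and the per-band classifiers and merger are not shown.

```python
import numpy as np

def trap_patterns(band_energies, t, half=50):
    """Extract TRAPs-style temporal patterns centered on frame t.

    band_energies: (frames, bands) log critical-band energies (assumed input).
    Returns an array of shape (bands, 2*half + 1): one trajectory per band.
    """
    n_frames = band_energies.shape[0]
    # Repeat the first/last frame near utterance edges (assumed boundary handling).
    idx = np.clip(np.arange(t - half, t + half + 1), 0, n_frames - 1)
    traj = band_energies[idx].T                       # (bands, 2*half + 1)
    traj = traj - traj.mean(axis=1, keepdims=True)    # per-band mean normalization
    return traj * np.hamming(2 * half + 1)            # taper the ends of each trajectory
```

Each row would feed its own band classifier. In HATS, the hidden-layer activations of those band classifiers, rather than their outputs, are passed to the merging network.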
[Schwarz 06] Schwarz et al. (2006): PA = 78.5**
- Split Temporal Context (STC).
- Simplified and extended version of TRAPs.
- The long temporal context is split into parts, each processed by a
separate TRAPs-like classifier specialized in that part of the context.
- Best reported result obtained by splitting the context into five blocks,
using a bigram, and appending the cross-validation set to the training
set.
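The context split described above can be sketched as follows. This is a simplified illustration, not the authors' code. It assumes a 31-frame context window and contiguous, non-overlapping blocks of near-equal length produced by `np.array_split`. The published two-block variant shares the central frame between its left and right parts, which this sketch ignores. The function name `split_context` is hypothetical, and the per-block classifiers and the merger are not shown.

```python
import numpy as np

def split_context(band_energies, t, half=15, n_blocks=5):
    """Split a (2*half + 1)-frame window around frame t into n_blocks parts.

    band_energies: (frames, bands) features (assumed input).
    Each returned block would feed its own TRAPs-like classifier; a merger
    network would then combine the block posteriors.
    """
    n_frames = band_energies.shape[0]
    idx = np.clip(np.arange(t - half, t + half + 1), 0, n_frames - 1)  # edge padding
    window = band_energies[idx]                       # (2*half + 1, bands)
    return np.array_split(window, n_blocks, axis=0)   # contiguous, near-equal blocks
```

With the defaults, the 31 frames are divided into blocks of 7, 6, 6, 6 and 6 frames.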
[Sha 06] Sha and Saul (2006): PA = 69.9*
- Large-margin classification with GMMs.
- Margin-maximization criterion, as in Support Vector Machines (SVMs).
- Margin defined through the Mahalanobis distance from labeled samples to
the decision boundaries defined by competing classes.
- Advantage over SVMs: nonlinear decision boundaries are modeled directly,
without resorting to the kernel trick.
- Features: MFCC + delta + double-delta coefficients.
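The following minimal sketch illustrates the large-margin idea with one ellipsoid per class instead of a full mixture, which is a simplification. Each class is scored by a Mahalanobis-type distance plus a scalar offset, packed into a single matrix acting on the augmented vector z = [x, 1]. A labeled sample satisfies the margin when every competing class is at least one unit farther away than the correct class, and violations are penalized with a hinge loss. The function names are hypothetical, and the optimization of the matrices is omitted.

```python
import numpy as np

def ellipsoid_matrix(mu, psi, theta):
    """Pack mean mu, symmetric PSD matrix psi and offset theta into Phi such that
    z @ Phi @ z == (x - mu) @ psi @ (x - mu) + theta for z = [x, 1]."""
    mu = np.asarray(mu, dtype=float)
    psi = np.asarray(psi, dtype=float)
    d = mu.size
    phi = np.empty((d + 1, d + 1))
    phi[:d, :d] = psi
    phi[:d, d] = -psi @ mu
    phi[d, :d] = -mu @ psi
    phi[d, d] = mu @ psi @ mu + theta
    return phi

def distances(x, phis):
    """Distance of x to every class ellipsoid."""
    z = np.append(np.asarray(x, dtype=float), 1.0)
    return np.array([z @ phi @ z for phi in phis])

def classify(x, phis):
    """Pick the closest class; the decision boundaries are quadratic in x."""
    return int(np.argmin(distances(x, phis)))

def hinge_loss(x, y, phis):
    """Sum over competing classes c != y of max(0, 1 + d(x, y) - d(x, c))."""
    d = distances(x, phis)
    return float(np.maximum(0.0, 1.0 + d[y] - np.delete(d, y)).sum())
```

Because the matrices appear linearly in z @ Phi @ z, the loss is convex in them, which is what makes this formulation attractive compared with kernelized SVMs.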
[Deng 07] Deng and Yu (2007): PA = 75.2*
- Hidden Trajectory Model (HTM).
- Uses the dynamic structure of the hidden vocal tract resonances (VTRs)
to model contextual influences among phonemes.
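The following sketch is a simplified view of how an HTM-style model captures context. Per-phone VTR targets, repeated over each phone's duration, are smoothed by a noncausal FIR filter with taps proportional to gamma**|tau|. The hidden trajectory at any frame is therefore pulled toward the targets of neighboring phones. The single `gamma` shared by all phones, the filter half-length `D`, the edge padding, and the function name `vtr_trajectory` are all assumptions for illustration. The mapping from VTRs to acoustic observations is not shown.

```python
import numpy as np

def vtr_trajectory(targets, gamma=0.6, D=7):
    """Smooth a frame-level VTR target sequence with a symmetric FIR filter.

    targets: (frames,) or (frames, n_resonances) per-frame targets.
    Taps are c * gamma**|tau| for |tau| <= D, with c chosen so the taps sum to 1;
    a constant target sequence is therefore left unchanged.
    """
    taps = gamma ** np.abs(np.arange(-D, D + 1))
    taps /= taps.sum()
    t = np.asarray(targets, dtype=float)
    one_d = t.ndim == 1
    if one_d:
        t = t[:, None]
    # Edge-pad so the output has the same number of frames as the input.
    padded = np.pad(t, ((D, D), (0, 0)), mode="edge")
    out = np.stack([np.convolve(padded[:, j], taps, mode="valid")
                    for j in range(t.shape[1])], axis=1)
    return out[:, 0] if one_d else out
```

For a step from a 500 Hz target to a 1500 Hz target, the output rises smoothly across the phone boundary instead of jumping. This is how the model lets a phone's realization depend on its neighbors.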