The HMM framework has been well documented in the speech recognition literature (e.g., [2]). The framework is revisited here in the interest of making this chapter relatively self-contained and to introduce some notation. The standard statistical recognition criterion is given by

$$\mathbf{W}^* = \arg\max_{\mathbf{W}} \Pr(\mathbf{W} \mid \mathbf{U}) \tag{1}$$

where $\mathbf{W}^*$ is the recognised word string, $\mathbf{W}$ is a valid word sequence, and $\mathbf{U}$ is the observed acoustic signal (typically a sequence of feature vectors $u_1^T = \{u(1), \ldots, u(T)\}$). For typical HMM systems, there exists a mapping between a state sequence $Q_1^T = \{q(1), \ldots, q(T)\}$ on a discrete, first-order Markov chain and the word sequence $\mathbf{W}$. This allows expressing the recognition criterion (1) as finding the maximum a posteriori (MAP) state sequence of length $T$, i.e.,

$$Q^* = \arg\max_{Q_1^T} \prod_{t=1}^{T} \Pr(q(t) \mid q(t-1))\, p(u(t) \mid q(t)) \tag{2}$$
Note that the HMM framework has reduced the primary modelling requirement to stationary, local (in time) components, namely the observation terms $p(u(t) \mid q(t))$ and transition terms $\Pr(q(t) \mid q(t-1))$.
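As an illustration of how a MAP state sequence can be found once the observation and transition terms are available, the following is a minimal sketch of Viterbi decoding in the log domain. It is not taken from the system described here: the function name `viterbi`, the explicit initial distribution `log_init`, and the toy two-state numbers are all assumptions made for the example.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Return (path, log_score) of the most probable state sequence.

    Assumed inputs (hypothetical layout, chosen for this sketch):
      log_init[j]     : log probability of starting in state j
      log_trans[i][j] : log transition term from state i to state j
      log_obs[t][j]   : log observation term for frame t in state j
    """
    n_states = len(log_init)
    # delta[j]: best log score of any partial path ending in state j.
    delta = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    backptr = []
    for frame in log_obs[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at this frame.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
        delta = new_delta
        backptr.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    log_score = delta[state]
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, log_score

def to_log(rows):
    """Elementwise log of a list of lists of probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Toy two-state, three-frame example with made-up numbers.
toy_init = [math.log(0.6), math.log(0.4)]
toy_trans = to_log([[0.7, 0.3], [0.4, 0.6]])
toy_obs = to_log([[0.9, 0.2], [0.1, 0.8], [0.1, 0.7]])
toy_path, toy_score = viterbi(toy_init, toy_trans, toy_obs)
```

Working in the log domain turns the product over frames into a sum, which avoids numerical underflow on long utterances. For the toy numbers, `toy_path` is `[0, 1, 1]`.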
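There are a number of well known methods for modelling the observation terms. Continuous density HMMs typically use Gaussian mixture distributions of the form

$$p(u \mid q) = \sum_{m} c_{qm}\, \mathcal{N}(u;\, \mu_{qm}, \Sigma_{qm}) \tag{3}$$

where $c_{qm}$, $\mu_{qm}$ and $\Sigma_{qm}$ are the weight, mean and covariance of the $m$-th mixture component of state $q$.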
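To make the Gaussian mixture observation term concrete, the sketch below evaluates the log likelihood of one feature vector under the mixture of a single state. It assumes diagonal covariances (a common simplification, not something stated above), and the names `log_gaussian_diag`, `log_gmm`, `weights`, `means` and `variances` are hypothetical.

```python
import math

def log_gaussian_diag(u, mean, var):
    """Log density at vector u of a Gaussian with diagonal covariance.

    `var` holds the per-dimension variances (assumed strictly positive).
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(u, mean, var)
    )

def log_gmm(u, weights, means, variances):
    """Log of the mixture likelihood for one state.

    weights[m]   : mixture weight of component m (assumed to sum to 1)
    means[m]     : mean vector of component m
    variances[m] : per-dimension variance vector of component m
    """
    terms = [
        math.log(c) + log_gaussian_diag(u, m, v)
        for c, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp: factor out the largest term for numerical stability.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy usage: a two-component mixture over 2-dimensional features.
toy_ll = log_gmm(
    [0.5, -0.2],
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
```

The returned log likelihoods can be used directly as the per-frame observation scores of a log-domain decoder.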
Recently, there has been work in the area of hybrid connectionist/HMM systems. In this approach, nonparametric distributions represented with neural networks have been used as models for the observation terms [3,4].