
The HMM Framework

 

The HMM framework has been well documented in the speech recognition literature (e.g., [2]). It is revisited here to make this chapter relatively self-contained and to introduce some notation. The standard statistical recognition criterion is given by

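$$
\hat{W} = \arg\max_{W} P(W \mid U) \tag{1}
$$

where $\hat{W}$ is the recognised word string, $W$ is a valid word sequence, and $U$ is the observed acoustic signal (typically a sequence of feature vectors $U = (u_1, \ldots, u_T)$). For typical HMM systems, there exists a mapping between a state sequence $Q = (q_1, \ldots, q_T)$ on a discrete, first-order Markov chain and the word sequence $W$. This allows the recognition criterion (1) to be expressed as a search for the maximum a posteriori (MAP) state sequence of length $T$, i.e.,

$$
\hat{Q} = \arg\max_{Q} P(Q \mid U) = \arg\max_{Q} \prod_{t=1}^{T} p(u_t \mid q_t)\, P(q_t \mid q_{t-1}) \tag{2}
$$

where the second equality uses Bayes' rule (the evidence $p(U)$ does not depend on $Q$), assumes that each observation $u_t$ depends only on the current state $q_t$, and takes $P(q_1 \mid q_0)$ to be the initial-state probability.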
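Because the MAP criterion factorises into per-frame observation and transition terms, the maximisation over all state sequences can be carried out exactly by dynamic programming (the Viterbi algorithm) in $O(TN^2)$ time for an $N$-state chain. The following is a minimal pure-Python sketch, not taken from this chapter; the function names `viterbi` and `to_log`, the toy probabilities, and the use of an initial-state vector in place of $P(q_1 \mid q_0)$ are illustrative assumptions. Log probabilities are used to avoid numerical underflow over long utterances.

```python
import math


def to_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(log_obs, log_init, log_trans):
    """Return (best_path, best_log_score) for a first-order HMM.

    log_obs[t][j]   : log p(u_t | q_t = j), one row per frame t
    log_init[j]     : log P(q_1 = j), standing in for P(q_1 | q_0)
    log_trans[i][j] : log P(q_t = j | q_{t-1} = i)
    """
    num_frames, num_states = len(log_obs), len(log_init)
    # delta[j] = best log score of any partial state sequence ending in state j
    delta = [log_init[j] + log_obs[0][j] for j in range(num_states)]
    backpointers = []
    for t in range(1, num_frames):
        new_delta, pointers = [], []
        for j in range(num_states):
            # Choose the predecessor state that maximises the partial score.
            best_i = max(range(num_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_obs[t][j])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state to recover the MAP state sequence.
    last = max(range(num_states), key=lambda j: delta[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state example with made-up probabilities.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
obs = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]  # p(u_t | q_t = j) for each frame
path, score = viterbi([[to_log(p) for p in row] for row in obs],
                      [to_log(p) for p in init],
                      [[to_log(p) for p in row] for row in trans])
print(path, math.exp(score))  # [0, 0, 1] with probability approximately 0.01512
```

In a practical recogniser the chain is built by concatenating word and phone models, and the search is normally pruned (beam search); the exhaustive inner loop here only illustrates the recursion.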

Note that the HMM framework has reduced the primary modelling requirement to stationary components that are local in time, namely the observation terms $p(u_t \mid q_t)$ and the transition terms $P(q_t \mid q_{t-1})$. There are a number of well-known methods for modelling the observation terms. Continuous density HMMs typically use Gaussian mixture distributions of the form

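$$
p(u_t \mid q_t = i) = \sum_{m=1}^{M} c_{im}\, \mathcal{N}(u_t;\, \mu_{im}, \Sigma_{im}) \tag{3}
$$

where the mixture weights satisfy $c_{im} \ge 0$ and $\sum_{m} c_{im} = 1$, and $\mathcal{N}(\cdot;\, \mu, \Sigma)$ is a multivariate Gaussian density with mean $\mu$ and covariance $\Sigma$.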
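To make the Gaussian mixture observation term concrete, here is a minimal sketch that evaluates $\log p(u_t \mid q_t = i)$ for a single state. It assumes diagonal covariance matrices (a common simplification, not something this section specifies), and the function names and mixture parameters are hypothetical. The sum over components is computed with log-sum-exp so that very small densities do not underflow.

```python
import math


def log_gaussian_diag(u, mean, var):
    """log N(u; mean, diag(var)) for a Gaussian with diagonal covariance."""
    dim = len(u)
    log_det = sum(math.log(v) for v in var)
    mahalanobis = sum((x - m) ** 2 / v for x, m, v in zip(u, mean, var))
    return -0.5 * (dim * math.log(2.0 * math.pi) + log_det + mahalanobis)


def log_gmm(u, weights, means, variances):
    """log p(u | q = i) = log sum_m c_im N(u; mu_im, Sigma_im) for one state i.

    weights are assumed positive and to sum to one; the component sum is
    evaluated with log-sum-exp for numerical stability.
    """
    terms = [math.log(c) + log_gaussian_diag(u, mu, var)
             for c, mu, var in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(x - top) for x in terms))


# Hypothetical two-component, two-dimensional mixture for a single state.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [2.0, 1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
print(log_gmm([1.0, 0.5], weights, means, variances))
```

Full covariance matrices would replace `log_gaussian_diag` with a version using a Cholesky factorisation; the diagonal form is shown only because it keeps the sketch short and dependency-free.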

More recently, hybrid connectionist/HMM systems have been developed. In this approach, neural networks are used to represent nonparametric distributions that serve as models for the observation terms [3,4].
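A widely used formulation of hybrid systems (assumed here; this section does not spell it out) trains the network to estimate state posteriors $P(q_t = i \mid u_t)$ and divides them by the state priors $P(q_t = i)$. By Bayes' rule the result equals $p(u_t \mid q_t = i) / p(u_t)$, a "scaled likelihood". The sketch below is illustrative only: the function name `scaled_log_likelihoods`, the `floor` parameter, and the toy numbers are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn network state posteriors into scaled log-likelihoods.

    posteriors[t][i] : network output, an estimate of P(q_t = i | u_t)
    priors[i]        : P(q_t = i), e.g. relative state frequencies in training data
    floor            : hypothetical lower bound on posteriors that avoids log(0)

    Returns log(P(q_t = i | u_t) / P(q_t = i)), which by Bayes' rule equals
    log(p(u_t | q_t = i) / p(u_t)).
    """
    log_priors = [math.log(p) for p in priors]
    return [[math.log(max(post, floor)) - lp
             for post, lp in zip(frame, log_priors)]
            for frame in posteriors]


# Two frames of made-up network outputs for a two-state model.
posteriors = [[0.8, 0.2], [0.1, 0.9]]
priors = [0.5, 0.5]
print(scaled_log_likelihoods(posteriors, priors))
```

These values can stand in for $\log p(u_t \mid q_t)$ in the MAP state-sequence search without changing the maximising sequence, because the factor $1/p(u_t)$ is the same for every state sequence at a given frame.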



Tony Robinson Sun Jun 4 20:04:56 BST 1995