VECTOR QUANTIZATION BIGRAM HIDDEN MARKOV MODELLING FOR IMPROVED PHONEME RECOGNITION
G. Wong and S. J. Young
July 1992
The development of accurate and robust phonetic models is essential for high-performance continuous speech recognition, since the words themselves are represented as sequences of phonemes. One approach is to model the time dependencies of the acoustic features within a phoneme more accurately. Here, short-time correlation between successive feature vectors (condensed to vector quantization codes) is modelled by discrete emission probabilities embedded in the observation process of a Hidden Markov Model (HMM). Reestimation equations in an Expectation-Maximization (EM) framework are presented for training such a model, together with the Viterbi decoding algorithm needed for phoneme-based continuous speech recognition. The Expectation step in the parameter reestimation stage calculates the log likelihood of the observation sequence, and the Maximization step yields the estimates of the state transition terms and the conditional output pdf parameters separately. A Lagrange interpretation of the derived reestimation formulas is also presented. Recognition results on the TIMIT database are compared with those of conventional discrete hidden Markov modelling methods, and a measurable improvement (14% error rate reduction) is achieved. Implementation issues and several aspects of this modelling method are discussed, along with possible extensions for further improvement.
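To make the idea of bigram-conditioned emissions concrete, the following is a minimal sketch of Viterbi decoding in which the probability of the current VQ code depends on both the current state and the previous code. It is not taken from the report: the function name `viterbi_bigram`, the array layouts, and the use of an unconditional emission table `log_B1` for the first frame are all illustrative assumptions.

```python
import numpy as np

def viterbi_bigram(codes, log_pi, log_A, log_B1, log_B2):
    """Viterbi decoding with bigram-conditioned discrete emissions.

    All names and layouts here are assumptions for illustration:
      codes  : sequence of T vector quantization code indices
      log_pi : (N,)      initial state log probabilities
      log_A  : (N, N)    log_A[i, j] = log P(state j | state i)
      log_B1 : (N, K)    log P(code | state), used for the first frame only
      log_B2 : (N, K, K) log_B2[j, prev, cur] = log P(cur | prev, state j)
    Returns the best state path and its log score.
    """
    T = len(codes)
    N = log_pi.shape[0]
    delta = np.empty((T, N))            # best log score ending in each state
    back = np.zeros((T, N), dtype=int)  # best predecessor state

    # First frame has no previous code, so fall back to unigram emissions.
    delta[0] = log_pi + log_B1[:, codes[0]]

    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving to j.
        scores = delta[t - 1][:, None] + log_A
        back[t] = np.argmax(scores, axis=0)
        # Emission is conditioned on the previous code as well as the state.
        emit = log_B2[:, codes[t - 1], codes[t]]
        delta[t] = scores[back[t], np.arange(N)] + emit

    # Trace back the best path from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path, float(delta[-1, path[-1]])
```

Compared with a conventional discrete HMM, the only change in this sketch is the emission lookup, which uses a table of size N x K x K rather than N x K. The larger table is what allows short-time correlation between successive codes to be captured, at the cost of many more parameters to estimate.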