
Introduction

Most, if not all, automatic speech recognition systems explicitly or implicitly compute a score (equivalently, a distance or probability) indicating how well a novel utterance matches a model of the hypothesised utterance. A fundamental problem in speech recognition is how to compute this score, given that speech is a non-stationary stochastic process. To reduce computational complexity, the standard approach used in the most prevalent systems (e.g., dynamic time warping (DTW) [1] and hidden Markov models (HMMs) [2]) factors the hypothesis score into a local acoustic score and a local transition score. In the HMM framework, the observation term models the acoustic signal locally in time as a stationary process, while the transition probabilities account for the time-varying nature of speech.
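As an illustration of this factorisation, the following minimal sketch computes a best-path (Viterbi) HMM score in the log domain. It assumes best-path rather than summed-over-paths scoring. The function name `viterbi_score` and the toy model are hypothetical. At each frame the path score grows by exactly one transition term and one acoustic term, so the total is a sum of local scores.

```python
import math

NEG_INF = float("-inf")  # log(0), used for disallowed transitions


def viterbi_score(log_obs, log_trans, log_init):
    """Best-path log score of an observation sequence under an HMM.

    log_obs[t][j]   : local acoustic score, log p(x_t | state j)
    log_trans[i][j] : local transition score, log P(state j | previous state i)
    log_init[j]     : log P(state j at the first frame)
    """
    n_states = len(log_init)
    # Best partial-path score ending in each state at frame 0.
    delta = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    for t in range(1, len(log_obs)):
        # Extend every partial path by one transition term and one
        # acoustic term; keep only the best predecessor for each state.
        delta = [
            max(delta[i] + log_trans[i][j] for i in range(n_states)) + log_obs[t][j]
            for j in range(n_states)
        ]
    return max(delta)


# Hypothetical two-state left-to-right model observed for three frames.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)],
             [NEG_INF, 0.0]]
log_obs = [[math.log(0.9), math.log(0.1)],
           [math.log(0.2), math.log(0.8)],
           [math.log(0.1), math.log(0.9)]]
score = viterbi_score(log_obs, log_trans, log_init)
```

Here the best path stays in state 0 for one frame and then moves to state 1. Its score is log(0.9) + log(0.5) + log(0.8) + log(1.0) + log(0.9) = log(0.324). Each observation term depends only on its own frame, which is the locality assumption the text describes.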

This chapter presents an extension to the standard HMM framework that addresses the computation of the observation probabilities. Specifically, an artificial recurrent neural network (RNN) is used to compute the observation probabilities within the HMM framework. This provides two enhancements to standard HMMs: (1) the observation model is no longer local, and (2) the RNN architecture provides a nonparametric model of the acoustic signal. The result is a speech recognition system able to model long-term acoustic context without strong assumptions about the distribution of the observations. One such system, described in this chapter, has been successfully applied to a 20,000-word, speaker-independent, continuous speech recognition task.
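The sketch below illustrates why an RNN-based observation model is no longer local. It makes several assumptions not stated in this section. It uses a generic Elman-style recurrent network with tiny hand-picked weights, not the architecture described in this chapter. It also applies the common hybrid-system convention of dividing each class posterior by its prior. Since P(q | x_1..x_t) / P(q) = p(x_1..x_t | q) / p(x_1..x_t), the result is a likelihood scaled by a factor shared by all classes. The function name `rnn_log_scaled_likelihoods` and all weights are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rnn_log_scaled_likelihoods(frames, W_in, W_rec, W_out, priors):
    """Per-frame log observation scores from a simple recurrent network.

    frames : list of acoustic feature vectors x_t
    W_in   : hidden x input weight matrix (hypothetical)
    W_rec  : hidden x hidden recurrent weight matrix (hypothetical)
    W_out  : classes x hidden output weight matrix (hypothetical)
    priors : class priors P(q)
    """
    n_hidden = len(W_rec)
    h = [0.0] * n_hidden  # recurrent state: a summary of all past frames
    scores = []
    for x in frames:
        # New state depends on the current frame AND the previous state.
        h = [
            math.tanh(
                sum(W_in[k][i] * x[i] for i in range(len(x)))
                + sum(W_rec[k][j] * h[j] for j in range(n_hidden))
            )
            for k in range(n_hidden)
        ]
        # Class posteriors P(q | x_1..x_t).
        post = softmax([sum(W_out[c][k] * h[k] for k in range(n_hidden))
                        for c in range(len(W_out))])
        # Dividing by the prior gives a likelihood scaled by a factor
        # common to all classes, usable as the HMM observation term.
        scores.append([math.log(p / pr) for p, pr in zip(post, priors)])
    return scores


W_in = [[1.0], [-0.5]]
W_rec = [[0.5, 0.2], [0.1, 0.5]]
W_out = [[1.0, -1.0], [-1.0, 1.0]]
priors = [0.5, 0.5]

# Two utterances that differ only in their first frame.
a = rnn_log_scaled_likelihoods([[1.0], [0.0], [0.0]], W_in, W_rec, W_out, priors)
b = rnn_log_scaled_likelihoods([[-1.0], [0.0], [0.0]], W_in, W_rec, W_out, priors)
```

Frame 2 is identical in both utterances, so a purely local observation model would score it identically. Here the scores differ because the recurrent state carries context forward from frame 0.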



Tony Robinson Sun Jun 4 20:04:56 BST 1995