One of the main benefits of the recurrent network is that it relaxes the conditional independence assumption for the local observation probabilities. The resulting model can represent acoustic context without explicitly modeling phonetic context, which reduces both the number of required parameters and the complexity of the search procedure.
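The contrast can be written out as a rough sketch. The notation is assumed for illustration and is not taken from the text: u_t is the acoustic vector at time t, q_t the HMM state, x_t the internal state of the network, y_t its output, and f and g stand for whatever functions the network computes. This is a generic recurrence, not necessarily the exact architecture used here.

```latex
% Standard HMM: each observation depends only on the current state
p(u_1^T \mid q_1^T) = \prod_{t=1}^{T} p(u_t \mid q_t)

% Recurrent network (generic form, assumed notation):
% the internal state x_t is a function of all inputs u_1, ..., u_t
x_t = f(x_{t-1}, u_t), \qquad y_t = g(x_t)
```

Because y_t depends on x_t, and x_t depends on every earlier input, the local estimate is conditioned on the acoustic context rather than on the current frame alone.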
The second main assumption of standard HMMs is that the observation distributions are from the exponential family (e.g., multinomial, Gaussian, etc.) or are mixtures of exponential family distributions. The recurrent network, however, makes far fewer assumptions about the form of the acoustic vector distribution. In fact, it is quite straightforward to use real-valued and/or categorical data for the acoustic input. In theory, a Gaussian mixture distribution and a recurrent network can both be considered nonparametric estimators if their size (e.g., the number of mixture components or state units, respectively) is allowed to increase with additional training data. However, because standard HMMs employ maximum likelihood estimation, there is the practical problem of having sufficient data to estimate all the parameters. Because the recurrent network shares the state units among all phones, this data requirement is less severe.
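The sharing argument can be illustrated with a small parameter count. This is a minimal sketch under stated assumptions, not a description of any particular system: the Gaussian mixtures are assumed to have diagonal covariances, the network is assumed to be a single layer mapping the input, the previous state, and a bias to the phone outputs and the next state, and the function names and all sizes are hypothetical.

```python
def gmm_hmm_params(n_states, n_components, dim):
    """Parameters of one diagonal-covariance Gaussian mixture per HMM state.

    Every parameter belongs to a single state, so it is estimated
    only from the frames aligned to that state.
    """
    # dim means + dim variances + 1 mixture weight per component
    # (counts all weights, ignoring the sum-to-one constraint)
    per_component = 2 * dim + 1
    return n_states * n_components * per_component


def recurrent_net_params(dim, n_state_units, n_phones):
    """Parameters of an assumed single-layer recurrent network.

    Returns (shared, phone_specific): weights feeding the state units
    are trained on frames from every phone; only the output weights
    are tied to an individual phone.
    """
    fan_in = dim + n_state_units + 1  # input + previous state + bias
    shared = fan_in * n_state_units
    phone_specific = fan_in * n_phones
    return shared, phone_specific


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the arithmetic concrete.
    print("GMM-HMM, all state-specific:", gmm_hmm_params(50, 10, 20))
    shared, specific = recurrent_net_params(20, 100, 50)
    print("Recurrent net, shared:", shared, "phone-specific:", specific)
```

The totals say nothing about accuracy. The point is only which parameters must be estimated from the data of a single class and which are estimated from all of the training data.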