Training the hybrid RNN/HMM system entails estimating both the parameters of the underlying Markov chain and the weights of the recurrent network. Unlike HMMs, which use exponential-family distributions to model the acoustic signal, the hybrid system does not (yet) have a unified approach (e.g., the EM algorithm [21]) for estimating both sets of parameters simultaneously. Instead, a variant of Viterbi training is used to estimate the system parameters, as described below.
The parameters of the system are adapted using Viterbi training to maximise the log likelihood of the most probable state sequence through the training data. First, a Viterbi pass is made to compute an alignment of states to frames. The parameters of the system are then adjusted to increase the likelihood of the frame sequence given that alignment. This maximisation comes in two parts: (1) maximisation of the emission probabilities and (2) maximisation of the transition probabilities. Emission probabilities are maximised using gradient descent, and transition probabilities are maximised by re-estimating the duration models and the prior probabilities on multiple pronunciations. Thus, the training cycle takes the following steps:
We generally find that four iterations of this Viterbi training are sufficient.
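The alignment pass at the heart of this procedure can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names (`viterbi_align`, `reestimate_transitions`), the toy two-state model, and the probability floor are all assumptions made for illustration. In particular, the sketch re-estimates transition probabilities from simple transition counts along the aligned path, which is a simplification standing in for the duration-model and pronunciation-prior re-estimation described above, and it omits the gradient-descent training of the network. The `log_emit` table plays the role of the log scaled likelihoods that the network would supply.

```python
import math

NEG_INF = float("-inf")


def viterbi_align(log_emit, log_trans, log_init):
    """Return the most probable state sequence and its log score.

    log_emit[t][s]  : log (scaled) likelihood of frame t under state s
    log_trans[i][j] : log transition probability from state i to state j
    log_init[s]     : log probability of starting in state s
    """
    n_frames, n_states = len(log_emit), len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state to recover the alignment.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def reestimate_transitions(path, n_states, floor=1e-3):
    """Count-based transition re-estimate (an illustrative simplification).

    `floor` is a hypothetical smoothing constant that keeps unseen
    transitions from receiving zero probability.
    """
    counts = [[floor] * n_states for _ in range(n_states)]
    for i, j in zip(path, path[1:]):
        counts[i][j] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


# Toy left-to-right model: two states, four frames.
emit = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
log_emit = [[math.log(p) for p in frame] for frame in emit]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_init = [0.0, NEG_INF]

path, log_score = viterbi_align(log_emit, log_trans, log_init)
new_trans = reestimate_transitions(path, n_states=2)
print(path)  # -> [0, 0, 1, 1]
```

In a full training cycle, the alignment returned by the Viterbi pass would also provide the frame-level targets for retraining the network, and the two steps would be repeated for a small number of iterations.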