
Duration Modelling

 

The recurrent network is used to estimate the local observation probabilities within the HMM framework. Although the dynamics of the network encode some segmental information, explicit modelling of phone duration improves the hybrid system's performance on word recognition tasks.

Phone duration within the hybrid system is modelled with a hidden Markov process, in which a Markov chain represents the duration of each phone. The duration model is integrated into the hybrid system by expanding each phone model from a single state to multiple states with tied observation distributions, i.e.,

 

for states i and j of the same phone model.
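To make the tying concrete, here is a minimal sketch, assuming each phone is expanded into a chain of named duration states that all read their observation probability from the same network output. The function names `expand_phone` and `tied_observation_table`, the state naming scheme, and the probability values are hypothetical and are not taken from the source.

```python
from typing import Dict, List


def expand_phone(phone: str, n_states: int) -> List[str]:
    """Return the names of the duration states for one phone (hypothetical naming)."""
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    return [f"{phone}_{k}" for k in range(n_states)]


def tied_observation_table(
    phone_probs: Dict[str, float], n_states: Dict[str, int]
) -> Dict[str, float]:
    """Give every expanded state the observation probability of its phone."""
    table: Dict[str, float] = {}
    for phone, prob in phone_probs.items():
        for state in expand_phone(phone, n_states[phone]):
            # Tied: all states of the same phone share one value per frame.
            table[state] = prob
    return table


# Made-up network outputs for a single frame, for illustration only.
frame_probs = {"ae": 0.7, "t": 0.1}
table = tied_observation_table(frame_probs, {"ae": 3, "t": 2})
print(table)
```

Because the states of a phone differ only in their transitions, the expansion adds no new observation parameters to the network.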

The choice of Markov chain topology depends on the decoding approach. Decoding with a maximum likelihood word sequence criterion is well suited to complex duration models such as those in [29]. Viterbi decoding, however, results in a Markov chain on duration where the parameters are not hidden (given the duration). Because of this, a simple duration model, shown in figure 6, is employed. The free parameters in this model are (1) the minimum duration of the model, N, (2) the value of the first N-1 state transitions, a, (3) the self-transition value of the last state, x, and (4) the exit transition value, b. The duration score generated by this model is given as

 

and is not necessarily a proper distribution.
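As a sketch of what such a score looks like, suppose the chain in figure 6 has N states, and let τ (a symbol assumed here, not taken from the source) denote a phone's duration in frames. A path of duration τ takes the N-1 forward transitions of value a, repeats the self-transition of the last state τ-N times, and then exits with value b. Under that reading, which is a reconstruction from the listed parameters rather than a quotation of the authors' formula, the score and its sum over all durations would be:

```latex
\rho(\tau) =
\begin{cases}
  a^{N-1}\, x^{\tau - N}\, b, & \tau \ge N, \\
  0, & \tau < N,
\end{cases}
\qquad
\sum_{\tau = N}^{\infty} \rho(\tau) = \frac{a^{N-1}\, b}{1 - x}
\quad (0 \le x < 1).
```

The sum equals one only when a^{N-1} b = 1 - x, which is one way to see why the score need not be a proper distribution.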

  
Figure 6: Phone-deletion penalty duration model.

The parameters are determined in the following manner. First, the minimum duration N is set equal to half the average duration of the phone, where the average duration is computed from a Viterbi alignment of the training data. The parameters a and x are arbitrarily set to . The parameter b represents a phone-deletion penalty and is set empirically to maximise performance on a cross-validation set.
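The procedure can be sketched in a few lines, under stated assumptions: the score takes the form a^(N-1) x^(τ-N) b for durations τ of at least N frames (the form implied by the topology, not quoted from the source), half the average duration is rounded down, and the values of a, x and b below are placeholders rather than the values used by the authors. The function names and the sample durations are hypothetical.

```python
import math
from typing import Sequence


def minimum_duration(durations: Sequence[int]) -> int:
    """Set N to half the average aligned duration, in frames.

    Rounding down and the floor of 1 are assumptions; the source does not
    say how a fractional value is handled.
    """
    if not durations:
        raise ValueError("need at least one aligned duration")
    mean = sum(durations) / len(durations)
    return max(1, int(mean / 2))


def log_duration_score(tau: int, n_min: int, a: float, x: float, b: float) -> float:
    """Log of a^(N-1) * x^(tau-N) * b, or -inf when tau is below the minimum N."""
    if tau < n_min:
        return -math.inf
    return (n_min - 1) * math.log(a) + (tau - n_min) * math.log(x) + math.log(b)


# Made-up durations (in frames) of one phone, as if read off a Viterbi alignment.
aligned_durations = [6, 8, 10]
n_min = minimum_duration(aligned_durations)

# Placeholder parameter values; in the text, b is tuned on a cross-validation set.
a, x, b = 0.5, 0.5, 0.1
for tau in (3, 4, 6):
    print(tau, log_duration_score(tau, n_min, a, x, b))
```

Working in the log domain lets the duration score be added directly to the log observation scores accumulated during Viterbi decoding.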



Tony Robinson Sun Jun 4 20:04:56 BST 1995