
Recurrent Networks for Phone Probability Estimation

 

Incorporating feedback into an MLP provides an efficient way of modelling context, in much the same way that an infinite impulse response filter can be more efficient than a finite impulse response filter in terms of storage and computational requirements. Duplication of resources is avoided by processing one frame of speech at a time in the context of an internal state, as opposed to applying nearly the same operation to each frame in a larger window. Feedback also gives a longer context window, so uncertain evidence can potentially be accumulated over many time frames in order to build up an accurate representation of the long-term contextual variables.
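The filter analogy can be made concrete with a minimal sketch. The function names, the decay value and the number of taps below are illustrative assumptions, not part of the system described here. A one-state recursive (IIR) smoother is compared with a 50-tap FIR filter whose taps truncate the same impulse response.

```python
def fir_average(signal, taps):
    """Finite impulse response: weighted sum over a stored window of past inputs."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(taps):
            if t - k >= 0:
                acc += w * signal[t - k]
        out.append(acc)
    return out


def iir_average(signal, decay):
    """Infinite impulse response: a single stored state summarises all past inputs."""
    out = []
    state = 0.0
    for sample in signal:
        state = decay * state + (1.0 - decay) * sample
        out.append(state)
    return out


decay = 0.9    # hypothetical feedback coefficient
n_taps = 50    # hypothetical FIR window length
# FIR taps that truncate the IIR impulse response (1 - decay) * decay**k.
taps = [(1.0 - decay) * decay ** k for k in range(n_taps)]
signal = [1.0] * 200  # unit step input

fir_out = fir_average(signal, taps)
iir_out = iir_average(signal, decay)
# Largest disagreement between the two filters over the whole signal.
max_gap = max(abs(a - b) for a, b in zip(fir_out, iir_out))
```

The FIR version stores and multiplies 50 past samples per output, while the recursive version keeps one state value and gives nearly the same response.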

There are a number of possible methods for incorporating feedback into a speech recognition system. One approach is to consider the forward equations of a standard HMM as a recurrent network-like computation. The HMM can then be trained using the maximum likelihood criterion [14] or discriminative training criteria [15,16,17]. Another approach is to use a recurrent network only for estimation of the emission probabilities in an HMM framework. This is similar to the hybrid connectionist-HMM approach described in [3] and is the approach used in the system described in this chapter.
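To see why the HMM forward equations resemble a recurrent computation, the following sketch writes the standard forward recursion as a state update applied once per frame. The function names and the toy two-state parameters are assumptions for illustration only.

```python
def forward_step(alpha, transition, emission):
    """One step of the HMM forward recursion.

    alpha[i]         : forward probability of state i at the previous frame
    transition[i][j] : probability of moving from state i to state j
    emission[j]      : likelihood of the current frame under state j
    """
    n = len(alpha)
    return [
        emission[j] * sum(alpha[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def forward_likelihood(initial, transition, emissions):
    """Total likelihood of a frame sequence, carrying alpha forward as the state."""
    alpha = [initial[j] * emissions[0][j] for j in range(len(initial))]
    for emission in emissions[1:]:
        alpha = forward_step(alpha, transition, emission)
    return sum(alpha)


# Hypothetical two-state model and two frames of emission likelihoods.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emissions = [[0.5, 0.1], [0.4, 0.3]]
likelihood = forward_likelihood(initial, transition, emissions)
```

The vector `alpha` plays the role of a recurrent state: each new frame is combined with the previous `alpha` through a fixed linear map followed by an elementwise product.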

The form of the recurrent network used here was first described in [18]. The paper took the basic equations for a linear dynamical system and replaced the linear matrix operators with non-linear feedforward networks. Merging computations gives the structure illustrated in figure 1. The current input, $\mathbf{u}(t)$, is presented to the network along with the current state, $\mathbf{x}(t)$. These two vectors are passed through a standard feed-forward network to give the output vector, $\mathbf{y}(t)$, and the next state vector, $\mathbf{x}(t+1)$.

  
Figure 1: The recurrent network used for phone probability estimation.

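Defining the combined input vector as $\mathbf{z}(t)$ and the weight matrices to the outputs and the next state as $\mathbf{W}$ and $\mathbf{V}$, respectively:

$$\mathbf{z}(t) = \begin{bmatrix} 1 \\ \mathbf{u}(t) \\ \mathbf{x}(t) \end{bmatrix} \qquad (4)$$

$$y_i(t) = \frac{\exp(\mathbf{W}_i \mathbf{z}(t))}{\sum_j \exp(\mathbf{W}_j \mathbf{z}(t))} \qquad (5)$$

$$x_i(t+1) = \frac{1}{1 + \exp(-\mathbf{V}_i \mathbf{z}(t))} \qquad (6)$$

where $\mathbf{W}_i$ and $\mathbf{V}_i$ denote the $i$th rows of the weight matrices. The inclusion of ``1'' in $\mathbf{z}(t)$ provides the mechanism to apply a bias to the non-linearities. As is easily seen in (4)--(6), the complete system is no more than a large matrix multiplication followed by a non-linear function.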
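A single time step of such a network can be sketched as follows. This is a minimal illustration under stated assumptions: the names `rnn_step`, `W`, `V`, `u` and `x`, the layer sizes, the random weights and the initial state of 0.5 are all hypothetical, and only the structure (bias plus input plus state, one matrix multiplication per target, softmax outputs, sigmoid state units) follows the description above.

```python
import math
import random


def softmax(activations):
    """Softmax with the maximum subtracted for numerical stability."""
    shift = max(activations)
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(a):
    """Logistic sigmoid, written to avoid overflow for large negative a."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(u, x, W, V):
    """Map the current input u and state x to (outputs y, next state)."""
    z = [1.0] + list(u) + list(x)                 # combined vector with bias term
    y = softmax(matvec(W, z))                     # outputs: non-negative, sum to one
    x_next = [sigmoid(a) for a in matvec(V, z)]   # state units in (0, 1)
    return y, x_next


# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
n_in, n_state, n_out = 2, 3, 4
n_z = 1 + n_in + n_state
W = [[rng.uniform(-1, 1) for _ in range(n_z)] for _ in range(n_out)]
V = [[rng.uniform(-1, 1) for _ in range(n_z)] for _ in range(n_state)]

x = [0.5] * n_state                               # assumed initial state
frames = [[0.2, -0.1], [0.0, 0.3], [1.0, 1.0]]
outputs = []
for u in frames:
    y, x = rnn_step(u, x, W, V)
    outputs.append(y)
```

Each frame costs one pass through `rnn_step`; the state `x` carried between calls is the only memory of earlier frames.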

A very important point to note about this structure is that if the parameters are estimated using certain training criteria (see section 4), the network outputs are consistent estimators of class posterior probabilities. Specifically, the outputs are interpreted as

$$y_i(t) \simeq \Pr(q_i(t) \mid \mathbf{u}_1^t, \mathbf{x}(0)) \qquad (7)$$

The softmax non-linear function of (5) is an appropriate non-linearity for estimating posterior probabilities as it ensures that the values are non-negative and sum to one. Work on generalised linear models [19] also provides theoretical justification for interpreting the outputs $\mathbf{y}(t)$ as probabilities. Similarly, the sigmoidal non-linearity of (6) is the softmax non-linearity for the two class case and is appropriate if all state units are taken as probability estimators of hidden independent events.
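The relation between the two non-linearities can be checked directly. Writing $a_1$ and $a_2$ for the two class activations (names introduced here for illustration), the two-class softmax reduces to a sigmoid of the activation difference:

```latex
y_1 = \frac{e^{a_1}}{e^{a_1} + e^{a_2}}
    = \frac{1}{1 + e^{-(a_1 - a_2)}},
\qquad
y_2 = 1 - y_1 .
```

Dividing numerator and denominator by $e^{a_1}$ gives the second form, so a single sigmoid unit is enough to represent a two-class posterior.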

 

In the hybrid approach, the network output $y_i(t)$ is used as the observation probability within the HMM framework. It is easily seen from (7) that the observation probability is extended over a much greater context than is indicated by local models as shown in (3). The recurrent network uses the internal state vector to build a representation of past acoustic context. In this fashion, the states of the recurrent network also model dynamic information. Various techniques from non-linear dynamics may be used to describe and analyse the dynamical behaviour of the recurrent net. For example, different realisations of the network show a variety of behaviours (e.g., limit cycles, stable equilibria, chaos) under zero-input operation (i.e., $\mathbf{u}(t) = \mathbf{0}$). Limit cycle dynamics for a recurrent network are shown in figure 2, which plots the projection of the network state vector onto two states over seven periods.
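Zero-input behaviour can be explored by iterating the state update on its own. The sketch below is a toy illustration, not the trained network of figure 2: the function name `free_run`, the two-unit weight matrix and the starting state are hypothetical choices made so that the state settles into a simple period-two oscillation.

```python
import math


def sigmoid(a):
    """Logistic sigmoid for moderate activations."""
    return 1.0 / (1.0 + math.exp(-a))


def free_run(V, x0, steps):
    """Iterate the state update with the input held at zero.

    Each row of V holds [bias, state weights...]; the input weights are
    omitted because they contribute nothing when the input is zero.
    """
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        z = [1.0] + x
        x = [sigmoid(sum(w * v for w, v in zip(row, z))) for row in V]
        trajectory.append(list(x))
    return trajectory


# Hypothetical weights: two strongly self-inhibiting state units.
V = [
    [5.0, -10.0, 0.0],
    [5.0, 0.0, -10.0],
]
trajectory = free_run(V, x0=[0.4, 0.6], steps=60)

# After the transient, compare states one and two steps apart.
last, prev, prev2 = trajectory[-1], trajectory[-2], trajectory[-3]
one_step_gap = max(abs(a - b) for a, b in zip(last, prev))
two_step_gap = max(abs(a - b) for a, b in zip(last, prev2))
```

With these weights the state repeats every two steps rather than converging to a fixed point; other weight choices may give equilibria or more complex trajectories.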

  
Figure 2: Projection of recurrent network state space trajectory onto two states.



Tony Robinson Sun Jun 4 20:04:56 BST 1995