Mapping the waveform to a sequence of acoustic vectors is necessary in speech recognition systems to reduce the dimensionality of the speech signal and so make the modelling task tractable. The choice of acoustic vector representation is guided by the form of the acoustic model that will be required to fit this data. For example, the common use of diagonal covariance Gaussian models in HMM systems assumes that the elements of the acoustic vector are uncorrelated. The connectionist system presented here makes no such assumption about its inputs, and hence a wider choice of representation is available. The system has two standard acoustic vector representations, both of which give approximately the same performance: MEL+, a twenty-channel, power-normalised, mel-scaled filterbank representation augmented with power, pitch and degree of voicing; and PLP, twelfth-order perceptual linear prediction cepstral coefficients plus energy.
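As an illustration of the filterbank part of MEL+, the following is a minimal sketch in Python, not the system's actual front end. It assumes triangular filters, the common mel formula mel = 2595 log10(1 + f/700), and normalisation by dividing each channel energy by the summed channel energy; none of these details are stated in the text. The function names are hypothetical, and the pitch and degree-of-voicing features are omitted.

```python
import math


def hz_to_mel(f_hz):
    # Assumed mel formula (a common convention, not specified in the text).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_channels, n_bins, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale.

    n_bins is the number of power-spectrum bins covering 0 .. sample_rate / 2.
    Returns a list of n_channels rows, each holding n_bins weights.
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_channels + 2 edge points: filter i spans edges[i] .. edges[i + 2].
    edges = [mel_to_hz(top * i / (n_channels + 1)) for i in range(n_channels + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for i in range(n_channels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_normalised_channels(power_spectrum, bank):
    """Return (channel energies divided by their sum, summed channel energy)."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    total = sum(energies)
    if total <= 0.0:
        return [0.0] * len(energies), 0.0
    return [e / total for e in energies], total


# Twenty channels over a 129-bin power spectrum at an assumed 16 kHz rate.
bank = mel_filterbank(20, 129, 16000)
channels, power = power_normalised_channels([1.0] * 129, bank)
```

Under these assumptions, the twenty normalised channel values together with the separate power term would form the filterbank portion of one acoustic vector.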
A further property of the acoustic processing is the ordering of the feature vectors. In systems that use non-recurrent observation modelling, this property is ignored. With a recurrent network, the vector ordering, or equivalently the direction of time, makes a difference to the probability estimation process. In the experiments described later in this chapter, results are reported for systems using both forward-in-time and backward-in-time trained recurrent networks.
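The effect of vector ordering can be sketched with a toy recurrent model. The code below is an illustration under stated assumptions, not the network used in the system: `toy_step` is a hypothetical stand-in for one recurrent update (here a running sum), and the helper names are invented for this example. It shows that presenting the frames backward in time makes the output at each frame depend on later frames rather than earlier ones.

```python
def run_recurrent(frames, step, initial_state):
    """Apply a recurrent update to each frame in the order given."""
    state = initial_state
    outputs = []
    for frame in frames:
        state, out = step(state, frame)
        outputs.append(out)
    return outputs


def run_backward_in_time(frames, step, initial_state):
    """Present the frames in reverse order, then realign the outputs
    so that outputs[t] still corresponds to frames[t]."""
    reversed_outputs = run_recurrent(frames[::-1], step, initial_state)
    return reversed_outputs[::-1]


def toy_step(state, frame):
    # Hypothetical recurrent update: accumulate the first feature.
    new_state = state + frame[0]
    return new_state, new_state


frames = [[1.0], [2.0], [3.0]]
forward = run_recurrent(frames, toy_step, 0.0)          # [1.0, 3.0, 6.0]
backward = run_backward_in_time(frames, toy_step, 0.0)  # [6.0, 5.0, 3.0]
```

In the forward case the output at frame t summarises frames 0 to t; in the backward case it summarises frames t to the end, which is why the two orderings can yield different probability estimates for the same frame.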