Next: TUNING THE RECURRENT Up: A COMPARISON OF Previous: LPC BASED TECHNIQUES

ADDING ADDITIONAL FEATURES

The previous sections have established that the Bark-scaled FFT, p20, performs as well as any of the preprocessors tried. This section adds further features to this preprocessor in the hope that they supply new information and so increase the recognition rate. A frame period of 16 ms is towards the long end of those used in speech recognition, so preprocessors pp2, pp4 and pp8 divide each frame into 2, 4 and 8 sections respectively and calculate the power in each section, giving an energy contour through the frame. The results, given in table 8, show no trend of increasing accuracy with more energy channels, even though examination of the weight matrix reveals that some units detect changes in amplitude within a frame.

  
Table 8: Differing numbers of energies per frame
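A minimal sketch of the ppN energy contour follows. It assumes that "power" means the mean squared sample value of each section, and it uses a hypothetical 16 kHz sampling rate, so a 16 ms frame holds 256 samples and divides evenly into 2, 4 or 8 sections. The function name `energy_contour` is illustrative only and is not taken from the original system.

```python
import numpy as np


def energy_contour(frame: np.ndarray, n_sections: int) -> np.ndarray:
    """Split one frame into n_sections equal parts and return the power of each.

    Power is taken to be the mean squared sample value (an assumption).
    """
    frame = np.asarray(frame, dtype=float)
    if n_sections <= 0 or frame.size % n_sections != 0:
        raise ValueError("frame length must be a positive multiple of n_sections")
    sections = frame.reshape(n_sections, -1)  # one row per section, in time order
    return np.mean(sections ** 2, axis=1)


if __name__ == "__main__":
    fs = 16000                                # hypothetical sampling rate (Hz)
    frame_len = fs * 16 // 1000               # 16 ms frame -> 256 samples
    t = np.arange(frame_len) / fs
    # A 200 Hz tone with a linearly rising amplitude envelope.
    frame = np.linspace(0.1, 1.0, frame_len) * np.sin(2 * np.pi * 200 * t)
    for n in (2, 4, 8):                       # as in pp2, pp4 and pp8
        print(f"{n} sections:", np.round(energy_contour(frame, n), 4))
```

Because the sections are of equal length, the mean of the contour equals the power of the whole frame, so the extra channels redistribute the frame energy in time rather than adding a new quantity.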

For hidden Markov model (HMM) recognition, the Bark-scaled (or, equivalently, mel-scaled) cepstrum, bsc, is more often used than the frequency-domain representation. Table 9 shows slightly worse performance in the cepstral domain, but this figure is very close to the pln entry of table 5, to which it is linearly related.
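If, as is conventional, the Bark-scaled cepstrum is taken to be a discrete cosine transform of the log Bark-scaled band powers (the exact transform used here is not stated), then with all coefficients retained it is an invertible linear map of the log spectrum. The sketch below assumes an orthonormal DCT-II over 20 bands; the band count and the helper names `dct_matrix` and `bark_cepstrum` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)                   # scale the DC row so every row has unit length
    return c


def bark_cepstrum(band_powers: np.ndarray) -> np.ndarray:
    """Cepstrum as the DCT of the log band powers (assumed definition of bsc)."""
    log_spectrum = np.log(np.asarray(band_powers, dtype=float))
    return dct_matrix(log_spectrum.size) @ log_spectrum


if __name__ == "__main__":
    powers = np.linspace(1.0, 20.0, 20)       # stand-in for 20 Bark band powers
    c = bark_cepstrum(powers)
    # The transform is linear and orthonormal, so its transpose inverts it exactly.
    recovered = np.exp(dct_matrix(20).T @ c)
    print(np.allclose(recovered, powers))     # True
```

Since the transpose recovers the log spectrum exactly, no information is gained or lost by moving to the cepstral domain under this assumption.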

p20 was augmented with several types of feature: pzc adds zero-crossing information; pf0 adds the position of the highest peak in the cepstrum, which corresponds to a pitch frequency; pf3 adds the positions of the first three formants, measured as peaks in the Bark-scaled spectrum; and ppa adds the amplitudes as well as the positions of these peaks. Finally, pre is the Bark-scaled spectrum with four energies per frame (as in pp4) plus all of the above features except the formant amplitudes; this corresponds to the preprocessor used in previous work. All these results are given in table 9, but unfortunately no preprocessor offers significantly better results than p20.

  
Table 9: Additional features
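The extra features can be sketched as follows. This is an illustration under stated assumptions, not the original preprocessor: zero crossings are counted as sign changes between adjacent samples, pitch is estimated from the highest real-cepstrum peak within a hypothetical 60 to 400 Hz search range, and "formants" are simply the first three local maxima of a spectrum, with no smoothing or thresholding. All function names and parameters are hypothetical.

```python
import numpy as np


def zero_crossings(frame: np.ndarray) -> int:
    """Count sign changes between adjacent samples (zero counts as positive)."""
    negative = np.signbit(np.asarray(frame, dtype=float))
    return int(np.count_nonzero(negative[1:] != negative[:-1]))


def cepstral_pitch(frame: np.ndarray, fs: float,
                   f_lo: float = 60.0, f_hi: float = 400.0) -> float:
    """Pitch estimate (Hz) from the highest real-cepstrum peak in a quefrency range.

    The 60-400 Hz default search range is an assumption for illustration.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)  # floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))           # real cepstrum
    q_lo = max(1, int(fs / f_hi))                            # shortest period, in samples
    q_hi = min(int(fs / f_lo), len(frame) // 2)              # longest period, in samples
    q_peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / q_peak


def first_peaks(spectrum: np.ndarray, n_peaks: int = 3):
    """Positions and amplitudes of the first n_peaks local maxima, lowest frequency first.

    Fewer than n_peaks values are returned if the spectrum has fewer maxima.
    """
    s = np.asarray(spectrum, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])
    positions = np.flatnonzero(is_peak)[:n_peaks] + 1
    return positions, s[positions]
```

In this sketch, pf3 would correspond to the positions returned by `first_peaks` applied to the Bark-scaled spectrum, and ppa to both positions and amplitudes; how the values were scaled before being presented to the network in the original work is not stated.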
