INTRODUCTION

The Cambridge Recurrent Error Propagation Network Speech Recognition System has been shown to perform speaker-independent phoneme recognition at a level comparable to the best Hidden Markov Model (HMM) systems [1,2].

Whereas the issue of preprocessors for HMMs has been well researched, it has received far less attention from the connectionist viewpoint. There is, however, a fundamental difference between the nearest-neighbour decision surfaces formed in the input space by an HMM vector quantiser and the hyperplanes formed by error propagation networks. One advantage of the connectionist approach is that the elements of the input vector can have different variances without giving undue bias to the dimensions with larger variance. Thus, while it is necessary to code different types of HMM input in different codebooks, the connectionist input may be treated as a single large vector.
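The contrast drawn in the paragraph above can be illustrated with a toy sketch. It is not the recogniser described in this paper: the two-dimensional data, the codebook, and the hand-picked weights (standing in for weights a network might learn) are all assumptions made for illustration. The first input dimension is informative with a small range; the second has a large range and carries no class information.

```python
def nearest_codeword(x, codebook):
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(x, codebook[i]))


def linear_unit(x, weights, bias):
    """Signed score of x relative to the hyperplane weights . x + bias = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical codebook: codeword 0 represents class A, codeword 1 class B.
codebook = [(0.0, 0.0), (1.0, 100.0)]

# A class-A input according to dimension 0, with a large value in dimension 1.
x = (0.1, 90.0)

# The high-variance dimension dominates the distance, so the quantiser
# assigns x to codeword 1 (class B).
vq_choice = nearest_codeword(x, codebook)

# A hyperplane can give the uninformative dimension zero weight, so the
# score is decided by dimension 0 alone (negative score means class A here).
weights, bias = (1.0, 0.0), -0.5
score = linear_unit(x, weights, bias)
```

In this sketch the quantiser would need either rescaled inputs or a separate codebook for the second dimension to recover the correct class, whereas the linear unit absorbs the scale difference into its weights.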

This paper begins by describing the basic recogniser and proceeds to evaluate many commonly used preprocessors. These combine two forms of input: spectral representations and simple low-dimensional features. The spectral representations are based on filterbank, Fast Fourier Transform (FFT) and Linear Predictive Coding (LPC) analysis; for FFT and LPC, the cepstral representations are also used. The features used are zero crossing rate, energy, and estimates of pitch frequency, degree of voicing, and formant positions and amplitudes. In addition to the evaluation of preprocessors, other means of improving the performance of the recurrent net are also considered.
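As an illustration of the two simplest features named above, the following sketch computes a zero crossing rate and a log energy for one frame of samples. The exact definitions used in this paper are not given in this section, so the formulas, the function names, and the energy floor are assumptions based on common textbook definitions.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def log_energy(frame, floor=1e-10):
    """Natural log of the mean squared sample value, floored to avoid log(0)."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(max(energy, floor))


# Hypothetical frames: one alternating in sign, one constant.
alternating = [1.0, -1.0, 1.0, -1.0]
constant = [1.0, 1.0, 1.0, 1.0]

zcr_alternating = zero_crossing_rate(alternating)
zcr_constant = zero_crossing_rate(constant)
energy_constant = log_energy(constant)
```

Features such as these would be appended to the spectral coefficients of each frame to form the single input vector discussed earlier.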

As with our previous work, the evaluation of these preprocessors is performed on the DARPA TIMIT Acoustic Phonetic Continuous Speech Database [3] (hereafter the TIMIT database). This is a large, well-respected database that is widely available to other researchers, which enables comparison between this work and the work of others.