Abstract for gales_tr284

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR284


M. J. F. Gales, K. M. Knill and S. J. Young

January 1997

This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30-70% of the computational time of an HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by dividing the acoustic space into a set of clusters and associating a "short-list" of Gaussians with each of these clusters. Any Gaussian not in the short-list is simply approximated. This paper examines new techniques for obtaining "good" short-lists. All the new schemes make use of state information, specifically which state each of the components belongs to. In this way a maximum number of components per state may be specified, hence reducing the size of the short-list. The first technique introduced is a simple extension of the standard GS one, which uses this state information. Then, more complex schemes based on maximising the likelihood of the training data are proposed. These new approaches are compared with the standard GS scheme on a large vocabulary speech recognition task. On this task, the use of state information reduced the percentage of Gaussians computed to 10-15%, compared with 20-30% for the standard GS scheme, with little degradation in performance.
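The short-list idea described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the squared Euclidean distance used to assign Gaussians to clusters, the `threshold` and `max_per_state` parameters, and the constant `floor` used to approximate Gaussians outside the short-list are all assumptions made for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed, simplified distance measure)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def build_shortlists(centroids, gaussians, threshold, max_per_state=None):
    """Build one short-list of Gaussian indices per cluster centroid.

    gaussians is a list of (state_id, mean, var) tuples. A Gaussian joins a
    cluster's short-list if its mean lies within `threshold` of the centroid.
    If max_per_state is set, only the nearest max_per_state components of
    each state are kept (the hypothetical state-based cap).
    """
    shortlists = []
    for c in centroids:
        # Candidates within the threshold, nearest first.
        near = []
        for idx, (_, mean, _) in enumerate(gaussians):
            d = sq_dist(c, mean)
            if d <= threshold:
                near.append((d, idx))
        near.sort()

        kept, per_state = set(), {}
        for _, idx in near:
            state = gaussians[idx][0]
            count = per_state.get(state, 0)
            if max_per_state is not None and count >= max_per_state:
                continue  # this state already has its quota in the list
            per_state[state] = count + 1
            kept.add(idx)
        shortlists.append(kept)
    return shortlists


def score_frame(x, centroids, gaussians, shortlists, floor=-1e3):
    """Score one observation: exact for short-listed Gaussians, floor otherwise."""
    cluster = min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
    active = shortlists[cluster]
    return [
        log_gauss(x, mean, var) if i in active else floor
        for i, (_, mean, var) in enumerate(gaussians)
    ]


# Toy one-dimensional example: two clusters, two states with two components each.
centroids = [[0.0], [10.0]]
gaussians = [
    (0, [0.0], [1.0]),
    (0, [1.0], [1.0]),
    (1, [9.0], [1.0]),
    (1, [10.0], [1.0]),
]
plain = build_shortlists(centroids, gaussians, threshold=4.0)
capped = build_shortlists(centroids, gaussians, threshold=4.0, max_per_state=1)
scores = score_frame([0.5], centroids, gaussians, capped)
```

In this toy setup the per-state cap halves each short-list, so only one of the four Gaussians is evaluated exactly for the frame; the rest receive the assumed floor value.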

(ftp:) gales_tr284.ps.gz (http:) gales_tr284.ps.gz
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
  (ftp:) gales_tr284.pdf | (http:) gales_tr284.pdf
