Abstract for rosti_tr461

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR461

SWITCHING LINEAR DYNAMICAL SYSTEMS FOR SPEECH RECOGNITION

A-V.I. Rosti & M.J.F. Gales

December 12, 2003

This paper describes the application of Rao-Blackwellised Gibbs sampling (RBGS) to speech recognition using switching linear dynamical systems (SLDSs) as the acoustic model. The SLDS is a hybrid of standard hidden Markov models (HMMs) and linear dynamical systems. It extends the stochastic segment model (SSM), in which segments are assumed independent. SLDSs explicitly model the strong co-articulation present in speech using a Gauss-Markov process in a low-dimensional latent state space. Unfortunately, inference in an SLDS is intractable unless the discrete state sequence is known. RBGS is one approach that may be applied to improve both training and decoding for this form of intractable model. The theory of SLDSs and RBGS is described, along with an efficient proposal distribution. The performance of the SLDS and SSM using RBGS for training and inference is evaluated on the ARPA Resource Management task.
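As an illustration of the model structure described above, the following is a minimal sketch of drawing samples from a toy SLDS with a scalar latent state. It is not taken from the report: the function name `sample_slds`, all parameter names, the uniform initial discrete state, the unit-variance initial latent state, and the `reset_at_switch` flag are assumptions made for this example. Setting the flag redraws the latent state whenever the discrete state changes, which is one simple way to mimic the SSM's independent segments; leaving it unset carries the latent state across switches, as in the SLDS.

```python
import random


def sample_slds(T, trans, A, C, q_std, r_std, reset_at_switch=False, seed=0):
    """Draw discrete states, latent states and observations from a toy SLDS.

    All names and modelling choices here are hypothetical illustrations:
      trans[i][j] : probability of moving from discrete state i to j
      A[q], C[q]  : scalar state-transition and observation gains for state q
      q_std[q]    : std. dev. of the latent (Gauss-Markov) process noise
      r_std[q]    : std. dev. of the observation noise
    """
    rng = random.Random(seed)
    n_states = len(trans)
    q = rng.randrange(n_states)   # assumed uniform initial discrete state
    x = rng.gauss(0.0, 1.0)       # assumed N(0, 1) initial latent state
    states, latents, obs = [], [], []
    for t in range(T):
        if t > 0:
            q_prev = q
            # HMM part: first-order Markov chain over discrete states.
            q = rng.choices(range(n_states), weights=trans[q_prev])[0]
            if reset_at_switch and q != q_prev:
                # SSM-like behaviour: segments do not share latent state.
                x = rng.gauss(0.0, 1.0)
            else:
                # Linear dynamical system part: Gauss-Markov update.
                x = A[q] * x + rng.gauss(0.0, q_std[q])
        # Observation is a noisy linear function of the latent state.
        y = C[q] * x + rng.gauss(0.0, r_std[q])
        states.append(q)
        latents.append(x)
        obs.append(y)
    return states, latents, obs


if __name__ == "__main__":
    trans = [[0.9, 0.1], [0.2, 0.8]]
    s, x, y = sample_slds(20, trans, A=[0.95, 0.5], C=[1.0, -2.0],
                          q_std=[0.1, 0.3], r_std=[0.05, 0.05])
    for t in range(5):
        print(t, s[t], round(x[t], 3), round(y[t], 3))
```

Once the discrete state sequence is fixed, what remains is an ordinary time-varying linear Gaussian model, which is why inference becomes tractable when that sequence is known.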
