Abstract for kai_icassp04st

Proc. ICASSP 2004


K. Yu and M. J. F. Gales

May 2004

Adaptive training is an important approach for training speech recognition systems on found, non-homogeneous data. Standard adaptive training employs a single transform to represent the unwanted acoustic variability of an utterance. A canonical model representing only the inherent speech variability may then be trained given this set of transforms. For found data, there are commonly multiple acoustic factors affecting the speech signal. This paper investigates the use of multiple forms of transformation, structured transforms (ST), to represent the complex non-speech variabilities in an adaptive training framework. Two forms of transform are considered: cluster mean interpolation and constrained MLLR. Re-estimation formulae for estimating the canonical model using both maximum likelihood and minimum phone error training are presented. Experiments comparing ST to standard adaptive training schemes were performed on a conversational telephone speech task. ST were found to significantly reduce the word error rate.
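As a rough illustration of the two transform forms named in the abstract, the sketch below assumes the usual textbook definitions: cluster mean interpolation builds an adapted Gaussian mean as a weighted sum of cluster means, and constrained MLLR applies an affine transform to the feature vector, which adds a log-determinant term to the log-likelihood. All function names, the 2-dimensional toy values, and the diagonal-covariance Gaussian are assumptions made for illustration; this is not the paper's implementation, and it does not cover the re-estimation formulae.

```python
import math

def interpolate_means(cluster_means, weights):
    """Adapted mean as a weighted sum of cluster means: mu = sum_p w_p * mu_p."""
    dim = len(cluster_means[0])
    return [sum(w * m[d] for w, m in zip(weights, cluster_means))
            for d in range(dim)]

def cmllr_transform(A, b, x):
    """Constrained MLLR viewed as a feature transform: x_hat = A x + b."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi
            for row, bi in zip(A, b)]

def log_likelihood(x, A, b, cluster_means, weights, variances):
    """Log-likelihood of x under both transforms (toy 2-D case only).

    Assumes a diagonal-covariance Gaussian; the log|det A| term accounts
    for the change of variables introduced by the feature transform.
    """
    mu = interpolate_means(cluster_means, weights)
    x_hat = cmllr_transform(A, b, x)
    log_det = math.log(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]))
    ll = log_det
    for xd, md, vd in zip(x_hat, mu, variances):
        ll += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
    return ll

# Hypothetical toy values: two clusters, two feature dimensions.
cluster_means = [[0.0, 1.0], [2.0, 3.0]]
weights = [0.25, 0.75]            # interpolation weights
A = [[1.0, 0.0], [0.0, 2.0]]      # CMLLR matrix
b = [0.5, -1.0]                   # CMLLR bias
x = [1.0, 2.0]                    # observed feature vector

mu = interpolate_means(cluster_means, weights)
x_hat = cmllr_transform(A, b, x)
ll = log_likelihood(x, A, b, cluster_means, weights, [1.0, 1.0])
```

In this sketch the interpolation weights act on the model side and the affine transform on the feature side, so the two can be applied together when scoring an observation.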

