ADAPTIVE TRAINING USING STRUCTURED TRANSFORMS
K. Yu and M. J. F. Gales
Adaptive training is an important approach for training speech recognition systems on found, non-homogeneous data. Standard adaptive training employs a single transform to represent the unwanted acoustic variability of an utterance. A canonical model representing only the inherent speech variability may then be trained given this set of transforms. For found data, there are commonly multiple acoustic factors affecting the speech signal. This paper investigates the use of multiple forms of transformation, structured transforms (ST), to represent the complex non-speech variabilities in an adaptive training framework. Two forms of transform are considered: cluster mean interpolation and constrained MLLR. Re-estimation formulae for estimating the canonical model using both maximum likelihood and minimum phone error training are presented. Experiments comparing ST to standard adaptive training schemes were performed on a conversational telephone speech task. ST were found to significantly reduce the word error rate.
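As a rough illustration of how the two transform types named in the abstract could act together on a single Gaussian component, the sketch below interpolates a set of cluster means to obtain an adapted mean and applies constrained MLLR as an affine map on the observation. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the diagonal covariance, and the 2x2 restriction on the determinant helper are hypothetical choices made for this example only.

```python
import math

def interpolate_mean(cluster_means, weights):
    """Cluster mean interpolation: mu = sum_c weights[c] * cluster_means[c]."""
    dim = len(cluster_means[0])
    return [sum(w * m[d] for w, m in zip(weights, cluster_means))
            for d in range(dim)]

def cmllr_transform(A, b, x):
    """Constrained MLLR viewed as a feature-space affine map: x' = A x + b."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi
            for row, bi in zip(A, b)]

def log_abs_det_2x2(A):
    """log|det A| for a 2x2 matrix (assumption: toy dimension for brevity)."""
    return math.log(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]))

def st_log_likelihood(x, cluster_means, variances, weights, A, b):
    """Log-likelihood of x for one diagonal-covariance Gaussian whose mean is
    interpolated from cluster means and whose input is CMLLR-transformed.
    The log|det A| term is the Jacobian of the feature transform."""
    mu = interpolate_mean(cluster_means, weights)
    x_t = cmllr_transform(A, b, x)
    ll = log_abs_det_2x2(A)
    for xi, mi, vi in zip(x_t, mu, variances):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll

# Hypothetical toy values: two clusters, two-dimensional features.
cluster_means = [[0.0, 0.0], [2.0, 4.0]]
weights = [0.5, 0.5]                      # speaker/utterance interpolation weights
A = [[1.0, 0.0], [0.0, 1.0]]              # identity CMLLR matrix
b = [1.0, 2.0]                            # CMLLR bias
ll = st_log_likelihood([0.0, 0.0], cluster_means, [1.0, 1.0], weights, A, b)
```

In this toy setting the interpolation weights and the affine transform would each be estimated per speaker or utterance, while the cluster means and variances play the role of the canonical model.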