Abstract for clarkson_icassp97

In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Munich, Germany, 1997.

LANGUAGE MODEL ADAPTATION USING MIXTURES AND AN EXPONENTIALLY DECAYING CACHE

P.R. Clarkson and A.J. Robinson

April 1997

This paper presents two techniques for language model adaptation.

The first is based on the use of mixtures of language models: the training text is partitioned according to topic, a language model is constructed for each component, and at recognition time appropriate weightings are assigned to each component to model the observed style of language.
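
As an illustration of this first technique, the sketch below shows one common way such mixture weights can be re-estimated from recently observed text using EM and then used for interpolation. The function names, the `component_probs` interface, and the omission of n-gram context are assumptions made for brevity, not the paper's actual implementation.

    def adapt_mixture_weights(history_words, component_probs, n_iters=10):
        """Re-estimate mixture weights lambda_j from recently observed words.

        component_probs[j](w) returns the probability of word w under
        component model j (conditioning on context is omitted for brevity).
        """
        n_components = len(component_probs)
        weights = [1.0 / n_components] * n_components  # start from uniform

        for _ in range(n_iters):
            # E-step: accumulate each component's posterior responsibility
            new_weights = [0.0] * n_components
            for w in history_words:
                probs = [weights[j] * component_probs[j](w) for j in range(n_components)]
                total = sum(probs) or 1e-12
                for j in range(n_components):
                    new_weights[j] += probs[j] / total
            # M-step: normalise responsibilities to obtain updated weights
            weights = [nw / len(history_words) for nw in new_weights]
        return weights

    def mixture_prob(word, component_probs, weights):
        """Mixture probability: P(w) = sum_j lambda_j * P_j(w)."""
        return sum(l * p(word) for l, p in zip(weights, component_probs))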

The second technique is based on augmenting the standard trigram model with a cache component in which word recurrence probabilities decay exponentially over time.
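
A minimal sketch of such a decaying cache, interpolated with a base trigram model, is given below. The decay rate, the cache weight, and the `trigram_prob` interface are illustrative assumptions rather than the parameter values or implementation used in the paper.

    import math

    def decaying_cache_prob(word, history, decay=0.005):
        """Cache probability: each earlier occurrence of `word` contributes
        a weight exp(-decay * distance), normalised over all history words."""
        n = len(history)
        weights = [math.exp(-decay * (n - i)) for i in range(n)]
        total = sum(weights)
        if total == 0.0:
            return 0.0
        hit = sum(w for w, h in zip(weights, history) if h == word)
        return hit / total

    def combined_prob(word, context, history, trigram_prob, cache_weight=0.1):
        """Interpolate the base trigram model with the decaying cache."""
        return ((1.0 - cache_weight) * trigram_prob(word, context)
                + cache_weight * decaying_cache_prob(word, history))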

Both techniques yield a significant reduction in perplexity over the baseline trigram language model on multi-domain test text, with the mixture-based model giving a 24% reduction and the cache-based model giving a 14% reduction. The two techniques attack the problem of adaptation at different scales, and as a result can be used in parallel to give a total perplexity reduction of 30%.


(ftp:) clarkson_icassp97.ps.gz (http:) clarkson_icassp97.ps.gz
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
  (ftp:) clarkson_icassp97.pdf | (http:) clarkson_icassp97.pdf
