Abstract for kim_eurospeech2001

Proc. Eurospeech 2001

THE USE OF PROSODY IN A COMBINED SYSTEM FOR PUNCTUATION GENERATION AND SPEECH RECOGNITION

Ji-Hwan Kim and P.C. Woodland

September 2001

In this paper, we discuss a combined system for punctuation generation and speech recognition. The system incorporates prosodic information alongside acoustic and language model information. Experiments are conducted on both reference transcriptions and speech recogniser outputs. For the reference transcription case, prosodic information is shown to be more useful than language model information. When these information sources are combined, an F-measure of up to 0.7830 is obtained for punctuation recognition.
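The abstract does not define the F-measure it reports. As a minimal sketch, assuming the usual balanced F-measure (the harmonic mean of precision and recall over punctuation marks), it could be computed as follows. The function name and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def f_measure(n_correct, n_hypothesised, n_reference):
    """Balanced F-measure from punctuation-mark counts (assumed definition).

    n_correct:      marks in the hypothesis that match the reference
    n_hypothesised: total marks produced by the system
    n_reference:    total marks in the reference transcription
    """
    if n_correct == 0:
        # Covers the cases where precision or recall would be 0 or undefined.
        return 0.0
    precision = n_correct / n_hypothesised
    recall = n_correct / n_reference
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 3 correct marks out of 4 hypothesised, 6 in the reference.
score = f_measure(3, 4, 6)
print(f"F-measure: {score:.4f}")
```

Under this definition, a system can only score well by balancing spurious punctuation (low precision) against missed punctuation (low recall).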

A few straightforward modifications of a conventional speech recogniser allow the system to produce punctuation and speech recognition hypotheses simultaneously. The automatic speech recogniser produces multiple hypotheses, which are then re-scored using prosodic information. Incorporating prosodic information improves the F-measure by 19% relative. At the same time, small reductions in word error rate are obtained.
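The abstract does not say how the hypotheses are re-scored. The sketch below assumes one common arrangement: each hypothesis in an N-best list carries acoustic, language model and prosodic log scores, and these are combined as a weighted sum. The class, the field names, the weights and the example scores are all hypothetical, so this should not be read as the authors' method. The helper `relative_improvement` shows what a figure such as "19% relative" means arithmetically.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    tokens: list            # words and punctuation marks, e.g. ["hello", ",", "world", "."]
    acoustic_logprob: float
    lm_logprob: float
    prosody_logprob: float  # hypothetical score from a prosodic model


def rescore(nbest, lm_weight=1.0, prosody_weight=1.0):
    """Return the hypothesis with the highest weighted sum of log scores."""
    def combined(h):
        return (h.acoustic_logprob
                + lm_weight * h.lm_logprob
                + prosody_weight * h.prosody_logprob)
    return max(nbest, key=combined)


def relative_improvement(baseline, improved):
    """Relative change of a score with respect to its baseline value."""
    return (improved - baseline) / baseline


# Illustrative scores only: prosody favours the punctuated hypothesis.
nbest = [
    Hypothesis(["hello", "world"], -10.0, -5.0, -4.0),
    Hypothesis(["hello", ",", "world", "."], -10.5, -5.2, -1.0),
]
best = rescore(nbest)
print(" ".join(best.tokens))
```

With `prosody_weight=0.0` the first hypothesis wins on acoustic and language model scores alone, which illustrates how a prosodic score can change the ranking of an N-best list.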

(ftp:) kim_eurospeech2001.ps.gz (http:) kim_eurospeech2001.ps.gz
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
(ftp:) kim_eurospeech2001.pdf | (http:) kim_eurospeech2001.pdf
