Abstract for kim_thesis

PhD Thesis, University of Cambridge


Ji-Hwan Kim

August 2001

The work in this thesis concerns Named Entity (NE) recognition from speech and its use in the generation of enhanced speech recognition output with automatic punctuation and automatic capitalisation. A method for the automatic generation of rules is proposed for NE recognition. Punctuation marks are generated using context and prosody information. Capitalisation is produced based on the results of NE recognition and punctuation generation.

Previous work on the NE task falls mainly into two categories: hand-crafted rule-based systems and stochastic systems. By contrast, this thesis proposes an automatic rule-generating method based on the Brill rule inference approach. The performance of the rule-based NE recogniser is compared with that of IdentiFinder, BBN's commercial implementation. When only the word sequences are available, both systems show almost equal performance; the same holds when additional information such as punctuation, capitalisation and name lists is available. When the input texts are corrupted by speech recognition errors, the performance of both systems degrades by almost the same amount. Although the rule-based approach differs from the widely used stochastic method, these results show that automatic rule inference is a viable alternative to the stochastic approach to NE recognition, while retaining the advantages of a rule-based approach.
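
To make the idea of automatic rule inference concrete, the following is a minimal sketch of Brill-style transformation-based learning applied to NE tagging. It is an illustration under stated assumptions, not the thesis implementation: the two rule templates (current word, previous word), the tag names, the toy training data and all function names are hypothetical.

```python
from typing import List, NamedTuple, Set


class Rule(NamedTuple):
    """Change from_tag to to_tag when the word at the given offset matches."""
    from_tag: str
    to_tag: str
    offset: int  # 0 = current word, -1 = previous word (assumed templates)
    word: str


def apply_rule(rule: Rule, words: List[str], tags: List[str]) -> List[str]:
    """Return a new tag sequence with the rule applied at every matching position."""
    out = list(tags)
    for i, tag in enumerate(tags):
        j = i + rule.offset
        if tag == rule.from_tag and 0 <= j < len(words) and words[j] == rule.word:
            out[i] = rule.to_tag
    return out


def count_errors(tags: List[str], gold: List[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def candidate_rules(words: List[str], tags: List[str], gold: List[str]) -> Set[Rule]:
    """Instantiate the templates at every position that is currently mis-tagged."""
    rules = set()
    for i, (t, g) in enumerate(zip(tags, gold)):
        if t != g:
            for offset in (0, -1):
                j = i + offset
                if 0 <= j < len(words):
                    rules.add(Rule(t, g, offset, words[j]))
    return rules


def learn_rules(words: List[str], gold: List[str],
                initial_tag: str = "O", max_rules: int = 10) -> List[Rule]:
    """Greedily pick the rule with the largest net error reduction, then repeat."""
    tags = [initial_tag] * len(words)
    learned = []
    for _ in range(max_rules):
        base = count_errors(tags, gold)
        best, best_gain = None, 0
        for rule in sorted(candidate_rules(words, tags, gold)):
            gain = base - count_errors(apply_rule(rule, words, tags), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count any further
            break
        tags = apply_rule(best, words, tags)
        learned.append(best)
    return learned


def tag_words(words: List[str], rules: List[Rule], initial_tag: str = "O") -> List[str]:
    """Tag unseen text by applying the learned rules in the order they were learned."""
    tags = [initial_tag] * len(words)
    for rule in rules:
        tags = apply_rule(rule, words, tags)
    return tags


# Toy training data (hypothetical), lower-cased like speech recogniser output.
WORDS = "mr smith visited cambridge and mr jones visited london".split()
GOLD = ["O", "PERSON", "O", "LOCATION", "O", "O", "PERSON", "O", "LOCATION"]

rules = learn_rules(WORDS, GOLD)
print(rules)
print(tag_words("mr brown visited paris".split(), rules))
# -> ['O', 'PERSON', 'O', 'LOCATION']
```

On this toy data the learner prefers the two context rules (previous word "visited", previous word "mr") over word-specific rules because each corrects two errors at once, which is why the learned rules carry over to names not seen in training.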

A punctuation generation system which incorporates prosodic information along with acoustic and language model information is presented. Experiments are conducted on both reference transcriptions and speech recogniser outputs. For reference transcriptions, prosodic information is shown to be more useful than language model information. A few straightforward modifications of a conventional speech recogniser allow the system to produce punctuation and speech recognition hypotheses simultaneously. Multiple hypotheses are produced by the automatic speech recogniser and are then re-scored using prosodic information. When prosodic information is incorporated, the F-measure improves and small reductions in word error rate are obtained at the same time. An alternative approach is also proposed for generating punctuation marks from 1-best speech recogniser output that contains no punctuation marks. Its results are compared with those from the combined punctuation generation and speech recognition system.
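
The re-scoring step described above can be sketched as follows. This is only an illustration: the abstract does not say how the scores are combined, so the weighted sum of log scores, the weight names and every number below are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    tokens: List[str]  # words with punctuation marks as separate tokens
    acoustic: float    # acoustic log score (hypothetical value)
    lm: float          # language model log score (hypothetical value)
    prosody: float     # prosodic log score for the punctuation placement (hypothetical)


def combined_score(h: Hypothesis, lm_weight: float = 1.0,
                   prosody_weight: float = 1.0) -> float:
    """Assumed combination: a weighted sum of the three log scores."""
    return h.acoustic + lm_weight * h.lm + prosody_weight * h.prosody


def rescore(nbest: List[Hypothesis], lm_weight: float = 1.0,
            prosody_weight: float = 1.0) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(nbest, key=lambda h: combined_score(h, lm_weight, prosody_weight))


nbest = [
    Hypothesis("the cat sat it purred .".split(), acoustic=-100.0, lm=-20.0, prosody=-8.0),
    Hypothesis("the cat sat . it purred .".split(), acoustic=-100.0, lm=-22.0, prosody=-3.0),
]

# Without prosody the first hypothesis wins (-120 vs -122);
# with prosody the second one wins (-128 vs -125).
print(" ".join(rescore(nbest, prosody_weight=0.0).tokens))
print(" ".join(rescore(nbest, prosody_weight=1.0).tokens))
```

In this toy case the prosodic score favours the hypothesis with a sentence boundary after "sat", so adding it changes which hypothesis is selected.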

Two different systems are proposed for the task of capitalisation generation. The first system is a slightly modified speech recogniser. In this system, every word in the vocabulary is duplicated: it appears once in a decapitalised form and again in a capitalised form. In addition, the language model (LM) is re-trained on mixed-case texts. The other system is based on NE recognition and punctuation generation, since most capitalised words are either the first words of sentences or NE words. The two systems are first compared under the condition that every procedure is fully automated. The system based on NE recognition and punctuation generation shows better results in word error rate, in F-measure and in SER than the system modified from the speech recogniser. This is because the latter system suffers from distortion of the LM, a sparser LM, and loss of half scores. The performance of the system based on NE recognition and punctuation generation is then investigated when one or more of the following are supplied: reference word sequences, reference NE classes and reference punctuation marks. The results show that this system is robust to NE recognition errors. Although most punctuation generation errors cause errors in this capitalisation generation system, the number of errors caused in capitalisation generation does not exceed the number of errors in punctuation generation. In addition, the results demonstrate that, for capitalisation generation, the effect of NE recognition errors is independent of the effect of punctuation generation errors.
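
The second system rests on the observation that most capitalised words are sentence-initial words or NE words. A minimal sketch of that rule follows, assuming punctuation marks arrive as separate tokens and each token carries an NE tag with "O" meaning "not an entity". The function name, tag names and punctuation sets are hypothetical, and the actual system may well handle more cases.

```python
from typing import List

SENTENCE_END = {".", "?", "!"}  # assumed sentence-final marks
OTHER_PUNCT = {","}             # assumed non-final marks


def capitalise(tokens: List[str], ne_tags: List[str]) -> List[str]:
    """Capitalise sentence-initial words and words tagged as named entities."""
    if len(tokens) != len(ne_tags):
        raise ValueError("tokens and ne_tags must have the same length")
    out = []
    sentence_start = True
    for tok, tag in zip(tokens, ne_tags):
        if tok in SENTENCE_END:
            out.append(tok)
            sentence_start = True  # the next word begins a new sentence
        elif tok in OTHER_PUNCT:
            out.append(tok)
        else:
            if sentence_start or tag != "O":
                out.append(tok[:1].upper() + tok[1:])
            else:
                out.append(tok)
            sentence_start = False
    return out


tokens = ("the thesis was written in cambridge . "
          "it covers speech , text and prosody .").split()
tags = ["O"] * len(tokens)
tags[tokens.index("cambridge")] = "LOCATION"

print(" ".join(capitalise(tokens, tags)))
# -> The thesis was written in Cambridge . It covers speech , text and prosody .
```

The sketch also shows why punctuation errors propagate: a missed or spurious full stop changes which word is treated as sentence-initial, whereas an NE tagging error affects only the word it is attached to.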

(ftp:) kim_thesis.ps.gz (http:) kim_thesis.ps.gz
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
  (ftp:) kim_thesis.pdf | (http:) kim_thesis.pdf
