Abstract for johnson_riao00

Proceedings, RIAO 2000


S.E. Johnson , P. Jourlin, K. Sparck Jones & P.C. Woodland

April 2000

This paper describes a system for retrieving relevant portions of complete broadcast news shows starting with only the audio data. A novel system of automatically detecting and removing commercials is described and shown to increase the performance of the system whilst also reducing the computational effort required. The sophisticated large vocabulary speech recogniser which produces the high-quality transcriptions and the window-based retrieval system with post-merging are also described.

Results are presented using the 1999 TREC-8 Spoken Document Retrieval data for the task where no story boundaries are known. Experiments investigating the effectiveness of all aspects of the system are described and the relative benefits of automatically eliminating commercials, enforcing broadcast structure during retrieval, using relevance feedback, changing retrieval parameters and merging during post-processing are shown. An Average Precision of 46.5\%, when duplicates are scored as irrelevant is shown to be achievable using this system.

| (ftp:) johnson_riao00.ps.gz | (http:) johnson_riao00.ps.gz | (ftp:) johnson_riao00.pdf | (http:) johnson_riao00.pdf | (http:) johnson_riao00.html/ |

If you have difficulty viewing files that end '.gz', which are gzip compressed, then you may be able to find tools to uncompress them at the gzip web site.

If you have difficulty viewing files that are in PostScript, (ending '.ps' or '.ps.gz'), then you may be able to find tools to view them at the gsview web site.

We have attempted to provide automatically generated PDF copies of documents for which only PostScript versions have previously been available. These are clearly marked in the database - due to the nature of the automatic conversion process, they are likely to be badly aliased when viewed at default resolution on screen by acroread.