Abstract for tranter_tr476

Cambridge University Engineering Department Technical Report, CUED/F-INFENG/TR-476. May 2004.


S. E. Tranter

May 2004

It is often important to be able to detect automatically 'who spoke when' in audio data. The speaker diarisation task addresses this problem for Broadcast News data by defining an error rate that can be used to evaluate segmentations and their associated speaker labels. Many different methods exist for generating such segmentations automatically, and it would be desirable to combine segmentations from different sources to produce a more accurate one. This paper introduces a cluster voting scheme which uses information from more than one diarisation system to produce a new speaker segmentation with a lower diarisation error rate.

The scheme first generates a set of possible segmentations that minimise a distance metric based on the diarisation error rate, and then defines a method for picking the final output from this set. Experiments using two input segmentations confirm that the new method can reduce the diarisation error rate.
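The abstract does not give the exact distance metric, the way the candidate set is generated, or the rule for picking the final output. The sketch below is therefore only a hypothetical illustration of the general idea, not the scheme in the report. It assumes fixed-length frames with one speaker label per frame (`None` for non-speech), no overlapping speech and no scoring collar. `frame_der` computes a simplified frame-level diarisation error rate (missed speech + false alarm + speaker error, divided by reference speech), using a brute-force one-to-one speaker mapping. `distance` (a symmetrised DER) and `vote` (keep the candidates with minimal total distance to the inputs, then take the first) are hypothetical names and rules chosen for illustration.

```python
from itertools import permutations
from math import isclose
from typing import Optional, Sequence

# One speaker label per fixed-length frame; None marks non-speech.
# Assumption: no overlapping speech and no scoring collar.
Labels = Sequence[Optional[str]]


def frame_der(ref: Labels, hyp: Labels) -> float:
    """Simplified frame-level diarisation error rate of hyp scored against ref."""
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same frames")
    scored = sum(r is not None for r in ref)
    if scored == 0:
        raise ValueError("reference contains no speech")
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]

    # Best one-to-one mapping of hypothesis speakers onto reference speakers,
    # found by brute force (only practical for a handful of speakers).
    ref_spk = sorted({r for r, _ in both})
    hyp_spk = sorted({h for _, h in both})
    slots = ref_spk + [None] * len(hyp_spk)  # None = hyp speaker left unmapped
    best_match = 0
    for assignment in permutations(slots, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assignment))
        best_match = max(best_match, sum(mapping[h] == r for r, h in both))
    speaker_error = len(both) - best_match
    return (missed + false_alarm + speaker_error) / scored


def distance(a: Labels, b: Labels) -> float:
    """Hypothetical symmetric distance: mean of the DER in both directions."""
    return 0.5 * (frame_der(a, b) + frame_der(b, a))


def vote(inputs: Sequence[Labels], candidates: Sequence[Labels]) -> tuple:
    """Return (indices of candidates with minimal total distance to the inputs,
    index of the picked candidate). Tie-break here is simply 'first index'."""
    totals = [sum(distance(c, x) for x in inputs) for c in candidates]
    best = min(totals)
    minimisers = [i for i, t in enumerate(totals) if isclose(t, best, abs_tol=1e-12)]
    return minimisers, minimisers[0]


if __name__ == "__main__":
    A = ["a", "a", "a", "b", "b", "b"]
    B = ["x", "x", "x", "y", "y", "y"]  # same clustering as A, different labels
    C = ["m", "m", "n", "n", "n", "n"]
    print(vote([A, B, C], [A, B, C]))  # → ([0, 1], 0)
```

In this toy run the candidates are just the inputs themselves; A and B describe the same clustering, so they tie for the smallest total distance and the placeholder tie-break returns A. A real system would need its own candidate generation and selection rule.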
