CLUSTER VOTING FOR SPEAKER DIARISATION
S. E. Tranter
May 2004
It is often important to be able to automatically detect 'who spoke when' in audio data. The speaker diarisation task attempts to address this problem on Broadcast News data by defining an error rate which can be used to evaluate segmentations and their associated speaker labels. Many different methods exist to automatically generate such segmentations and it would be desirable if segmentations from different origins could be combined to produce a more accurate one. This paper introduces a cluster voting scheme which attempts to use information from more than one diarisation system to produce a new speaker segmentation with a lower diarisation error rate.
The scheme first generates a set of possible segmentations which minimise a distance metric based on the diarisation error rate, and then defines a method of picking the final output from this set. Experiments using two inputs confirm that this new method can reduce the diarisation error rate.
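To make the diarisation error rate referred to in the abstract more concrete, the following is a minimal, frame-level sketch of how such a score can be computed. It is an illustration under stated assumptions, not the official scoring tool and not the method of this paper: audio is assumed to be cut into equal-length frames, each frame carries at most one speaker label (`None` for non-speech), and the best one-to-one mapping between reference and hypothesis speakers is found by brute force, which is only practical for a handful of speakers. The function name `frame_error_rate` and its arguments are hypothetical.

```python
from itertools import permutations


def frame_error_rate(ref, hyp):
    """Simplified frame-level diarisation error rate (illustrative only).

    ref, hyp: equal-length lists of per-frame speaker labels,
    with None marking non-speech frames.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")

    ref_speech = sum(r is not None for r in ref)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})

    # Count the frames shared by each (reference, hypothesis) speaker pair.
    overlap = {}
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            overlap[(r, h)] = overlap.get((r, h), 0) + 1

    # Pad with None so a reference speaker may be left unmapped.
    candidates = hyp_speakers + [None] * len(ref_speakers)

    # Brute-force search for the one-to-one mapping with most matched frames.
    best_matched = 0
    for perm in permutations(candidates, len(ref_speakers)):
        matched = sum(overlap.get((r, h), 0) for r, h in zip(ref_speakers, perm))
        best_matched = max(best_matched, matched)

    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))
    both_speech = sum(r is not None and h is not None for r, h in zip(ref, hyp))
    speaker_error = both_speech - best_matched

    return (missed + false_alarm + speaker_error) / ref_speech


if __name__ == "__main__":
    reference = ["A", "A", "A", "B", "B", None]
    hypothesis = ["x", "x", "y", "y", "y", "y"]
    print(frame_error_rate(reference, hypothesis))
```

Because the score depends only on the best mapping between label sets, renaming the hypothesis speakers does not change it. That property is what allows segmentations from different systems, each with its own arbitrary cluster names, to be compared with one another.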