Introduction





The speech database described in this document is the UK English equivalent of a subset of the American English WSJ0 database [1]. The name of the UK English version, WSJCAM0, stands for the Wall Street Journal recorded at the University of CAMbridge (phase 0). It consists of speaker-independent (SI) read material, split into training, development test and evaluation test sets. There are 90 utterances from each of 92 speakers that are designated as training material for speech recognition algorithms. A further 48 speakers each read 40 sentences containing only words from a fixed 5,000 word vocabulary and another 40 sentences from the 64,000 word vocabulary; these will be used as testing material. Each of the total of 140 speakers also recorded a common set of 18 adaptation sentences. Recordings were made from two microphones: a far-field desk microphone and a head-mounted close-talking microphone.
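As a rough check on the figures quoted above, the nominal size of each part of the corpus can be tallied with a short sketch. It assumes that each test speaker read 40 sentences from the 5,000 word vocabulary plus 40 from the 64,000 word vocabulary, and that every speaker recorded exactly the stated number of utterances; the counts in the distributed corpus may differ. The function and constant names are illustrative only and are not part of the corpus tools.

```python
# Nominal utterance counts implied by the figures in the introduction.
# All names below are hypothetical; they only restate the quoted numbers.
TRAIN_SPEAKERS = 92
TRAIN_UTTS_PER_SPEAKER = 90
TEST_SPEAKERS = 48
TEST_UTTS_5K_PER_SPEAKER = 40    # assumed: 40 sentences, 5,000 word vocabulary
TEST_UTTS_64K_PER_SPEAKER = 40   # assumed: 40 sentences, 64,000 word vocabulary
ADAPT_UTTS_PER_SPEAKER = 18      # common adaptation set, read by every speaker


def nominal_counts():
    """Return nominal speaker and utterance counts for each part of the corpus."""
    speakers = TRAIN_SPEAKERS + TEST_SPEAKERS
    counts = {
        "speakers": speakers,
        "training": TRAIN_SPEAKERS * TRAIN_UTTS_PER_SPEAKER,
        "test_5k": TEST_SPEAKERS * TEST_UTTS_5K_PER_SPEAKER,
        "test_64k": TEST_SPEAKERS * TEST_UTTS_64K_PER_SPEAKER,
        "adaptation": speakers * ADAPT_UTTS_PER_SPEAKER,
    }
    # Total utterances, counting each utterance once regardless of microphone.
    counts["total_utterances"] = (
        counts["training"]
        + counts["test_5k"]
        + counts["test_64k"]
        + counts["adaptation"]
    )
    return counts


if __name__ == "__main__":
    for name, value in nominal_counts().items():
        print(f"{name}: {value}")
```

The speaker total agrees with the 140 speakers stated in the text. Because recordings were made from two microphones, the number of waveform files would be about twice the utterance total if each utterance is stored once per microphone, which is an assumption not confirmed in this section.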

All resulting waveforms will be distributed in compressed digitised form, accompanied by orthographic transcriptions and automatically generated phone and word alignments.




Tue Jan 17 18:52:43 GMT 1995