The speech database described in this document is the UK English
equivalent of a subset of the American English WSJ0
database [1]. The name of the UK English version,
WSJCAM0, represents the Wall Street Journal recorded at the University
of CAMbridge (phase 0). It consists of speaker-independent (SI) read
material, split into training, development test and evaluation test
sets. There are 90 utterances from each of 92 speakers that are
designated as training material for speech recognition algorithms. A
further 48 speakers each read 40 sentences containing only words from a
fixed 5,000 word vocabulary and 40 sentences from the 64,000 word
vocabulary, which will be used as testing material. Each of the
total of 140 speakers also recorded a common set of 18 adaptation
sentences. Recordings were made from two microphones: a far-field desk
microphone and a head-mounted close-talking microphone.
All resulting waveforms will be distributed in compressed digitised form, accompanied by orthographic transcriptions and automatically generated phone and word alignments.
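
As a rough cross-check of the figures quoted above, the following sketch tallies speakers and utterances. The constant and function names are illustrative only and are not part of the database distribution. It assumes that each test speaker read 40 sentences from each of the two vocabularies, and it counts each utterance once, regardless of how many microphone channels were recorded.

```python
# Figures taken from the description above; names are hypothetical.
TRAIN_SPEAKERS = 92
TRAIN_UTTS_PER_SPEAKER = 90
TEST_SPEAKERS = 48
TEST_SENTENCES_5K = 40    # sentences restricted to the 5,000 word vocabulary
TEST_SENTENCES_64K = 40   # assumption: a second set of 40 per test speaker
ADAPTATION_SENTENCES = 18  # common set read by every speaker


def corpus_totals():
    """Return speaker and utterance counts implied by the description."""
    speakers = TRAIN_SPEAKERS + TEST_SPEAKERS
    training = TRAIN_SPEAKERS * TRAIN_UTTS_PER_SPEAKER
    testing = TEST_SPEAKERS * (TEST_SENTENCES_5K + TEST_SENTENCES_64K)
    adaptation = speakers * ADAPTATION_SENTENCES
    return {
        "speakers": speakers,
        "training": training,
        "testing": testing,
        "adaptation": adaptation,
        "all_utterances": training + testing + adaptation,
    }


if __name__ == "__main__":
    for name, count in corpus_totals().items():
        print(f"{name}: {count}")
```

The speaker total produced by the sketch matches the 140 speakers stated in the text; the utterance totals are derived figures rather than numbers quoted from the database documentation.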