Data Formats





The primary file format for all waveforms is NIST's SPHERE format. The DOT file format is used for transcriptions, and the original prompt files are in PTX format. In addition, an information file about each speaker (.ifo) is part of the distribution. The format specifications described below are taken from the document wsj-format-spec.doc at the NIST ftp site: gov.nist.ncsl.jaguar.
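
As an illustration of the waveform format, the following is a minimal sketch of reading a SPHERE header in Python. It relies on the general SPHERE layout (a "NIST_1A" line, a line giving the header size in bytes, then "name -type value" fields terminated by "end_head"). The function name parse_sphere_header and the in-memory demo header with its values are assumptions made for this example and are not taken from the corpus distribution.

```python
import io


def parse_sphere_header(stream):
    """Parse the ASCII header of a SPHERE file opened in binary mode.

    Returns (fields, header_size); the sample data starts at header_size.
    """
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a SPHERE file: missing NIST_1A marker")
    # The second line holds the total header length in bytes.
    header_size = int(stream.readline().strip())
    stream.seek(0)
    header = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in header.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # string field, written as -sN
            fields[name] = value
    return fields, header_size


# Hypothetical header built in memory; the values are illustrative only.
demo = (b"NIST_1A\n   1024\n"
        b"sample_rate -i 16000\n"
        b"channel_count -i 1\n"
        b"sample_coding -s3 pcm\n"
        b"end_head\n").ljust(1024, b" ") + b"\x00\x00"
fields, header_size = parse_sphere_header(io.BytesIO(demo))
print(fields, header_size)
```

For a file on disk, the same function could be called on a handle opened with open(path, "rb"). The sketch only reads the header and does not decode or decompress the sample data.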






Tue Jan 17 18:52:43 GMT 1995