File Naming Formats




Data types are differentiated by unique filename extensions. All files associated with the same utterance have the same basename. All filenames are unique across all WSJCAM and ARPA-collected WSJ corpora. Utterance IDs (basenames) will not be re-used. The filename format is as follows:

	<UTTERANCE-ID>.<XXX>

where,

	UTTERANCE-ID ::= <SSS><T><EE><UU>

where <SSS> is the speaker ID.

We were allocated speaker IDs c00-czz. Speaker IDs c00-c2z were used for training speakers, and speaker IDs c30-c4z were used for test speakers (both development and evaluation).
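As a rough sketch, an utterance ID could be split into its fields, and a speaker ID assigned to the training or test set, along the following lines. The field widths (3, 1, 2 and 2 characters) are inferred from the lengths of the placeholders above, the lowercase alphanumeric character set is an assumption, and the function names and the sample ID c02c0201 are hypothetical.

```python
import re

# Assumed layout: <SSS><T><EE><UU> with widths 3, 1, 2, 2 (inferred from the
# placeholder lengths) and characters drawn from 0-9 and a-z (an assumption).
UTT_ID = re.compile(
    r"(?P<sss>[0-9a-z]{3})(?P<t>[0-9a-z])(?P<ee>[0-9a-z]{2})(?P<uu>[0-9a-z]{2})"
)
SPEAKER_ID = re.compile(r"[0-9a-z]{3}")


def split_utterance_id(utt_id):
    """Return the four fields of an utterance ID as a dict (hypothetical helper)."""
    m = UTT_ID.fullmatch(utt_id.lower())
    if m is None:
        raise ValueError("not a valid utterance ID: %r" % utt_id)
    return m.groupdict()


def speaker_set(speaker_id):
    """Classify a speaker ID using the ranges stated in the text."""
    sid = speaker_id.lower()
    if SPEAKER_ID.fullmatch(sid) is None or sid[0] != "c":
        return "unknown"      # outside the allocated range c00-czz
    # Digits sort before lowercase letters in ASCII, so plain string
    # comparison matches the ordering c00 < c01 < ... < c0z < c10 < ...
    if "c00" <= sid <= "c2z":
        return "train"
    if "c30" <= sid <= "c4z":
        return "test"         # development and evaluation speakers
    return "unassigned"       # allocated, but no use is stated in the text


fields = split_utterance_id("c02c0201")   # hypothetical utterance ID
print(fields["sss"], speaker_set(fields["sss"]))
```

The range checks rely only on the speaker ID ranges given above; IDs from c50 upward are reported as unassigned because the text does not say how they were used.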

The file extensions are interpreted as follows:

	XXX ::= (data type)
		.wv1 (channel 1 - Sennheiser waveform)
		.wv2 (channel 2 - Canford waveform)
		.ptx (prompting text)
		.dot (detailed orthographic transcription)
		.ifo (information file about speaker)
		.phn (TIMIT-style phone alignments)
		.wrd (TIMIT-style word alignments)
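Because all files for one utterance share a basename and differ only in extension, a file listing can be grouped by utterance with a small lookup table. The following is a minimal sketch under that assumption; the names DATA_TYPES, describe and group_by_utterance, and the sample paths, are hypothetical.

```python
import os
from collections import defaultdict

# Extension -> data type, copied from the list above.
DATA_TYPES = {
    "wv1": "channel 1 - Sennheiser waveform",
    "wv2": "channel 2 - Canford waveform",
    "ptx": "prompting text",
    "dot": "detailed orthographic transcription",
    "ifo": "information file about speaker",
    "phn": "TIMIT-style phone alignments",
    "wrd": "TIMIT-style word alignments",
}


def describe(filename):
    """Return (utterance ID, data type), or None as the type if unrecognised."""
    base, ext = os.path.splitext(os.path.basename(filename))
    return base, DATA_TYPES.get(ext.lstrip(".").lower())


def group_by_utterance(filenames):
    """Map each utterance ID to a dict of {extension: path} for known types."""
    groups = defaultdict(dict)
    for name in filenames:
        base, ext = os.path.splitext(os.path.basename(name))
        ext = ext.lstrip(".").lower()
        if ext in DATA_TYPES:          # silently skip unrelated files
            groups[base][ext] = name
    return dict(groups)


files = ["data/c02c0201.wv1", "data/c02c0201.dot", "data/notes.txt"]
for utt, parts in group_by_utterance(files).items():
    print(utt, sorted(parts))
```

Keying the groups on the basename alone is safe only because the text guarantees that utterance IDs are unique across the corpora and are not re-used.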




Tue Jan 17 18:52:43 GMT 1995