The LDC was established to broaden the collection and distribution
of speech and natural language databases for the purposes of research
and technology development in automatic speech recognition, natural
language processing and other areas where large amounts of linguistic
data are needed. Detailed information on the LDC is now available on the
WWW: http://www.ldc.upenn.edu/.
The LDC WWW server provides information on membership agreements,
license agreements, and summaries of the available speech and text
corpora.
Speech Corpora
- TIMIT Acoustic-Phonetic Continuous Speech Corpora and NYNEX
Telephone Version of TIMIT Corpus (NTIMIT)
- Resource Management Corpora
- Air Travel Information System (ATIS) Corpora (multiple)
- ARPA Continuous Speech Recognition Corpora (WSJ, etc.)
- Switchboard Corpus of Recorded Telephone Conversations and
Switchboard Corpus Excerpts (Credit Card Conversations)
- Texas Instruments 46-Word Speaker-Dependent Isolated Word Corpus
(TI46)
- Texas Instruments Speaker-Independent Connected-Digit Corpus
(TIDIGITS)
- Road Rally Conversational Speech Corpus
- HCRC Map Task Corpus
- Air Traffic Control Corpus (ATC0)
- SPIDRE Speaker Identification Corpus
- YOHO Speaker Verification Corpus
- OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone
Corpus
- BRAMSHILL
- MACROPHONE
- King Corpus for Speaker Verification Research
- WSJCAM0: Cambridge Read News Corpus
- TRAINS Spoken Dialog Corpus
- NYNEX PhoneBook Database
- Frontiers in Speech Processing
Text Corpora
- Association for Computational Linguistics Data Collection
Initiative (ACL/DCI)
- The Penn Treebank Project - Release 2
- TIPSTER Information Retrieval Text Research Collection
- United Nations Parallel Text Corpus (English, French, Spanish)
- Japanese Language Financial News
- European Corpus Initiative-1
Lexical Databases
- CELEX Lexical Database
- COMLEX: COMmon LEXical Database of English (English syntax and
pronunciation)
Contact information:
Linguistic Data Consortium
3615 Market Street, Suite 200, Philadelphia, PA, 19104-2608, USA.
Phone: +1 (215) 898-0464 Fax: +1 (215) 573-2175
e-mail: ldc@ldc.upenn.edu
WWW: http://www.ldc.upenn.edu/
Last Revision: 09:40 20-Feb-1997