Q1.7: Speech databases

A wide range of speech databases has been collected. These databases are primarily for the development of speech synthesis and recognition systems and for linguistic research.

Some databases are free, but most are not. The databases normally require a lot of storage space (hundreds of MBytes is not unusual). Do not expect to be able to ftp large amounts of speech data.
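
To give a rough feel for why the storage figures get so large, here is a minimal sketch that estimates the size of uncompressed PCM audio. The 16 kHz sample rate, 16-bit samples and single channel are assumptions chosen for illustration, not properties of any particular database described in this FAQ, and the function name is hypothetical.

```python
def pcm_size_bytes(duration_s, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Estimate the size in bytes of uncompressed PCM audio.

    Defaults (16 kHz, 16-bit, mono) are illustrative assumptions only.
    File header overhead is ignored.
    """
    bytes_per_sample = bits_per_sample // 8
    return duration_s * sample_rate_hz * channels * bytes_per_sample


if __name__ == "__main__":
    one_hour = pcm_size_bytes(3600)          # one hour of speech
    print(f"{one_hour / 1e6:.1f} MB")        # prints "115.2 MB"
```

Under these assumptions a single hour of speech already exceeds 100 MBytes, so a database with several hours of recordings easily reaches the sizes mentioned above.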

In addition to the descriptions of speech databases and speech database providers below, information can be obtained from:

* LDC (Linguistic Data Consortium): provides a very wide range of speech and text data to research and commercial users; see below.
* COCOSDA Home Page: http://www.itl.atr.co.jp/cocosda/ (the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques for Speech Input/Output).
* Shikano's WWW site on Speech and Acoustics: http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
* RELATOR Project: European resource initiative; see below.

The following speech data resources are described in the FAQ.

* Bavarian Archive for Speech Signals
* BUPT Spoken Digit Database (Chinese)
* Center for Spoken Language Understanding (CSLU)
* Examples of IPA Symbols
* Linguistic Data Consortium (LDC)
* NOISEX
* Oxford Acoustic Phonetic Database
* Phonemic Samples
* RELATOR project
* ShATR
* University of Victoria Phonetic Database


Last Revision: 16:48 14-May-1997