Q6.3: How can I build a simple speech recogniser?

QUICKY RECOGNIZER sketch:

Doug Danforth gives a detailed account in article 253 of the comp.speech archives. A summary is provided below. The full account is also available by anonymous ftp from:

ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition

This is a simple recognizer that should give 85% or better recognition accuracy. Accuracy depends on the words in your vocabulary: long, distinct words are easy, while short, similar-sounding words are hard. On the digits this recognizer can reach 98% or better.

Overview:

Find the beginning and end of the utterance.
Filter the raw signal into frequency bands.
Cut the utterance into a fixed number of segments.
Average data for each band in each segment.
Store this pattern with its name.
Collect a training set of about 3 repetitions of each pattern (word).
Recognize an unknown utterance by comparing its pattern with every pattern in the training set and returning the name of the pattern closest to the unknown.
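The overview above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not Danforth's own code: the frame size, energy threshold, segment count and band count are hypothetical choices; a naive DFT grouped into equal-width bands stands in for a real filter bank; each pattern is normalized to unit length so loudness does not dominate; and the helper `synth_word` (also hypothetical) produces padded tones in place of recorded words.

```python
import math

# Hypothetical parameters for this sketch; tune them on real recordings.
FRAME = 80        # samples per frame for endpoint detection (10 ms at 8 kHz)
THRESHOLD = 0.1   # a frame is "speech" if its energy >= 10% of the peak frame
N_SEGMENTS = 8    # fixed number of time segments per utterance
N_BANDS = 4       # number of frequency bands

def endpoints(signal):
    """Find the beginning and end of the utterance from frame energy."""
    energies = [sum(x * x for x in signal[i:i + FRAME])
                for i in range(0, len(signal), FRAME)]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return 0, len(signal)
    active = [k for k, e in enumerate(energies) if e >= THRESHOLD * peak]
    return active[0] * FRAME, min(len(signal), (active[-1] + 1) * FRAME)

def band_energies(segment):
    """Mean DFT power in N_BANDS equal-width bands of one segment.
    The naive DFT is a stand-in for filtering into frequency bands."""
    n = len(segment)
    power = []
    for k in range(1, n // 2 + 1):  # skip the DC bin
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(segment))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(segment))
        power.append((re * re + im * im) / n)
    width = max(1, len(power) // N_BANDS)
    return [sum(power[b * width:(b + 1) * width]) / width for b in range(N_BANDS)]

def pattern(signal):
    """Endpoint, cut into N_SEGMENTS segments, average each band per segment."""
    start, end = endpoints(signal)
    utt = signal[start:end]
    seg_len = max(1, len(utt) // N_SEGMENTS)
    feats = []
    for s in range(N_SEGMENTS):
        feats.extend(band_energies(utt[s * seg_len:(s + 1) * seg_len]))
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0  # loudness normalization
    return [f / norm for f in feats]

def train(examples):
    """Store a (name, pattern) pair for every training repetition."""
    return [(name, pattern(sig)) for name, sig in examples]

def recognize(signal, templates):
    """Return the name of the stored pattern closest to the unknown."""
    p = pattern(signal)
    def dist(q):  # squared Euclidean distance between patterns
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(templates, key=lambda item: dist(item[1]))[0]

def synth_word(freq, amp, phase, fs=8000, length=800, pad=400):
    """Hypothetical stand-in for a recorded word: a tone padded with silence."""
    tone = [amp * math.sin(2 * math.pi * freq * t / fs + phase)
            for t in range(length)]
    return [0.0] * pad + tone + [0.0] * pad

if __name__ == "__main__":
    # Three repetitions of each "word", as the overview suggests.
    examples = [(name, synth_word(f, amp, ph))
                for name, f in (("low", 500), ("high", 3000))
                for amp, ph in ((1.0, 0.0), (0.8, 0.5), (0.6, 1.0))]
    templates = train(examples)
    print(recognize(synth_word(3000, 0.3, 2.0), templates))
```

Run as a script, the demo classifies a quieter 3000 Hz tone as `high`. With real speech, fixed linear segmentation is the weakest step, which is where the variations mentioned below come in.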

Many variations on this theme can improve performance; for example, try different filtering of the raw signal and different processing methods.

Public Domain Recognition Software

Q6.5 contains information on public domain speech recognition software, including Lotec and Myers' Hidden Markov Model software.

Discrete Hidden Markov Model Demonstration Software

Hidden Markov Models (HMMs) are widely used in speech recognition systems. Joe Picone has put together demonstration software for basic discrete HMMs, covering Viterbi and Baum-Welch training and evaluation, random sequence generation (generating data from a model), and model updating (useful for incremental training). A simple demo program supports all of these modes through command-line arguments, which makes it possible to experiment with the classic coin-toss examples commonly described in textbooks. The code closely parallels the following textbook:

J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, 1993, ISBN: 0-02-328301-7.

The code is written in C++ and is intended to facilitate learning and understanding of the algorithms. The code is available on the ISIP web site:
http://www.isip.msstate.edu/software/

Lecture notes corresponding to the examples are also available:
http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
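To make the discrete-HMM operations described above concrete, here is a small Python sketch of evaluation (the forward algorithm), Viterbi decoding of the best state sequence, and random sequence generation on a coin-toss model. It is not Picone's ISIP C++ code and does not mirror its interfaces; the two-coin model (a fair coin and one biased toward heads) and all of its probabilities are hypothetical, and Baum-Welch training is omitted to keep it short.

```python
import math
import random

# Hypothetical coin-toss model: state 0 is a fair coin,
# state 1 is a coin biased toward heads. All numbers are illustrative.
PI = [0.5, 0.5]                     # PI[i]   = P(first state is i)
A = [[0.9, 0.1],                    # A[i][j] = P(next state j | current state i)
     [0.2, 0.8]]
B = [{"H": 0.5, "T": 0.5},          # B[i][o] = P(symbol o | state i)
     {"H": 0.9, "T": 0.1}]

def forward(obs, pi=PI, a=A, b=B):
    """Evaluation: P(obs | model), summed over all state sequences."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return sum(alpha)

def viterbi(obs, pi=PI, a=A, b=B):
    """Decoding: most likely state sequence and its log probability.
    Works in logs to avoid underflow; assumes no zero probabilities."""
    n = len(pi)
    delta = [math.log(pi[i] * b[i][obs[0]]) for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] + math.log(a[i][j]))
            ptr.append(best)
            new.append(delta[best] + math.log(a[best][j] * b[j][o]))
        backptrs.append(ptr)
        delta = new
    state = max(range(n), key=lambda i: delta[i])
    logp = delta[state]
    path = [state]
    for ptr in reversed(backptrs):  # follow back-pointers to time 0
        state = ptr[state]
        path.append(state)
    return path[::-1], logp

def generate(length, rng, pi=PI, a=A, b=B):
    """Random sequence generation: sample states and symbols from the model."""
    states, obs = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(length):
        states.append(state)
        symbols = list(b[state])
        obs.append(rng.choices(symbols, weights=[b[state][s] for s in symbols])[0])
        state = rng.choices(range(len(pi)), weights=a[state])[0]
    return states, obs

if __name__ == "__main__":
    states, obs = generate(20, random.Random(1))
    print("".join(obs), "P =", forward(obs))
    path, logp = viterbi(obs)
    print("true   ", states)
    print("decoded", path)
```

Comparing the true and decoded state sequences from generated data is the same kind of experiment the coin-toss examples in textbooks use.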


Last Revision: 13:13 07-Aug-1996