Q6.1: What is speech recognition?

Automatic Speech Recognition

Automatic speech recognition is the process by which a computer maps an acoustic speech signal to text.

Automatic speech understanding is the process by which a computer maps an acoustic speech signal to some form of abstract meaning of the speech.

What does speaker dependent / adaptive / independent mean?

A speaker dependent system is developed to operate for a single speaker. These systems are usually easier to develop, cheaper to buy and more accurate, but not as flexible as speaker adaptive or speaker independent systems.

A speaker independent system is developed to operate for any speaker of a particular type (e.g. American English). These systems are the most difficult to develop and the most expensive, and their accuracy is lower than that of speaker dependent systems. However, they are more flexible.

A speaker adaptive system is developed to adapt its operation to the characteristics of new speakers. Its difficulty lies somewhere between that of speaker independent and speaker dependent systems.

What does small/medium/large/very-large vocabulary mean?

The size of the vocabulary of a speech recognition system affects the complexity, processing requirements and accuracy of the system. Some applications only require a few words (e.g. numbers only), others require very large dictionaries (e.g. dictation machines). There are no established definitions, but as a rough guide:

small vocabulary - tens of words
medium vocabulary - hundreds of words
large vocabulary - thousands of words
very-large vocabulary - tens of thousands of words.
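
The rough guide above can be sketched as a small lookup. This is only an illustration: the function name and the exact cut-off values (100, 1000, 10000) are assumptions chosen to match "tens", "hundreds", "thousands" and "tens of thousands", not established boundaries.

```python
def vocabulary_class(num_words: int) -> str:
    """Map a vocabulary size to the rough categories listed above.

    The cut-offs are assumed for illustration; there are no
    established definitions.
    """
    if num_words < 1:
        raise ValueError("vocabulary must contain at least one word")
    if num_words < 100:        # tens of words
        return "small"
    if num_words < 1000:       # hundreds of words
        return "medium"
    if num_words < 10000:      # thousands of words
        return "large"
    return "very-large"        # tens of thousands of words (or more)
```

Under these assumed cut-offs, a digits-only recogniser (10 words) is "small", while a dictation system with 20000 words is "very-large".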

What does continuous speech or isolated-word mean?

An isolated-word system operates on single words at a time, requiring a pause between each word. This is the simplest form of recognition to perform because the end points are easier to find and the pronunciation of a word tends not to affect others. Because the occurrences of words are therefore more consistent, they are easier to recognise.
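
To illustrate why pauses make end points easier to find, here is a minimal sketch of energy-based end point detection. It is one simple approach, not the method of any particular recogniser. The function name, the frame length of 160 samples and the energy threshold are all assumptions for illustration, and the samples are assumed to be floats scaled to roughly -1.0 to 1.0.

```python
def find_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices spanning the frames whose
    mean energy exceeds the threshold, or None if no frame does.

    frame_len and threshold are illustrative values, not standards.
    """
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame (short-time energy).
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    first = active_starts[0]
    # End index is exclusive and clipped to the signal length.
    last = min(active_starts[-1] + frame_len, len(samples))
    return first, last
```

For a signal of 320 silent samples, 480 samples at amplitude 0.5, then 320 silent samples, this returns (320, 800). In continuous speech there are often no such silent gaps between words, so a simple threshold like this cannot locate the word boundaries.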

A continuous speech system operates on speech in which words are connected together, i.e. not separated by pauses. Continuous speech is more difficult to handle because of a variety of effects. First, it is difficult to find the start and end points of words. Another problem is "coarticulation": the production of each phoneme is affected by the production of surrounding phonemes, and similarly the start and end of words are affected by the preceding and following words. The recognition of continuous speech is also affected by the rate of speech (fast speech tends to be harder).


Last Revision: 17:42 18-Jun-1996