The basic hybrid RNN/HMM system is shown in figure 3.
Figure 3: Overview of the hybrid RNN/HMM system.
As in most recognition systems, speech is represented at four levels: waveform, acoustic feature, phone probability and word string. A preprocessor extracts acoustic vectors from the waveform and passes them to a recurrent network, which estimates which phones are likely to be present. This sequence of phone observations is then parsed by a conventional hidden Markov model to give the most probable word string. The rest of this section discusses these components in more detail.
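The data flow through the four levels can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the system described here: the two-phone inventory, the one-dimensional "acoustic vector", the single-unit recurrent estimator with hand-picked weights, the fixed self-loop probability and the two-word lexicon are all hypothetical. It shows only how a waveform becomes acoustic vectors, then per-frame phone probabilities, then a word chosen by Viterbi alignment against left-to-right phone models.

```python
import math

def preprocess(waveform, frame_len=2):
    """Waveform -> acoustic vectors (toy feature: mean sample value per frame)."""
    return [[sum(waveform[i:i + frame_len]) / frame_len]
            for i in range(0, len(waveform) - frame_len + 1, frame_len)]

def recurrent_phone_probs(vectors, w_in=4.0, w_rec=0.5):
    """Acoustic vectors -> per-frame phone probabilities.

    Stand-in for the recurrent network: one hidden unit whose state is fed
    back, so each estimate depends on earlier frames. Weights are hand-picked
    (hypothetical), not trained.
    """
    h, out = 0.0, []
    for (x,) in vectors:
        h = math.tanh(w_in * x + w_rec * h)      # state carries context forward
        p_a = 1.0 / (1.0 + math.exp(-3.0 * h))   # two-class softmax is a sigmoid
        out.append({"a": p_a, "b": 1.0 - p_a})
    return out

def viterbi_word_score(probs, phones, stay=0.5):
    """Best log score for aligning the frames to a left-to-right phone sequence."""
    neg_inf = float("-inf")
    n = len(phones)
    score = [neg_inf] * n
    score[0] = math.log(probs[0][phones[0]])     # path must start in the first phone
    for frame in probs[1:]:
        new = [neg_inf] * n
        for s in range(n):
            best = score[s] + math.log(stay)     # remain in the same phone
            if s > 0:                            # or advance from the previous phone
                best = max(best, score[s - 1] + math.log(1.0 - stay))
            if best > neg_inf:
                new[s] = best + math.log(frame[phones[s]])
        score = new
    return score[-1]                             # path must end in the last phone

def recognise(waveform, lexicon):
    """Waveform -> most probable word under the toy models above."""
    probs = recurrent_phone_probs(preprocess(waveform))
    return max(lexicon, key=lambda word: viterbi_word_score(probs, lexicon[word]))

# Hypothetical lexicon: each word is a sequence of phones.
LEXICON = {"ab": ["a", "b"], "ba": ["b", "a"]}

if __name__ == "__main__":
    print(recognise([1, 1, 1, 1, -1, -1, -1, -1], LEXICON))
```

With the sample waveform, the first two frames favour phone "a" and the last two favour "b", so the decoder selects the word "ab". A real system would replace each stage with the components discussed in the rest of this section and would decode over word sequences rather than isolated words.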