Project Description
One of the fundamental problems with deploying automatic speech
recognition (ASR) systems is that they must be able to operate in a
wide range of acoustic environments. As the acoustic environment may
vary dramatically, for example moving from a quiet office environment
to a moving car with high levels of background noise, it is essential
that ASR systems can detect, and adapt to, these changing
conditions. The overall aim of the project is to develop approaches
that allow ASR systems to respond to changing acoustic conditions,
while maintaining high levels of recognition accuracy. The schemes
developed should be flexible, in that they should be applicable to a
wide range of tasks, for example both small and large
vocabulary systems. At the same time, the computational load associated
with the techniques should be tunable depending on the nature of the
environment and the available resources. This project will build on
current research on Joint Uncertainty Decoding (JUD), which has been
applied to a range of tasks from digit strings (AURORA2) to large
vocabulary continuous speech recognition (Broadcast News Transcription).
The research to be carried out may be split into three distinct areas:
- Rapid environment adaptation. Two related issues need to be
addressed for rapid environment adaptation. First, the estimation of
the noise environment must be rapid, both in terms of the time delay
incurred and the environment estimation process itself. Second, having
estimated the environment parameters, the noise compensation process
itself must have minimal computational overhead.
- Environment change tracking/detection. This problem may be addressed
in two distinct ways. First, by detecting when the environment has
changed sufficiently to adversely affect the recognition, and thus
warrant updating the model parameters. Second, by using a scheme which
continually monitors and adapts the parameters.
- Improved robustness. Though techniques such as uncertainty decoding
yield significant gains in performance, there can be unacceptably
large degradations in performance as the signal-to-noise ratio
(SNR) decreases. Using hidden Markov models (HMMs) alone is unlikely
to address this problem. An alternative is to use discriminative
models, such as support vector machines (SVMs), in combination with
HMMs. One initial direction is the code-breaking framework using HMMs
in conjunction with SVMs based on JUD-compensated generative kernels.
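
The first approach to environment change detection listed above (flag a
change only once it is large enough to warrant updating the model
parameters) can be illustrated with a minimal sketch. This is not the
project's method: it assumes the environment is summarised by a single
noise level in dB taken from noise-only frames, and the names
NoiseTracker, change_points, threshold_db and smoothing are hypothetical,
chosen only for illustration. A real system would track a full noise
model and re-run the compensation where the sketch simply resets its
reference.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NoiseTracker:
    """Illustrative change detector over a scalar noise level (assumption)."""

    reference_db: float        # noise level the models are currently compensated for
    threshold_db: float = 6.0  # hypothetical drift that warrants an update
    smoothing: float = 0.5     # weight given to the previous running estimate

    def __post_init__(self) -> None:
        # Start the running estimate at the level the models already assume.
        self.running_db = self.reference_db

    def update(self, frame_db: float) -> bool:
        """Fold one noise-only frame into the running estimate.

        Returns True when the estimate has drifted more than threshold_db
        from the reference, i.e. when the model parameters should be updated.
        """
        self.running_db = (self.smoothing * self.running_db
                           + (1.0 - self.smoothing) * frame_db)
        if abs(self.running_db - self.reference_db) > self.threshold_db:
            # Stand-in for re-estimating the environment and re-compensating.
            self.reference_db = self.running_db
            return True
        return False


def change_points(frames_db: Iterable[float], tracker: NoiseTracker) -> List[int]:
    """Return the indices of frames at which an update was triggered."""
    return [i for i, level in enumerate(frames_db) if tracker.update(level)]
```

Setting threshold_db to zero makes the sketch behave like the second
approach, adapting whenever the running estimate moves at all. A larger
threshold trades responsiveness for fewer, cheaper updates, which is one
way the computational load could be made tunable.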
The project will extend the current implementation of JUD using