University of Cambridge, Department of Engineering


MIL Speech Seminars 2007-2008


The MIL Speech Seminar series schedule for Easter Term 2008 is as follows:

6th May 2008: Dan Jurafsky (Stanford University), Inducing Meaning from Text
Online models of word meaning (like dictionaries and thesauri) or world knowledge (like scripts or narratives) are crucial for natural language understanding. Could we learn these meanings automatically from text? I first report on joint work with Rion Snow and Andrew Ng on inducing the meaning of words from text on the Web in the context of augmenting WordNet, a large online thesaurus of English. This work includes a semi-supervised method for learning when a new word is a 'hypernym' or in the 'is-a' relation with another word, a new probabilistic algorithm for combining evidence from multiple relation detectors, and an algorithm for clustering the induced word senses. I then report on joint work with Nate Chambers on inducing 'narratives', script-like sequences of events that follow a protagonist. This work includes inducing the relations between events, ordering the relations and clustering them into prototype narratives.
12th May 2008: Jason Williams (AT&T), Recent work on POMDP-based dialog systems at AT&T
Building spoken dialog systems is difficult because speech recognition errors are common and user behavior is unpredictable, which introduces uncertainty in the current state of the conversation. At AT&T, we have been applying partially observable Markov decision processes (POMDPs) to building these systems. We model the uncertainty in the dialog state explicitly as a Bayesian network and apply machine learning techniques to determine what the system should say or do. In this talk, I'll review the overall approach of applying statistical techniques and then describe two recent advances. First, because the system must operate in real time, efficient Bayesian inference is crucial, yet the set of possible dialog states is enormous. To solve this, I'll present a technique which uses a particle filter to perform approximate inference in real time. Second, to choose actions, ideally we would like to combine the robustness of machine optimization with the expertise of human designers. To tackle this, I'll present a method which unifies human expertise with automatic optimization. To illustrate these techniques, I'll provide examples of two dialog systems: a voice dialer, and a troubleshooting system that helps users restore connectivity on a failed DSL connection. Graphical displays illustrate the operation of the techniques, and quantitative results show that applying statistical techniques outperforms the traditional method of building systems by hand.
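As a rough illustration of the particle-filter idea mentioned in the abstract above, here is a minimal sketch, not the method presented in the talk. It assumes a toy voice dialer in which the hidden dialog state is simply the name the caller wants, the goal stays fixed across turns (so no transition model is needed), and the recognizer returns the correct name with a made-up probability. The names `NAMES`, `asr_likelihood`, `particle_filter_update` and `belief` are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical state space: the name the caller wants to dial.
NAMES = ["alice", "bob", "carol", "dave"]

def asr_likelihood(observed, goal, p_correct=0.6):
    """Assumed noisy-recognizer model: P(observed name | true goal)."""
    if observed == goal:
        return p_correct
    return (1.0 - p_correct) / (len(NAMES) - 1)

def particle_filter_update(particles, observation, likelihood, rng):
    """Weight each particle by the observation likelihood, then resample."""
    weights = [likelihood(observation, p) for p in particles]
    if sum(weights) == 0:
        # The observation is impossible under every particle; keep the belief.
        return list(particles)
    return rng.choices(particles, weights=weights, k=len(particles))

def belief(particles):
    """Approximate the belief state by the fraction of particles per state."""
    counts = Counter(particles)
    return {state: count / len(particles) for state, count in counts.items()}

rng = random.Random(0)
# Draw the initial particles from a uniform prior over caller goals.
particles = [rng.choice(NAMES) for _ in range(2000)]
# Made-up sequence of recognition results, one per dialog turn.
for observation in ["bob", "carol", "bob", "bob"]:
    particles = particle_filter_update(particles, observation, asr_likelihood, rng)

current_belief = belief(particles)
print(max(current_belief, key=current_belief.get), current_belief)
```

The cost of each update grows with the number of particles rather than with the number of possible dialog states, which is why sampling of this kind can remain tractable when the state space is very large.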
19th May 2008: Tomoki Toda (Nara Institute of Science and Technology), Vocal Tract Transfer Function Estimation Using Factor Analyzed Trajectory Hidden Markov Model
The estimation of the vocal tract transfer function (VTTF) for a speech signal is an essential problem in speech processing. Because the speech signal results from a convolution of the VTTF and a quasi-periodic excitation signal, there are many missing frequency components between adjacent harmonics of the fundamental frequency, which makes it hard to extract an accurate VTTF. To address this problem, I propose a statistical approach to offline VTTF estimation based on a factor analyzed trajectory hidden Markov model that effectively models the harmonic components observed over an utterance. This model is trained so that its likelihood for the observed harmonic component sequences is maximized while treating the VTTF parameters as hidden variables. The trained model enables maximum a posteriori (MAP) estimation of a time-varying VTTF sequence that considers not only the harmonic components at each analyzed frame but also those at other frames, interpolating the missing frequency components in a probabilistic manner. The effectiveness of the proposed method is demonstrated in a simulation experiment.
27th May 2008: Jim Hieronymus (NASA Ames Research Center), Spoken Dialogue Systems for Space and Lunar Exploration
Building spoken dialogue systems for space applications requires systems which are flexible, portable to new applications, robust to noise and able to discriminate between speech intended for the system and conversations with other astronauts and systems. Our systems are built to be flexible by using general typed unification grammars for the language models, which can be specialized using example data. These are designed so that most sensible ways of expressing a request are correctly recognized semantically. The language models are tuned with extensive user feedback and data if available. The International Space Station and the EVA Suits are noisy (76 and 70 dB SPL). This noise is best minimized by using active noise-canceling microphones, which permit accurate speech recognition. Finally, open-microphone speech recognition is important for hands-free, always-available operation. Out-of-domain utterance rejection in its simplest form depends on careful adjustment of rejection thresholds for both acoustic and natural language scores so that out-of-domain rejection is near 97% and the false rejection rate is around 5%. This means that astronauts can talk to each other and by radio to the ground without the system falsely recognizing a command or query. The effects of statistical and linguistically motivated language modeling techniques will be discussed and shown to give comparable performance. A short clip of the surface suit spoken dialogue system being used in a field test will be shown.
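The threshold-based rejection described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the NASA system: the score scales, the rule that an utterance must clear both thresholds, the threshold values and the sample data are all made up, and the function names `accept` and `rejection_rates` are hypothetical.

```python
def accept(acoustic_score, nl_score, acoustic_threshold, nl_threshold):
    """Assumed rule: accept an utterance only if both scores clear their thresholds."""
    return acoustic_score >= acoustic_threshold and nl_score >= nl_threshold

def rejection_rates(utterances, acoustic_threshold, nl_threshold):
    """Return (out-of-domain rejection rate, false rejection rate).

    utterances is a list of (acoustic_score, nl_score, in_domain) tuples,
    where in_domain is True for speech that was addressed to the system.
    """
    out_total = out_rejected = in_total = in_rejected = 0
    for acoustic_score, nl_score, in_domain in utterances:
        rejected = not accept(acoustic_score, nl_score,
                              acoustic_threshold, nl_threshold)
        if in_domain:
            in_total += 1
            in_rejected += rejected
        else:
            out_total += 1
            out_rejected += rejected
    out_rate = out_rejected / out_total if out_total else 0.0
    false_rate = in_rejected / in_total if in_total else 0.0
    return out_rate, false_rate

# Made-up labeled utterances: (acoustic score, NL score, in-domain?).
sample = [
    (0.9, 0.8, True), (0.7, 0.9, True), (0.4, 0.9, True),
    (0.3, 0.2, False), (0.8, 0.1, False), (0.6, 0.7, False),
]
out_rate, false_rate = rejection_rates(sample, 0.5, 0.5)
print(f"out-of-domain rejection: {out_rate:.2f}, false rejection: {false_rate:.2f}")
```

Raising either threshold rejects more out-of-domain speech but also more genuine commands, so in practice the two thresholds would be swept over held-out data to find an acceptable trade-off between the two rates.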
2nd June 2008: Filip Jurcicek, Extended HVS Parser
In this talk, I will present several extensions to the HVS parser. First, the initialization of its parameters was modified using automatically extracted negative examples. Second, the HVS parser was extended so that it can produce not only right-branching parse trees but also left-branching parse trees. Third, the parser was modified to process not only the words in its input but also additional features. Automatically obtained lemmas and morphological tags were used as features, and they significantly increased performance. Because the original parser and the extended parser were implemented in GMTK (the Graphical Models Toolkit), a brief description of the implementation will be given.
POSTPONED Trung Bui (Twente) TBC TBC