Next: Summary of Variations Up: Decoding Previous: Search Algorithm

Pruning

Two basic pruning criteria are used to reduce the computation required in decoding. Likelihood based pruning is similar to the various types of pruning used in most decoders and is based on the acoustic model likelihoods. Posterior based pruning is specific to systems which employ a local posterior phone probability estimator.

Likelihood based methods are used to compute the envelope and also to set a maximum stack size. These rely on the computation of an estimate of the least upper bound of the log likelihood at time t, lub(t). This is an updated estimate and is equal to the log likelihood of the most probable partial hypothesis at time t. The size of the envelope is set heuristically and is dependent on the accuracy of the estimate of lub(t). The second pruning parameter is used to control the maximum number of hypotheses in a stack. This parameter may be regarded as adaptively tightening the envelope, while ensuring that M hypotheses are still extended at each time (subject to the overall envelope).

A second pruning method has been developed to take advantage of the connectionist probability estimator used in the hybrid system. The phone posteriors may be regarded as a local estimate of the presence of a phone at a particular time frame. If the posterior probability estimate of a phone given a frame of acoustic data is below a threshold, then all words containing that phone at that time frame may be pruned. This may be efficiently achieved using a tree organisation of the pronunciation dictionary. This process is referred to as phone deactivation pruning. The posterior probability threshold used to make the pruning decision may be empirically determined in advance using a development set and is constant for all phones.

This posterior-based approach is similar to the likelihood-based channel-bank approach of Gopalakrishnan et al. [34], which used phone-dependent thresholds. However, that system incurred a 5--10% relative search error to obtain a factor of two speedup on large vocabulary task. This new approach is extremely effective. On a 20K trigram Wall Street Journal task, phone deactivation pruning can result in close to an order of magnitude faster decoding, with less than 2% relative search error (see table 2).

  
Table 2: Decoding performance on the Wall Street Journal task using a 20,000 word vocabulary and a trigram language model. Accuracy and CPU time (in multiples of realtime on an HP735) are given with respect to varying the likelihood envelope and the posterior-based phone deactivation pruning threshold. The maximum stack size was set to be 31.

Next: Summary of Variations Up: Decoding Previous: Search Algorithm


Tony Robinson Sun Jun 4 20:04:56 BST 1995