OPTIMIZATION SCHEMES FOR NEURAL NETWORKS
T.T. Jervis and W.J. Fitzgerald
24th August 1993
Training neural networks need not be a slow, computationally expensive process. That it is often perceived as one may be due to the traditional emphasis on plain gradient descent as the optimization method.
Conjugate gradient descent is an efficient optimization scheme for the weights of neural networks. This work includes an improvement to conjugate gradient descent that avoids line searches along the conjugate search directions. It makes use of a variant of backprop [Rumelhart86], called rbackprop [Pearlmutter93], which calculates the product of the Hessian of the error with respect to the weights and an arbitrary vector. The calculation is exact and computationally cheap.
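To make the Hessian-vector calculation concrete, the following is a minimal sketch (not the authors' rbackprop code) of an exact Hessian-vector product, realised here as forward-over-reverse automatic differentiation in JAX rather than the report's R-operator; the toy loss and test vectors are illustrative assumptions.

    import jax
    import jax.numpy as jnp

    def loss(w):
        # Toy loss standing in for a network's training error.
        return jnp.sum((w ** 2 - 1.0) ** 2)

    def hvp(f, w, v):
        # Exact product of the Hessian of f at w with an arbitrary vector v,
        # obtained as the directional derivative of the gradient, at roughly
        # the cost of two gradient evaluations.
        return jax.jvp(jax.grad(f), (w,), (v,))[1]

    w = jnp.array([0.5, -1.5, 2.0])
    v = jnp.array([1.0, 0.0, 0.0])
    print(hvp(loss, w, v))   # matches jax.hessian(loss)(w) @ v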
The report is in the nature of a tutorial. Gradient descent is reviewed and the back-propagation algorithm, used to find the gradients, is derived. Then a number of alternative optimization strategies are described:
- Conjugate gradient descent
- Scaled conjugate gradient descent
- Delta-bar-delta
- RProp
- Quickprop
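Before turning to the alternatives above, the report reviews plain gradient descent with backprop-derived gradients. A minimal sketch of that baseline, assuming a small tanh network, squared-error loss, and fixed learning rate (all illustrative, not the report's experimental setup):

    import jax
    import jax.numpy as jnp

    def predict(params, x):
        # One hidden layer of tanh units (illustrative architecture).
        h = jnp.tanh(x @ params["W1"] + params["b1"])
        return h @ params["W2"] + params["b2"]

    def error(params, x, y):
        # Sum-of-squares training error.
        return 0.5 * jnp.sum((predict(params, x) - y) ** 2)

    def gradient_descent_step(params, x, y, lr=0.01):
        # Backprop (reverse-mode differentiation) yields dE/dw for every
        # weight; plain gradient descent moves each weight a fixed small
        # step against its gradient.
        grads = jax.grad(error)(params, x, y)
        return jax.tree_util.tree_map(lambda w, g: w - lr * g, params, grads)

    key1, key2 = jax.random.split(jax.random.PRNGKey(0))
    params = {
        "W1": 0.1 * jax.random.normal(key1, (2, 8)), "b1": jnp.zeros(8),
        "W2": 0.1 * jax.random.normal(key2, (8, 1)), "b2": jnp.zeros(1),
    }
    x = jnp.array([[0.0, 1.0], [1.0, 0.0]])
    y = jnp.array([[1.0], [1.0]])
    params = gradient_descent_step(params, x, y)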
All six optimization schemes are tested on various tasks and various types of networks. The results show that scaled conjugate gradient descent and quickprop are expedient optimization schemes for a variety of problems.