Abstract for holden_tr161

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR161 (also submitted as a PhD thesis)

ON THE THEORY OF GENERALIZATION AND SELF-STRUCTURING IN LINEARLY WEIGHTED CONNECTIONIST NETWORKS.

Sean Holden

September 1993

The study of connectionist networks has often been criticized for an overall lack of rigour and for relying on excessively ad hoc techniques. Although connectionist networks have now been studied for several decades, the available research is characterized by a substantial body of experimental results and a large number of different techniques, with relatively little supporting, explanatory theory. This dissertation addresses the theory of generalization performance and architecture selection for a specific class of connectionist networks; a subsidiary aim is to compare these networks with the well-known class of multilayer perceptrons.

After discussing in general terms the motivation for our study, we introduce and review the class of networks of interest, which we call Phi-networks, along with the relevant supervised training algorithms. In particular, we argue that Phi-networks can in general be trained significantly faster than multilayer perceptrons, and we demonstrate that many standard networks are specific examples of Phi-networks.

Chapters 3, 4 and 5 consider generalization performance, presenting an analysis based on tools from computational learning theory. In chapter 3 we introduce and review the theoretical apparatus required, which is drawn from Probably Approximately Correct (PAC) learning theory. In chapter 4 we investigate the growth function and VC dimension for general and specific Phi-networks, obtaining several new results. We also introduce a technique which allows us to use the relevant PAC learning formalism to gain some insight into the effect of training algorithms which adapt architecture as well as weights (we call these self-structuring training algorithms). We then use our results to provide a theoretical explanation for the observation that, in practice, Phi-networks can require a relatively large number of weights when compared with multilayer perceptrons. In chapter 5 we derive new necessary and sufficient conditions on the number of training examples required when training a Phi-network so that a particular generalization performance can be expected. We compare our results with those derived elsewhere for feedforward networks of Linear Threshold Elements, and we extend one of our results to take into account the effect of using a self-structuring training algorithm.

In chapter 6 we consider in detail the problem of designing a good self-structuring training algorithm for Phi-networks. We discuss how best to define an optimum architecture, and we then use various ideas from linear algebra to derive an algorithm, which we test experimentally. Our initial analysis allows us to show that the well-known weight-decay approach to self-structuring is not guaranteed to provide a network whose architecture is close to the optimum. We also extend our theoretical work to provide a basis for the derivation of an improved version of our algorithm.

Finally, chapter 7 provides conclusions and suggestions for future research.


PostScript:
  (ftp:) holden_tr161.ps.Z | (http:) holden_tr161.ps.Z
PDF (automatically generated from the original PostScript document; may be badly aliased on screen):
  (ftp:) holden_tr161.pdf | (http:) holden_tr161.pdf

If you have difficulty viewing files that are in PostScript (ending '.ps' or '.ps.gz'), then you may be able to find tools to view them at the gsview web site.

We have attempted to provide automatically generated PDF copies of documents for which only PostScript versions have previously been available. These are clearly marked in the database; due to the nature of the automatic conversion process, they are likely to be badly aliased when viewed at default resolution on screen by acroread.