An Improved Learning Algorithm based on the Conjugate Gradient Method for Back Propagation Neural Networks

N. M. Nawi; M. R. Ransing; R. S. Ransing

Abstract:

The conjugate gradient optimization algorithm, commonly used for nonlinear least squares problems, is presented and combined with a modified back propagation algorithm, yielding a new fast training algorithm for multilayer perceptrons (MLP), denoted CGFR/AG. The approach consists of three steps: (1) modifying the standard back propagation algorithm by introducing a gain variation term in the activation function; (2) calculating the gradient of the error with respect to the weights and gain values; and (3) determining the new search direction from the gradient information calculated in step (2) together with the previous search direction. The proposed method improves the training efficiency of the back propagation algorithm by adaptively modifying the initial search direction. Its performance is demonstrated by comparison with the conjugate gradient algorithm from the neural network toolbox on the chosen benchmark. The results show that the proposed method converges in fewer than 20% of the iterations required by the standard conjugate gradient and neural network toolbox algorithms.
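The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a logistic activation with gain c, a single neuron with a squared-error loss, and a Fletcher-Reeves coefficient (on the assumption that this is what "FR" in CGFR denotes). The line search along the direction is omitted, and the function names `sigmoid`, `neuron_gradient`, and `fletcher_reeves_direction` are hypothetical.

```python
import math

def sigmoid(net, gain=1.0):
    """Step (1): logistic activation with a gain term c, 1 / (1 + exp(-c * net))."""
    return 1.0 / (1.0 + math.exp(-gain * net))

def neuron_gradient(w, c, x, target):
    """Step (2): gradient of E = 0.5 * (target - o)^2 with respect to (w, c),
    for a single neuron o = sigmoid(w * x, c). Assumed toy model."""
    net = w * x
    o = sigmoid(net, c)
    delta = -(target - o) * o * (1.0 - o)  # dE/d(c * net)
    return [delta * c * x, delta * net]    # [dE/dw, dE/dc]

def fletcher_reeves_direction(grad, prev_grad=None, prev_dir=None):
    """Step (3): new direction d = -g + beta * d_prev,
    with beta = (g . g) / (g_prev . g_prev).
    Without history, fall back to steepest descent."""
    if prev_grad is None or prev_dir is None:
        return [-g for g in grad]
    denom = sum(g * g for g in prev_grad)
    beta = sum(g * g for g in grad) / denom if denom > 0.0 else 0.0
    return [-g + beta * d for g, d in zip(grad, prev_dir)]

# Usage: the first direction is steepest descent over both weight and gain.
g0 = neuron_gradient(w=0.5, c=1.0, x=1.0, target=1.0)
d0 = fletcher_reeves_direction(g0)
```

Treating the gain as a second trainable parameter means the search direction has one component per weight and one per gain, so both are updated along the same conjugate direction.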

Keywords: Back-propagation, activation function, conjugate gradient, search direction, gain variation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328444
