A Comparison of Selected Optimization Methods for Neural Networks

University essay from KTH/Skolan för teknikvetenskap (SCI)

Author: Ludvig Karlsson; Oskar Bonde; [2020]


Abstract: Which numerical methods are best suited for training a neural network? In this report four different optimization methods are analysed and compared with each other. First, we consider the most basic method, Stochastic Gradient Descent, which steps in the direction of the negative gradient. We continue with a slightly more advanced algorithm called Adam, which is often used in practice to train neural networks. Finally, we study two second-order methods: the Conjugate Gradient method, which follows conjugate directions, and L-BFGS, a quasi-Newton method that approximates the inverse of the Hessian matrix. The methods are tasked with solving a classification problem in which hyperspheres act as decision boundaries, and several different network configurations are used. Our results indicate why first-order methods are so commonly used today and show that second-order methods can be difficult to use effectively when the number of parameters is large.
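To make the distinction between the two first-order update rules mentioned above concrete, here is a minimal sketch of a single parameter update for Stochastic Gradient Descent and for Adam. This is not taken from the essay itself; the variable names, default hyperparameters, and NumPy implementation are illustrative assumptions only.

    import numpy as np

    def sgd_step(theta, grad, lr=0.01):
        # Plain SGD: step in the direction of the negative gradient.
        return theta - lr * grad

    def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam: exponential moving averages of the gradient (m) and of its
        # elementwise square (v), with bias correction (Kingma & Ba, 2015).
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment (t = step count, starting at 1)
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

In this sketch theta and grad are NumPy arrays of the network parameters and the (mini-batch) gradient; m and v start as zero arrays of the same shape.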
