Comparison of Second Order Optimization Algorithms in Neural Networks Applied on Large-Scale Problems

University essay from KTH/Skolan för teknikvetenskap (SCI)

Author: Johanna Frost; Rafael Lavatt; [2020]


Abstract: This bachelor thesis compares the second order optimization algorithms K-FAC and L-BFGS with common first order algorithms, Gradient Descent, Stochastic Gradient Descent, and Adam, applied to neural networks for image classification. Networks with different architectures and numbers of parameters were implemented and tested on three different image data sets. L-BFGS performed well compared to the simpler first order algorithm Gradient Descent on the simplest data set, although with a much higher standard deviation in the results. Because of the way L-BFGS was implemented, it could not be run on larger networks and data sets, and it was not considered after the first comparisons. K-FAC performed better than L-BFGS and was comparable to Adam, a more sophisticated first order algorithm. However, as the networks and data sets grew more complex, K-FAC tended to overfit to a greater extent than Adam and Stochastic Gradient Descent, meaning that the network seemed to memorize the images rather than learning how to classify them. In addition, the computational time increased more for K-FAC than for the first order algorithms when the problem was scaled up.
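The experimental details are in the thesis itself; the following is only a minimal sketch of the kind of optimizer comparison described above, assuming PyTorch and a toy image classifier (neither taken from the thesis). K-FAC is not part of torch.optim and is therefore omitted; SGD, Adam, and L-BFGS are shown taking one training step each on the same dummy batch.

    # A minimal sketch, not the thesis code: one training step with each optimizer
    # on a toy image-classification network in PyTorch.
    import torch
    import torch.nn as nn

    def make_model():
        # Small fully connected classifier for 28x28 grayscale images, 10 classes.
        return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))

    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(32, 1, 28, 28)          # dummy batch standing in for a real data loader
    y = torch.randint(0, 10, (32,))

    optimizers = {
        "SGD":    lambda params: torch.optim.SGD(params, lr=0.01),
        "Adam":   lambda params: torch.optim.Adam(params, lr=0.001),
        "L-BFGS": lambda params: torch.optim.LBFGS(params, lr=0.1, max_iter=20),
    }

    for name, make_opt in optimizers.items():
        model = make_model()
        opt = make_opt(model.parameters())

        # L-BFGS re-evaluates the objective several times per step, so PyTorch
        # requires a closure; the same form also works for the first order methods.
        def closure():
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            return loss

        loss = opt.step(closure)
        print(f"{name}: loss after one step = {float(loss):.4f}")

In a real comparison each optimizer would be trained for many epochs on the actual data sets, with separate training and test accuracy tracked to detect the kind of overfitting reported for K-FAC.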
