Evaluating the Practicality of Using a Kronecker-Factored Approximate Curvature Matrix in Newton's Method for Optimization in Neural Networks

University essay from KTH/School of Engineering Sciences (SCI)

Author: Magnus Tornstad (2020)


Abstract: Second-order optimization methods have long been regarded as computationally inefficient and intractable for the optimization problems associated with deep learning. However, recent research has proposed an adaptation of Newton's method in which the Hessian is approximated by a Kronecker-factored approximate curvature (KFAC) matrix. This work assesses its practicality for use in deep learning. Benchmarks were performed on abstract binary classification problems as well as the real-world Boston Housing regression problem, using both deep and shallow network architectures. KFAC was found to offer large savings in computational complexity compared to a naive approximate second-order implementation using the Gauss-Newton matrix. Comparing performance in deep and shallow networks, the loss convergence of both stochastic gradient descent (SGD) and KFAC depended on network architecture: KFAC tended to converge faster in deep networks, and SGD tended to converge faster in shallow networks. The study concludes that KFAC can perform well in deep learning, showing competitive loss minimization versus basic SGD, but that it can be sensitive to the initial weights. This sensitivity could be remedied by letting SGD take the first steps, setting KFAC on a favorable trajectory.
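To illustrate where the computational savings come from, the following is a minimal sketch of a KFAC-style preconditioned step for a single dense layer. It is not the thesis's implementation: the function name, parameters, and damping value are illustrative assumptions, based only on the standard Kronecker-factored curvature approximation the abstract refers to.

```python
import numpy as np

def kfac_layer_update(a, g, grad_W, damping=1e-2):
    """One KFAC-style preconditioned step for a single dense layer.

    For a weight matrix W with input activations `a` (batch x in_dim)
    and backpropagated output gradients `g` (batch x out_dim), KFAC
    approximates the layer's curvature block as the Kronecker product
    A (x) G, with A = E[a a^T] and G = E[g g^T]. The identity
        (A (x) G)^-1 vec(grad_W) = vec(G^-1 grad_W A^-1)
    means only the two small factors are ever inverted, never the
    full (in_dim * out_dim)-sized curvature matrix.
    """
    batch = a.shape[0]
    A = a.T @ a / batch                  # in_dim x in_dim factor
    G = g.T @ g / batch                  # out_dim x out_dim factor
    # Tikhonov damping keeps both factors well-conditioned.
    A += damping * np.eye(A.shape[0])
    G += damping * np.eye(G.shape[0])
    # Preconditioned gradient: G^-1 grad_W A^-1 (grad_W is out_dim x in_dim).
    return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

# Hypothetical usage with random data:
rng = np.random.default_rng(0)
a = rng.normal(size=(64, 10))            # batch of 64, 10 inputs
g = rng.normal(size=(64, 5))             # 5 outputs
grad_W = g.T @ a / 64                    # layer gradient, 5 x 10
step = kfac_layer_update(a, g, grad_W)   # preconditioned step, 5 x 10
```

Inverting the two factors costs on the order of in_dim^3 + out_dim^3 operations, versus (in_dim * out_dim)^3 for the full curvature block, which is the source of the savings over the naive Gauss-Newton implementation reported in the abstract.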
