The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks

University essay from Lunds universitet/Matematik LTH; Lunds universitet/Matematikcentrum

Author: Simon Sjögren; [2021]

Keywords: Mathematics and Statistics;

Abstract: We introduce rigorous theory for deriving first- and second-order backpropagation methods for deep neural networks (DNNs) that is consistent with existing theory in DNN optimization. We begin by formally defining a neural network with its respective components and stating the first- and second-order chain rules for its partial derivatives. The partial derivatives in the chain rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, we observe that the chain rule behaves independently of the particular neural network architecture: its operations depend solely on the feedforward structure of the neural network (NN). When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, via the chain rule, reuses the same structure as the corresponding feedforward method and therefore applies to both CNN and FC networks alike. We compare first- and second-order optimization methods on the CIFAR-10 dataset, pitting PyTorch's built-in backpropagation, which relies on Autograd \cite{autograd} to compute gradients, against our own implementation and derivation of the second-order method AdaHessian \cite{adah}, which shows equal if not slightly improved results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.
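As an illustration of the layer-wise structure the abstract refers to, the following is a minimal sketch of the standard first- and second-order backpropagation recursions for a fully-connected network with pre-activations $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ and elementwise activation $a^{(l)} = f(z^{(l)})$ (the notation is assumed here for illustration, not taken from the thesis itself):

\[
g^{(l)} := \frac{\partial L}{\partial z^{(l)}} = f'(z^{(l)}) \odot \left( (W^{(l+1)})^{\top} g^{(l+1)} \right),
\]
\[
H^{(l)} := \frac{\partial^2 L}{\partial z^{(l)} \, \partial z^{(l)\top}}
= D^{(l)} (W^{(l+1)})^{\top} H^{(l+1)} W^{(l+1)} D^{(l)}
+ \operatorname{diag}\!\left( f''(z^{(l)}) \odot (W^{(l+1)})^{\top} g^{(l+1)} \right),
\]

where $D^{(l)} = \operatorname{diag}(f'(z^{(l)}))$. Both recursions reuse the same feedforward operator $W^{(l+1)}$, which is the architecture-independence property described above; for a CNN the matrix product is simply replaced by the corresponding convolution operator.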
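For the second-order comparison, AdaHessian approximates the diagonal of the Hessian with Hutchinson's estimator, $\operatorname{diag}(H) \approx \mathbb{E}[z \odot (Hz)]$ for Rademacher vectors $z$, which can be computed with two passes through PyTorch's Autograd. The sketch below is only illustrative and is not the thesis implementation: the helper name hutchinson_hessian_diag, the sampling count, and the example model are our own choices, and the full AdaHessian optimizer additionally applies spatial averaging and momentum on top of this estimate.

import torch

def hutchinson_hessian_diag(loss, params, n_samples=1):
    # Illustrative helper (hypothetical name): estimate diag(H) of `loss`
    # w.r.t. `params` via Hutchinson's method, diag(H) ~ E[z * (H z)].
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag_est = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        # Rademacher probe vectors with entries +1 / -1
        zs = [2.0 * torch.randint_like(p, 2) - 1.0 for p in params]
        # Hessian-vector products: differentiate sum_i <grads_i, z_i> once more
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hvp in zip(diag_est, zs, hvps):
            d.add_(z * hvp / n_samples)
    return diag_est

# Example use on a small model (layer sizes and batch are arbitrary):
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
hess_diag = hutchinson_hessian_diag(loss, list(model.parameters()), n_samples=4)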
