A Study of Time-Stepping Methods for Optimization in Supervised Learning

University essay from Lunds universitet/Matematik LTH; Lunds universitet/Matematikcentrum

Abstract: In supervised machine learning, training a neural network requires solving an optimization problem. Such problems can be restated as gradient flow problems and solved numerically using time-stepping methods. To study the behavior of these time-stepping methods in the context of the aforementioned optimization problems, a framework was developed in Python. The framework builds on the machine learning library scikit-learn and implements stochastic variants of three time-stepping methods: the explicit Euler method, the implicit Euler method, and the classical fourth-order Runge-Kutta method. It was used to run numerical experiments studying the behavior of these methods when training neural networks on a small-scale data set as well as on the MNIST data set. The experiments revealed that the explicit Euler method and the Runge-Kutta method perform comparably, with the explicit Euler method minimizing the loss function slightly faster in terms of computation time. The implicit Euler method, on the other hand, proved impractical, especially when the training data are high dimensional, because it requires solving a computationally expensive system of nonlinear equations at every iteration. Finally, a convergence analysis was carried out for the explicit Euler method and the classical fourth-order Runge-Kutta method, showing that both converge (in expectation) to a neighborhood of the optimal solution.
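
To make the gradient-flow idea concrete: for a loss L(theta), the flow theta'(t) = -grad L(theta(t)) can be discretized with step size h, and one explicit Euler step then coincides with a gradient-descent update. The sketch below is not the thesis's scikit-learn-based framework; it is a minimal illustration on an assumed toy quadratic loss, with function names chosen here for exposition. Note also why the implicit Euler method is costly: its update theta_next = theta - h * grad L(theta_next) defines theta_next only implicitly, so a nonlinear equation must be solved at every iteration.

    import numpy as np

    def explicit_euler_step(theta, grad, h):
        # One explicit Euler step on the gradient flow theta' = -grad L(theta);
        # this is exactly a gradient-descent update with learning rate h.
        return theta - h * grad(theta)

    def rk4_step(theta, grad, h):
        # One classical fourth-order Runge-Kutta step on the same gradient flow.
        k1 = -grad(theta)
        k2 = -grad(theta + 0.5 * h * k1)
        k3 = -grad(theta + 0.5 * h * k2)
        k4 = -grad(theta + h * k3)
        return theta + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    # Illustrative toy loss L(theta) = 0.5 * ||theta||^2, so grad L(theta) = theta.
    # (The implicit Euler step theta_next = theta - h * grad(theta_next) would
    # instead require a nonlinear solve here, hence its higher per-step cost.)
    grad = lambda theta: theta
    theta = np.array([1.0, -2.0])
    for _ in range(50):
        theta = rk4_step(theta, grad, h=0.1)
    print(theta)  # approaches the minimizer at the origin

In a stochastic variant such as the one the thesis describes, grad would be evaluated on a random mini-batch of training data at each step rather than on the full data set.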
