A Study of Gradient-Based Algorithms

University essay from Lunds universitet/Matematisk statistik

Author: Rasmus Hallén; [2017]

Keywords: Mathematics and Statistics;

Abstract: Gradient-based algorithms are popular for solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function, each iteration of a gradient-based algorithm aims to approach the minimizer of that function. In the age of web-scale prediction problems, many venerable algorithms run into difficulties. However, as data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield of Machine Learning, it is possible to produce self-adapting optimizers that can deal with a range of challenging scenarios without requiring the user to rewrite algorithms specialized for the experiment at hand. This thesis compares the performance of two gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large random subsets of the data; such a subset is known as a mini-batch. Moving to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database, where each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well they estimate a model that predicts which vector belongs to which digit. In the case of linear regression, SGD is found to outperform GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when the mini-batch size is increased. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training that model is time consuming, and by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable time frame.
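
To make the comparison concrete, the following is a minimal sketch of the two update rules in the linear regression case, assuming a mean-squared-error objective; the function names, learning rate, iteration count, and mini-batch size are illustrative choices and are not taken from the thesis.

    import numpy as np

    def gradient_descent(X, y, lr=0.01, n_iter=1000):
        # Full-batch GD: every step uses the gradient over all n observations.
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iter):
            grad = X.T @ (X @ w - y) / n      # gradient of the mean squared error
            w -= lr * grad
        return w

    def sgd(X, y, lr=0.01, n_iter=1000, batch_size=32, rng=None):
        # Mini-batch SGD: each step uses a random subset of the data,
        # trading gradient accuracy for much cheaper iterations.
        rng = np.random.default_rng() if rng is None else rng
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iter):
            idx = rng.choice(n, size=batch_size, replace=False)   # draw a mini-batch
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / batch_size
            w -= lr * grad
        return w

The sketch illustrates the trade-off studied in the thesis: GD computes an exact gradient at a per-iteration cost that grows with the data set, while mini-batch SGD keeps the per-iteration cost fixed at the batch size.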
