Learning Sampling Strategies for Stochastic Gradient Descent using Deep Reinforcement Learning techniques

University essay from Lunds universitet/Institutionen för reglerteknik

Author: Hampus Rosvall; [2020]

Keywords: Technology and Engineering;

Abstract: Finite-sum minimization problems can be solved with a gradient descent algorithm. The algorithm evaluates the gradient at the current parameters and updates them in the direction of steepest descent. For certain problems, evaluating the full gradient at every iteration is computationally expensive, so the gradient of the objective function is instead approximated by the gradient of one of the summands in the finite sum. This method is referred to as stochastic gradient descent, where the summand used to approximate the gradient is chosen according to a sampling strategy. The purpose of this thesis is to learn such a sampling strategy using deep reinforcement learning, namely deep Q-learning. Reinforcement learning is a field within machine learning in which a piece of software, an agent, interacts with an environment in order to learn how to act optimally over time. The agent performs actions that affect the state of the environment and observes real-valued feedback, a reward, for each action. To avoid explicitly storing the expected outcome of each (state, action) pair in memory, deep learning is used to approximate the expected outcome. The finite-sum objective investigated was least squares, where the Lipschitz constants of the summands differed by orders of magnitude. The results showed that the sampling strategy generated by the deep Q-network performed similarly to stochastic gradient descent using uniform and Lipschitz sampling. One experiment showed that the deep Q-learning agent could converge faster towards the analytical solution of the least-squares objective when trained for fewer iterations per episode. The work could be extended further by constructing objective functions where uniform and Lipschitz sampling perform differently.
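
The sketch below is not taken from the thesis; it only illustrates the baseline setting the abstract describes: stochastic gradient descent on a least-squares finite sum, with the summand drawn either uniformly or in proportion to its Lipschitz constant. The problem size, the row scaling that spreads the Lipschitz constants over several orders of magnitude, and the step size are illustrative assumptions.

    # Illustrative sketch (not the thesis implementation): SGD on
    # f(x) = (1/n) * sum_i (a_i^T x - b_i)^2 with two sampling strategies.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 10

    # Rows with very different norms give summands whose gradient Lipschitz
    # constants span several orders of magnitude (assumed scaling).
    A = rng.standard_normal((n, d)) * np.logspace(0, 2, n)[:, None]
    x_true = rng.standard_normal(d)
    b = A @ x_true

    # Lipschitz constant of the gradient of summand i: L_i = 2 * ||a_i||^2.
    L = 2.0 * np.sum(A**2, axis=1)

    def sgd(probs, steps=5000, step_size=1e-6):
        """Run SGD, drawing summand i with probability probs[i].

        The sampled gradient is reweighted by 1 / (n * probs[i]) so that the
        update remains an unbiased estimate of the full gradient.
        """
        x = np.zeros(d)
        for _ in range(steps):
            i = rng.choice(n, p=probs)
            grad_i = 2.0 * (A[i] @ x - b[i]) * A[i]
            x -= step_size * grad_i / (n * probs[i])
        return x

    uniform = np.full(n, 1.0 / n)
    lipschitz = L / L.sum()

    for name, probs in [("uniform", uniform), ("Lipschitz", lipschitz)]:
        x_hat = sgd(probs)
        print(f"{name:10s} sampling: ||x - x*|| = "
              f"{np.linalg.norm(x_hat - x_true):.3e}")

A learned strategy, such as the deep Q-network studied in the thesis, would replace the fixed probability vector with probabilities produced by the agent from the current state of the optimization.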
