Causal Reinforcement Learning for Bandits with Unobserved Confounders

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Mingwei Deng (2023)


Abstract: Reinforcement Learning (RL) has been recognized as a valuable tool in various fields. However, its application is limited by its reliance on extensive data gathered through trial and error and by the difficulty of generalizing learned policies. Transfer learning has been proposed as a way to improve RL by reusing data collected in one domain in a new domain. Nevertheless, when knowledge is transferred inappropriately, negative transfer can occur, leading to poor performance in the target domain. Causal inference makes it possible to infer the effect of actions from observational data and the underlying causal structure. This thesis explores the use of causal inference to guide knowledge transfer in RL, aiming to enable transfer learning without negative transfer. The study focuses on the multi-armed bandit (MAB) problem in the presence of confounders, i.e., variables that affect both the action and the outcome. The limitations of traditional RL methods become evident especially when the confounders are not directly observed. To address this, we introduce two algorithms grounded in “transportability”, a notion from causal inference. The first algorithm, designed for a two-armed Bernoulli bandit with an unobserved binary confounder, computes the causal effect of actions analytically. The second algorithm is formulated for bandits with low-dimensional unobserved confounders and high-dimensional observed proxy variables, and uses a β-variational autoencoder (β-VAE) to estimate the causal effects of actions. Simulation results confirm that both algorithms effectively avoid negative transfer and outperform standard RL methods. Notably, the first algorithm is robust across different settings, in contrast to traditional RL methods, which often suffer from negative transfer.
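
As a minimal illustration of why unobserved confounding matters in this setting, the Python sketch below (a hypothetical example constructed here, not the thesis's algorithm) simulates a two-armed Bernoulli bandit with an unobserved binary confounder U. The reward table `p`, the confounded behavior policy `natural_choice`, and all parameter values are assumed for illustration; the sketch contrasts the observational reward rate E[R | A = a] seen by a naive agent with the interventional quantity E[R | do(A = a)] that a causal, transportability-based method targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed reward table: p[u, a] = P(reward = 1 | U = u, action = a).
p = np.array([[0.2, 0.5],   # U = 0
              [0.8, 0.4]])  # U = 1
p_u = 0.5                   # P(U = 1)

def natural_choice(u):
    """Confounded behavior policy: U biases which arm gets pulled."""
    return u  # in this toy example, the agent tends to pull arm u

# Observational estimate E[R | A = a]: conditioning on the chosen arm
# implicitly conditions on the hidden U, so the estimate is biased.
n, obs_sum, obs_cnt = 100_000, np.zeros(2), np.zeros(2)
for _ in range(n):
    u = int(rng.random() < p_u)
    a = natural_choice(u)
    obs_sum[a] += rng.random() < p[u, a]
    obs_cnt[a] += 1
print("observational  E[R | A=a]    :", obs_sum / obs_cnt)

# Interventional effect E[R | do(A = a)] = sum_u P(U = u) * p[u, a],
# i.e. the causal effect of forcing each arm regardless of U.
print("interventional E[R | do(A=a)]:", (1 - p_u) * p[0] + p_u * p[1])
```

With these assumed parameters the two estimates even disagree on which arm is better: observationally arm 1 looks superior (about 0.4 vs. 0.2), while the interventional effect favors arm 0 (0.5 vs. 0.45). This reversal is the kind of failure the thesis's transportability-based algorithms are designed to avoid.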
