Cluster KL-UCB: Optimism for the Best, Pessimism for the Rest

University essay from Göteborgs universitet/Institutionen för matematiska vetenskaper

Author: Emelie Lööf; [2022-06-28]

Keywords: ;

Abstract: The project presents an allocation strategy for the stochastic multi armed bandit when considering instances with a clustered structure. Using the architecture of the KL-UCB policy as a source of inspiration, an algorithm which exploits and takes advantage from a clustered structure is derived. Firstly, encouraged by previous work related to the subject, a multi-level structure approach will constitute as an initial examination. Secondly, the Cluster KL-UCB policy will be derived and evaluated considering three di erent approaches. It will be shown, both theoretically and empirically, that adapting to a clustered environment improves the performance compared to its non cluster-adapting ancestor. Both upper and lower bounds on the regret will be provided in order to theoretically ensure the performance of the algorithm. Lastly, a number of empirical experiments will be performed in order to further ensure the performance and validate the theoretical results.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)