Bandit Algorithms for Adaptive Modulation and Coding in Wireless Networks

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Romain Deffayet; [2020]

Keywords: ;

Abstract: The demand for quality cellular network coverage has been increasing significantly in the recent years and will continue its progression throughout the near future. This results from an increase of transmitted data, because of new use cases (HD videos, live streaming, online games, ...), but also from a diversification of the traffic, notably because of shorter and more frequent transmissions which can be due to IOT devices or other telemetry applications. The cellular networks are becoming increasingly complex, and the need for better management of the network’s properties is higher than ever. The combined effect of these two paradigms creates a trade-off : whereas one would like to design algorithms that achieve high performance decision-making, one would also like those to be able to do so in any settings that can be encountered in this complex network. Instead, this thesis proposes to restrict the scope of the decision-making algorithms through on-line learning. The thesis focuses on the context of initial MCS selection in Adaptive Modulation and Coding, in which one must choose an initial transmission rate guaranteeing fast communications and low error rate. We formulate the problem as a Reinforcement Learning problem, and propose relevant restrictions to simpler frameworks like Multi-Armed Bandits and Contextual Bandits. Eight bandit algorithms are tested and reviewed with emphasis on practical applications. The thesis shows that a Reinforcement Learning agent can improve the utilization of the link capacity between the transmitter and the receiver. First, we present a cell-wide Multi-Armed Bandit agent, which learns the optimal initial offset in a given cell, and then a contextual augmentation of this agent taking user-specific features as input. The proposed method achieves with burst traffic an 8% increase of the median throughput and 65% reduction of the median regret in the first 0:5s of transmission, when compared to a fixed baseline.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)