Tacit collusion with deep multi-agent reinforcement learning

University essay from Handelshögskolan i Stockholm / Department of Economics

Abstract: Automated pricing has attracted the attention of competition authorities following recent developments in machine learning. In particular, previous research shows that the Q-learning algorithm can reach collusive outcomes with only minimal human intervention. This thesis extends the existing body of knowledge by adding neural networks to the Q-learning algorithm, which enables learning in more complex and realistic environments. A simulation is conducted in which two deep Q-learning agents play against each other in a sequentially repeated price game with payoffs configured to resemble a social dilemma. The agents start without any prior knowledge and are deployed with the objective of maximising a discounted profit function, which they learn through ongoing exploration. After 3.5 million repeated interactions, the agents are evaluated in a test mode. Four specifications are tested: a basic scenario, an increased action space, random demand, and vertical differentiation. In the basic scenario, I find that both agents learn to associate high prices with large profits, with profits reaching 95% of a hypothetical monopolist's profit. However, the agents do not learn reciprocity, meaning that they remain exploitable. For the other specifications, the increased complexity means the agents do not learn stable behaviour within the training period under consideration. The findings add to previous research by showing that the addition of neural networks is feasible, which opens the door to more realistic future applications.
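For readers unfamiliar with the setup the abstract describes, the following is a minimal sketch of two independent deep Q-learning agents repeatedly setting prices. It is not the thesis's code: the linear-demand duopoly, the price grid, the choice of last period's price pair as the state, and every hyperparameter are illustrative assumptions, since the abstract does not specify the demand model or training details.

```python
# Minimal sketch: two independent deep Q-learning agents in a repeated price game.
# All modelling choices below (linear demand, the price grid, the state being last
# period's price pair, every hyperparameter) are illustrative assumptions, NOT
# values from the thesis. No experience replay or target network is used, unlike
# a full DQN, and training is far shorter than the thesis's 3.5 million steps.
import numpy as np
import torch
import torch.nn as nn

PRICES = np.linspace(1.5, 2.5, 5)   # discrete price grid (assumed)
A, B, D, COST = 4.0, 2.0, 1.0, 1.0  # demand q_i = A - B*p_i + D*p_j (assumed)
GAMMA, LR, STEPS = 0.95, 1e-3, 50_000
# Under these parameters the one-shot Nash price is 2.0 and the joint-profit
# maximising price is 2.5, giving the payoffs a social-dilemma structure.

def profit(p_i, p_j):
    """One-period profit under the assumed linear demand."""
    return (p_i - COST) * max(A - B * p_i + D * p_j, 0.0)

class QNet(nn.Module):
    """Maps last period's observed price pair to one Q-value per candidate price."""
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                                 nn.Linear(32, n_actions))

    def forward(self, x):
        return self.net(x)

agents = [QNet(len(PRICES)) for _ in range(2)]
opts = [torch.optim.Adam(a.parameters(), lr=LR) for a in agents]
state = torch.tensor([PRICES[0], PRICES[0]], dtype=torch.float32)

for t in range(STEPS):
    eps = max(0.05, 1.0 - t / (0.8 * STEPS))           # decaying exploration rate
    acts = [np.random.randint(len(PRICES)) if np.random.rand() < eps
            else int(net(state).argmax()) for net in agents]
    p = [float(PRICES[a]) for a in acts]
    next_state = torch.tensor(p, dtype=torch.float32)  # both agents observe the price pair
    for i, net in enumerate(agents):
        r = profit(p[i], p[1 - i])
        with torch.no_grad():                          # bootstrap target r + gamma * max Q(s')
            target = r + GAMMA * net(next_state).max()
        loss = (net(state)[acts[i]] - target) ** 2     # one-step Q-learning loss
        opts[i].zero_grad()
        loss.backward()
        opts[i].step()
    state = next_state
```

If the dynamics resemble those the abstract reports, the agents' chosen prices should drift toward the upper end of the grid as high prices become associated with large profits; the thesis's actual architecture and training procedure may differ substantially from this sketch.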
