Dopamine Waves Lead to a Swift and Adaptive Reinforcement Learning Algorithm

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Accumulating evidence suggests that dopaminergic neurons show significant task-related diversity. Curiously, dopamine concentration and dopamine axon activity show spatio-temporal wave patterns in the dorsal striatum. What could be the function of these wave-like dynamics of dopamine in the striatum, particularly in Reinforcement Learning? This work introduces a novel Reinforcement Learning algorithm that exploits the wave-like dynamics of dopamine to increase speed, reliability and flexibility in decision-making. By exploring the environment, an agent can form a cognitive map that stores the expected time spent in each future state given a starting state (i.e. the Successor Representation). This map captures the temporal connections between the visited states and outlines several possible state transition trajectories leading to the reward. Using the cognitive map, the reward prediction errors can be computed for every state after a single reward delivery. In the cognitive map, states leading directly to the reward carry a high positive error, while temporally distant states carry smaller errors. Thus, the dynamics of the errors exhibit a wave front travelling through the cognitive map. Under the assumption that neurons representing adjacent states in the cognitive map are also spatial neighbors, it follows that the signal carrying the reward prediction error will also show wave-like dynamics in space. By exploiting the dopamine waves, the proposed Reinforcement Learning approach outperforms three classical Reinforcement Learning algorithms: basic SARSA, the Successor Representation and SARSA with eligibility traces. Consequently, the algorithm suggests conditions under which wave-like dynamics of dopamine release in the striatum can have direct functional implications for learning.
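To make the idea concrete, the following is a minimal sketch of how a Successor Representation can yield a prediction error for every state after a single reward, with the error shrinking as temporal distance from the reward grows. It is not the algorithm proposed in the essay. The deterministic chain environment, the discount factor `gamma`, the zero initial value estimates, and the function names `successor_representation` and `prediction_errors` are all assumptions chosen for illustration.

```python
def successor_representation(n_states, gamma, n_sweeps=200):
    """Successor Representation for an assumed deterministic chain s -> s+1.

    Repeatedly applies M = I + gamma * P @ M, where the last state is
    terminal (it has no successor).
    """
    next_state = [s + 1 if s + 1 < n_states else None for s in range(n_states)]
    M = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(n_sweeps):
        new_M = []
        for s in range(n_states):
            # Each state counts its own occupancy once (identity term).
            row = [1.0 if j == s else 0.0 for j in range(n_states)]
            nxt = next_state[s]
            if nxt is not None:
                # Add the discounted future occupancies of the successor.
                for j in range(n_states):
                    row[j] += gamma * M[nxt][j]
            new_M.append(row)
        M = new_M
    return M


def prediction_errors(M, reward, values):
    """Error per state: SR-based value (M @ reward) minus the current estimate."""
    return [
        sum(m * r for m, r in zip(row, reward)) - v
        for row, v in zip(M, values)
    ]


# Hypothetical setup: 6 states in a line, reward only in the final state.
n_states, gamma = 6, 0.8
M = successor_representation(n_states, gamma)
reward = [0.0] * (n_states - 1) + [1.0]
values = [0.0] * n_states  # assumed: nothing has been learned yet
errors = prediction_errors(M, reward, values)

for state, err in enumerate(errors):
    print(f"state {state}: error {err:.3f}")
```

In this toy setting the error is largest at the rewarded state and decays geometrically with each step away from it, which is the graded profile the abstract describes as a wave front in the cognitive map.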

