Reinforcement learning for train dispatching: A study on the possibility to use reinforcement learning to optimize train ordering and minimize train delays in disrupted situations, inside the rail simulator OSRD

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Teodora Popescu; [2022]


Abstract: Train dispatching is a complex process, especially when train traffic is disrupted, as the decisions taken by dispatchers can have substantial consequences for train delays. The most frequent dispatching decision consists in changing the order of trains at convergence points, where two tracks merge into a single track. Choosing the right train order is crucial, as trains cannot overtake each other while they are on the single track after the convergence point. The OSRD team of SNCF Réseau has designed the rail simulator OSRD (Open Source Railway Designer), which can simulate any traffic situation. The goal of this degree project was to study whether reinforcement learning could be implemented in that simulator to find optimal ordering policies under traffic disruptions. A thorough literature review was carried out to identify which reinforcement learning models have already been used to handle similar problems. None of the models found in the literature could be adapted directly to the OSRD simulator, but key features necessary for building an efficient reinforcement learning model in OSRD were identified. Based on those features and on the specificities of OSRD, a custom reinforcement learning model (states, actions, rewards) was created. This model was then implemented in a Python reinforcement learning environment, after designing an interactive simulation module which enabled communication between the Python environment and OSRD. After verifying that the model could interact with an OSRD simulation, retrieving information from it and taking decisions which modified the train order, the study focused on which reinforcement learning algorithms could learn from the implemented model. Another in-depth literature review of existing reinforcement learning algorithms was performed, and it was concluded that the most suitable algorithms for the project were a policy gradient algorithm such as REINFORCE and an evolutionary algorithm such as the cross-entropy method. Both algorithms were implemented, but only the cross-entropy method achieved results. It was found that the cross-entropy method converges very quickly to the FIFO (First In, First Out) policy, which always lets the first train to arrive pass the convergence point first. The FIFO policy was then compared with the actual best policies for the 50 disrupted simulations used as the training set, using two scoring methods. The conclusion was that the FIFO policy was to some extent similar to the best policies, but it was the optimal policy for only half of the simulations, even though the relative difference between its scores and those of the best policies was acceptable. The differences between the best policies and the FIFO policy were analyzed in detail to find where they lay and to understand the rules applied by the best policies. Finally, even though the result achieved with the cross-entropy method corresponded to neither an optimal nor a complex policy, it was concluded that reinforcement learning may still be relevant with a more complex simulation setup. However, the method used in this degree project still needs to be improved in order to reach solutions closer to the optimal ones.
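
To make the evolutionary approach concrete, the sketch below shows a minimal cross-entropy method applied to a binary train-ordering decision at a convergence point. It is an illustration under assumed names only: the feature set, the linear policy, and the evaluate() placeholder (which stands in for scoring a disrupted simulation run) are hypothetical and do not reproduce the thesis implementation or the OSRD API.

    import numpy as np

    rng = np.random.default_rng(0)

    N_FEATURES = 4     # hypothetical features, e.g. per-train delay and distance to the convergence point
    POPULATION = 100
    ELITE_FRAC = 0.2
    ITERATIONS = 50

    def act(theta, features):
        # Linear policy: swap the train order at the convergence point when the
        # weighted feature sum is positive, otherwise keep the FIFO order.
        return 1 if float(theta @ features) > 0.0 else 0

    def evaluate(theta):
        # Placeholder reward. In the thesis, the score would come from running a
        # disrupted OSRD simulation with the ordering decisions produced by act()
        # (for example, the negative sum of train delays). A synthetic target rule
        # is used here purely so the sketch executes end to end.
        features = rng.normal(size=(200, N_FEATURES))
        target = (features[:, 0] > features[:, 1]).astype(int)   # hypothetical rule
        decisions = np.array([act(theta, f) for f in features])
        return -np.abs(decisions - target).sum()

    def cross_entropy_method():
        mean, std = np.zeros(N_FEATURES), np.ones(N_FEATURES)
        n_elite = int(POPULATION * ELITE_FRAC)
        for _ in range(ITERATIONS):
            # Sample a population of candidate policies around the current mean.
            thetas = rng.normal(size=(POPULATION, N_FEATURES)) * std + mean
            scores = np.array([evaluate(t) for t in thetas])
            # Keep the elite candidates and refit the sampling distribution to them.
            elite = thetas[np.argsort(scores)[-n_elite:]]
            mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        return mean

    if __name__ == "__main__":
        print("learned policy weights:", cross_entropy_method())

The defining design choice of the cross-entropy method is that each iteration refits the sampling distribution to the elite candidates only, which is consistent with the fast convergence observed in the project.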
