An efficient deep reinforcement learning approach to the energy management for a parallel hybrid electric vehicle

University essay from KTH/Skolan för industriell teknik och management (ITM)

Abstract: In the contemporary world, the global energy crisis and the rising concentration of greenhouse gases in the atmosphere necessitate energy conservation and emission reduction. Hybrid electric vehicles (HEVs) hold great promise for reducing fuel consumption and greenhouse gas emissions through appropriate energy management strategies (EMSs). Since the actual driving environment varies with road conditions, weather and other factors, this thesis aims to propose a rapidly convergent reinforcement learning based method to design an EMS with strong self-adaptivity. Based on a parallel HEV prototype, a Q-learning and Deep Neural Network (QL-DNN) method and a Deep Q Network (DQN) method are proposed to design the EMS. To improve learning efficiency, Dynamic Programming (DP) is applied offline to solve an optimal control problem and obtain a cost-to-go matrix, which is then expanded into a Q-table used to initialize the learning agents. The QL-DNN method performs Q-learning (QL) on the Q-table generated by DP to obtain a trained Q-table, which is then converted into a neural network to initialize a DQN agent for deep Q-learning. By contrast, the DQN method requires no QL phase: it directly converts the Q-table generated by DP into a neural network to initialize the DQN agent for deep Q-learning. On a given racing track, compared with the original quasi-Pontryagin’s Minimum Principle (Q-PMP) based EMS, the QL-DNN based EMS achieves a fuel efficiency within 12% deviation, and the DQN based EMS achieves a fuel efficiency within 10% deviation. Moreover, the learning frameworks allow the proposed EMSs to learn from the environment in real time, so that they adapt to varied driving environments with smaller fuel-efficiency deviations than the Q-PMP based EMS. Although DQN requires only 267 training episodes, roughly one sixth of the episodes required by QL-DNN, its total training time is 53369 s, which is 59.9% longer than that of QL-DNN. This is because most of the QL-DNN episodes are spent in the QL phase, which updates the Q-table point by point; each Q-table update takes much less time than a DQN update, which uses a minibatch of data to update the network parameters. In conclusion, QL-DNN is a more efficient method than DQN for designing an EMS.
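The abstract describes a two-stage initialization pipeline: a DP cost-to-go matrix seeds a Q-table, tabular QL refines it, and the table is then distilled into a neural network that initializes the DQN agent. The sketch below illustrates this idea only in outline; the state grid sizes, the sign convention for turning cost-to-go into Q-values, and the small MLP architecture are assumptions for illustration and are not taken from the thesis.

```python
# Minimal sketch (not the thesis implementation): seed a Q-table from a
# DP-derived cost-to-go matrix, refine it with tabular Q-learning, and
# distil the table into a network suitable for initializing a DQN agent.
import numpy as np
import torch
import torch.nn as nn

N_SOC, N_DEMAND, N_ACTIONS = 50, 40, 11        # illustrative grid sizes

# --- 1. Initialize the Q-table from the DP cost-to-go matrix.           ---
# Assumption: the problem minimizes cost, so Q-values are taken as the
# negated cost-to-go, broadcast over the action dimension.
cost_to_go = np.random.rand(N_SOC, N_DEMAND)   # stand-in for the DP result
Q = -np.repeat(cost_to_go[..., None], N_ACTIONS, axis=2)

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; s and s_next are (soc_idx, demand_idx)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s][a] += alpha * (td_target - Q[s][a])

# --- 2. Distil the trained Q-table into a network for DQN initialization. ---
def table_to_network(Q, epochs=200, lr=1e-3):
    """Fit a small MLP so that net(state) approximates the row Q[state, :]."""
    soc_idx, dem_idx = np.meshgrid(np.arange(N_SOC), np.arange(N_DEMAND),
                                   indexing="ij")
    states = np.stack([soc_idx.ravel() / N_SOC,
                       dem_idx.ravel() / N_DEMAND], axis=1)
    targets = Q.reshape(-1, N_ACTIONS)

    net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, N_ACTIONS))
    x = torch.tensor(states, dtype=torch.float32)
    y = torch.tensor(targets, dtype=torch.float32)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    return net  # use as the initial online (and target) network of a DQN agent
```

Under this reading, the QL-DNN method would call `q_learning_update` over many episodes before `table_to_network`, whereas the DQN method would distil the DP-seeded table directly, which matches the episode-count versus wall-time trade-off reported in the abstract.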
