Safe Reinforcement Learning for Human-Robot Collaboration: Shielding of a Robotic Local Planner in an Autonomous Warehouse Scenario

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Reinforcement Learning (RL) is a popular approach for solving complex tasks in robotics, but using it in scenarios where humans collaborate closely with robots can lead to hazardous situations. In an autonomous warehouse, mobile robotic units share the workspace with human workers, and collisions can occur because the robot does not know the positions of humans or other non-static obstacles. Such a scenario requires the robot to use some form of visual input, from a lidar sensor or an RGB camera, to learn how to adjust its velocity commands so that it keeps a safe distance and reduces its speed when approaching obstacles. This makes it possible to train an RL-based robotic controller to behave safely; it does not, however, make the training process itself safer, which is crucial for enabling real-world training. This thesis proposes an agent setup with a modified reward structure to train a local planner for a Turtlebot robot with a lidar sensor that satisfies the safety requirements while maximizing the RL reward. Additionally, it presents a shielding approach that can intervene on a complex controller by switching to a safe, sub-optimal backup policy whenever the agent enters an unsafe state. Two agents, one unshielded and one with shielding, are trained with this method in a simulated autonomous warehouse to investigate the effects of shielding during training. For evaluation, four conditions are compared: both agents are deployed once with the shield activated and once without it. These four conditions are analysed with regard to safety and efficiency. Finally, a comparison to the performance of the baseline Trajectory Planner is conducted. The results show that shielding during training facilitates task completion and reduces collisions by 25% compared to the unshielded agent. On the other hand, unshielded training yields better safety results during deployment. In general, an active shield during deployment contributes to the efficiency of the agent, independent of the training setup. The system is integrated into the Robot Operating System (ROS), where its modular design makes the method compatible with different RL algorithms and deployable in OpenAI Gym environments.
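To make the shielding idea concrete, the following is a minimal Python sketch of how such an intervention could wrap an OpenAI Gym environment. It uses the classic Gym step API and assumes the observation is a 1-D lidar scan; SafetyShield, backup_policy, and the 0.5 m threshold are illustrative assumptions, not identifiers from the thesis.

import gym
import numpy as np

def backup_policy(lidar_scan):
    # Safe, sub-optimal fallback: stop forward motion and turn in place
    # until the scan no longer reports a nearby obstacle.
    return np.array([0.0, 0.3], dtype=np.float32)  # [linear, angular] velocity

class SafetyShield(gym.Wrapper):
    """Replaces the agent's action with the backup policy in unsafe states."""

    def __init__(self, env, safe_distance=0.5):
        super().__init__(env)
        self.safe_distance = safe_distance
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        # Intervene before the action reaches the robot: if the previous
        # lidar reading shows an obstacle closer than the threshold, the
        # shield overrides the learner's velocity command.
        if self._last_obs is not None and np.min(self._last_obs) < self.safe_distance:
            action = backup_policy(self._last_obs)
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, done, info

Because the shield sits between the policy and the environment as an ordinary wrapper, it can be activated during training, during deployment, or both, which mirrors the four evaluation conditions described in the abstract.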
