Learning a Reactive Task Plan from Human Demonstrations: Building Behavior Trees using Learning from Demonstration and Planning Constraints

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Robot programming can be an expensive and tedious task, and companies may have to employ dedicated staff for it. Learning from Demonstration (LfD) is a promising framework that can alleviate some of the most repetitive work and potentially make robots more accessible to non-experts. In LfD, the robot learns how to solve a task by observing a human demonstrating it. The learned policy needs a representation, and Behavior Trees (BTs) are a promising candidate. A BT represents a controller that organizes the switching between tasks, and it naturally provides the modularity required for learning and the reactivity required for operating in an uncertain environment. Furthermore, BTs are transparent, allowing the user to inspect the policy and verify its safety before executing it. Learning BTs from demonstration has not been studied much in the past. The aim of this thesis is therefore to investigate the feasibility of using BTs in the context of LfD and how such a structure could be learned. To evaluate the feasibility of BTs and to answer how they can be learned, a new algorithm for learning BTs from demonstration is presented and evaluated. The algorithm detects similarities between multiple demonstrations to infer in which reference frames different parts of a task occur. The similarities are also used to detect hidden task constraints and goal conditions, which are given to a planner that outputs a reactive task plan in the form of a BT. The algorithm is evaluated on manipulation tasks both in simulation and on a real robot. The results show that the resulting BT can successfully solve the task while being robust to initial conditions and reactive to disturbances. These results suggest that BTs are a suitable policy representation for LfD. Furthermore, the results suggest that the presented algorithm is capable of learning a reactive and fault-tolerant task plan and can be used as a basis for future algorithms.
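
To illustrate what "reactive" means for a BT, the following is a minimal sketch of a hand-written tree for a pick-and-place task. It is not the algorithm presented in the thesis: the tree is written by hand rather than learned, and all names (Status, Condition, Action, Sequence, Fallback, build_pick_and_place, the state keys "holding" and "at_target") are hypothetical and chosen only for this example. Each action is assumed to take effect within a single tick and to report RUNNING, so that the goal condition is re-checked on the next tick.

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3


State = Dict[str, bool]  # hypothetical world state, e.g. {"holding": False}


class Condition:
    """Leaf that checks the world state without changing it."""

    def __init__(self, check: Callable[[State], bool]):
        self.check = check

    def tick(self, state: State) -> Status:
        return Status.SUCCESS if self.check(state) else Status.FAILURE


class Action:
    """Leaf that changes the world state (assumed to take effect in one tick)."""

    def __init__(self, name: str, effect: Callable[[State], None], trace: List[str]):
        self.name, self.effect, self.trace = name, effect, trace

    def tick(self, state: State) -> Status:
        self.effect(state)
        self.trace.append(self.name)  # record which action was executed
        return Status.RUNNING


class Sequence:
    """Ticks children in order; stops at the first child that is not SUCCESS."""

    def __init__(self, children: list):
        self.children = children

    def tick(self, state: State) -> Status:
        for child in self.children:
            status = child.tick(state)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; stops at the first child that is not FAILURE."""

    def __init__(self, children: list):
        self.children = children

    def tick(self, state: State) -> Status:
        for child in self.children:
            status = child.tick(state)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


def build_pick_and_place(trace: List[str]) -> Fallback:
    """Goal condition first; otherwise make sure the object is held, then place it."""

    def grasp(s: State) -> None:
        s["holding"] = True

    def place(s: State) -> None:
        s["holding"] = False
        s["at_target"] = True

    return Fallback([
        Condition(lambda s: s["at_target"]),
        Sequence([
            Fallback([Condition(lambda s: s["holding"]), Action("grasp", grasp, trace)]),
            Action("place", place, trace),
        ]),
    ])
```

Because the whole tree is re-evaluated from the root on every tick, a disturbance is handled without any dedicated recovery logic: if the object is dropped after grasping (the state key "holding" becomes False again), the next tick fails the "holding" condition and the grasp action is executed again before placing.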
