Improving Behavior Trees that Use Reinforcement Learning with Control Barrier Functions : Modular, Learned, and Converging Control through Constraining a Learning Agent to Uphold Previously Achieved Sub Goals

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: This thesis investigates combining learning action nodes in behavior trees with control barrier functions based on the extended active constraint conditions of the nodes and whether the approach improves the performance, in terms of training time and policy quality, compared to a purely learning-based approach. Behavior trees combine several behaviors, called action nodes, into one behavior by switching between them based on the current state. Those behaviors can be hand-coded or learned in so-called learning action nodes. In these nodes, the behavior is a reinforcement learning agent. Behavior trees can be constructed in a process called backward chaining. In order to ensure the success of a backward-chained behavior tree, each action node must uphold previously achieved subgoals. So-called extended active constraint conditions formalize this notion as conditions that must stay true for the action node to continue execution. In order to incentivize upholding extended active constraint conditions in learning action nodes, a negative reward can be given to the agent upon violating extended active constraint conditions. However, this approach does not guarantee not violating the extended active constraint conditions since it is purely learning-based. Control barrier functions can be used to restrict the actions available to an agent so that it stays within a safe subset of the state space. By defining the safe subset of the state space as the set in which the extended active constraint conditions are satisfied, control barrier functions can be employed to, ideally, guarantee that the extended active constraint conditions will not be violated. The results show that significantly less training is needed to get comparable, or slightly better, results, when compared to not using control barrier functions. Furthermore, extended active constraint conditions are considerably less frequently violated and the overall performance is slightly improved.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)