Essays about: "reward policy"
Showing results 1-5 of 57 essays containing the words reward policy.
-
1. Optimal taxation by two-agent reinforcement learning
University essay from Stockholms universitet/Institutionen för data- och systemvetenskap
Abstract: An economy’s tax policy is vital for, on the one hand, stimulating economic growth and labor and, on the other hand, gaining revenue from economic performance. A sufficient level of tax revenue is also important for keeping up with governmental obligations and social welfare.
-
2. Optimal Gait Control of Soft Quadruped Robot by Model-based Reinforcement Learning
University essay from KTH/Skolan för industriell teknik och management (ITM)
Abstract: Quadruped robots offer distinct advantages in navigating challenging terrains due to their flexible and shock-absorbing characteristics. This flexibility allows them to adapt to uneven surfaces, enhancing their maneuverability.
-
3. Scalable Reinforcement Learning for Formation Control with Collision Avoidance : Localized policy gradient algorithm with continuous state and action space
University essay from KTH/Skolan för teknikvetenskap (SCI); KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: In the last decades, significant theoretical advances have been made in the field of distributed multi-agent control theory. Among the most common problems that can be modelled as multi-agent systems are so-called formation control problems, in which a network of mobile agents is controlled to move towards a desired final formation.
-
4. Improving Behavior Trees that Use Reinforcement Learning with Control Barrier Functions : Modular, Learned, and Converging Control through Constraining a Learning Agent to Uphold Previously Achieved Sub Goals
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: This thesis investigates combining learning action nodes in behavior trees with control barrier functions based on the extended active constraint conditions of the nodes, and whether the approach improves performance, in terms of training time and policy quality, compared to a purely learning-based approach. Behavior trees combine several behaviors, called action nodes, into one behavior by switching between them based on the current state.
-
5. Exploration-Exploitation Trade-off Approaches in Multi-Armed Bandit
University essay from Uppsala universitet/Institutionen för informationsteknologi
Abstract: The multi-armed bandit, a popular framework for sequential decision-making problems, has recently gained significant attention due to its numerous applications. In a multi-armed bandit, an agent faces the central challenge of choosing between exploiting its current beliefs to gain a high reward and exploring to improve its knowledge of the environment; any good strategy has to balance the two efficiently.