Essays about: "reward policy"

Showing results 1-5 of 57 essays containing the words reward policy.

  1. Optimal taxation by two-agent reinforcement learning

    University essay from Stockholms universitet/Institutionen för data- och systemvetenskap

    Author : Erik Lindau; [2023]

    Abstract : An economy’s tax policy is a vital instrument for stimulating economic growth and labor on the one hand and raising revenue from economic performance on the other. A sufficient level of tax revenue is also important for meeting governmental obligations and funding social welfare.

  2. Optimal Gait Control of Soft Quadruped Robot by Model-based Reinforcement Learning

    University essay from KTH/Skolan för industriell teknik och management (ITM)

    Author : Niu Xuezhi; [2023]
    Keywords : Quadruped Robots; Soft Robotics; Reinforcement Learning; Gait Control; Model-Based Control Optimization;

    Abstract : Quadruped robots offer distinct advantages in navigating challenging terrains due to their flexible and shock-absorbing characteristics. This flexibility allows them to adapt to uneven surfaces, enhancing their maneuverability.

  3. Scalable Reinforcement Learning for Formation Control with Collision Avoidance : Localized policy gradient algorithm with continuous state and action space

    University essay from KTH/Skolan för teknikvetenskap (SCI); KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Andreu Matoses Gimenez; [2023]
    Keywords : Control theory; Multi-agent systems; Distributed systems; Formation control; Collision avoidance; Reinforcement learning;

    Abstract : In recent decades, significant theoretical advances have been made in the field of distributed multi-agent control theory. Among the most common systems that can be modelled as multi-agent systems are formation control problems, in which a network of mobile agents is controlled to move towards a desired final formation.

  4. Improving Behavior Trees that Use Reinforcement Learning with Control Barrier Functions : Modular, Learned, and Converging Control through Constraining a Learning Agent to Uphold Previously Achieved Sub Goals

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Jannik Wagner; [2023]
    Keywords : Behavior Trees; Reinforcement Learning; Control Barrier Functions; Robotics; Artificial Intelligence;

    Abstract : This thesis investigates combining learning action nodes in behavior trees with control barrier functions based on the extended active constraint conditions of the nodes, and whether this approach improves performance, in terms of training time and policy quality, compared to a purely learning-based approach. Behavior trees combine several behaviors, called action nodes, into one behavior by switching between them based on the current state.

  5. Exploration-Exploitation Trade-off Approaches in Multi-Armed Bandit

    University essay from Uppsala universitet/Institutionen för informationsteknologi

    Author : Duc Huy Le; [2023]

    Abstract : The multi-armed bandit, a popular framework for sequential decision-making problems, has recently gained significant attention due to its numerous applications. In a multi-armed bandit problem, an agent faces the central challenge of choosing between exploiting its current beliefs to gain a high reward and exploring to improve its knowledge of the environment; any good strategy has to balance the two efficiently.