AI Safe Exploration: Reinforced learning with a blocker in unsafe environments
Abstract: Artificial intelligence can be trained with a trial-and-error based approach. In an environment where a catastrophe cannot be accepted, a human overseer can be used, but this might lower the efficiency of the learning. The study includes the implementation of an artifact meant to replace the human overseer when training an AI in simulated unsafe environments. The results of testing the implemented blocker show that it can be used for avoiding catastrophes and finding a path to reach the goal in 17 out of 18 runs. The single failed execution shows that the implemented blocker is in need of improvement in terms of data efficiency. Shaping rewards solely to reduce the number of steps and catastrophes for a reinforcement learning agent has been done successfully to some degree, but further steps can be taken to lower the number of catastrophes and steps.