Search results for: TRPO

2 Reinforcement Learning for Robust Missile Autopilot Design: TRPO Enhanced by Schedule Experience Replay

Authors: Bernardo Cortez, Florian Peter, Thomas Lausenhammer, Paulo Oliveira

Abstract:

Designing missiles’ autopilot controllers have been a complex task, given the extensive flight envelope and the nonlinear flight dynamics. A solution that can excel both in nominal performance and in robustness to uncertainties is still to be found. While Control Theory often debouches into parameters’ scheduling procedures, Reinforcement Learning has presented interesting results in ever more complex tasks, going from videogames to robotic tasks with continuous action domains. However, it still lacks clearer insights on how to find adequate reward functions and exploration strategies. To the best of our knowledge, this work is a pioneer in proposing Reinforcement Learning as a framework for flight control. In fact, it aims at training a model-free agent that can control the longitudinal non-linear flight dynamics of a missile, achieving the target performance and robustness to uncertainties. To that end, under TRPO’s methodology, the collected experience is augmented according to HER, stored in a replay buffer and sampled according to its significance. Not only does this work enhance the concept of prioritized experience replay into BPER, but it also reformulates HER, activating them both only when the training progress converges to suboptimal policies, in what is proposed as the SER methodology. The results show that it is possible both to achieve the target performance and to improve the agent’s robustness to uncertainties (with low damage on nominal performance) by further training it in non-nominal environments, therefore validating the proposed approach and encouraging future research in this field.

Keywords: Reinforcement Learning, flight control, HER, missile autopilot, TRPO

Procedia PDF Downloads 263

1 Robot Movement Using the Trust Region Policy Optimization

Authors: Romisaa Ali

Abstract:

The Policy Gradient approach is one of the deep reinforcement learning families that combines deep neural networks (DNN) with reinforcement learning RL to discover the optimum of the control problem through experience gained from the interaction between the robot and its surroundings. In contrast to earlier policy gradient algorithms, which were unable to handle these two types of error because of over-or under-estimation introduced by the deep neural network model, this article will discuss the state-of-the-art SOTA policy gradient technique, trust region policy optimization (TRPO), by applying this method in various environments compared to another policy gradient method, the Proximal Policy Optimization (PPO), to explain their robust optimization, using this SOTA to gather experience data during various training phases after observing the impact of hyper-parameters on neural network performance.

Keywords: deep neural networks, deep reinforcement learning, proximal policy optimization, state-of-the-art, trust region policy optimization

Procedia PDF Downloads 168