Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13

Reinforcement Learning Related Abstracts

13 A Supervised Goal Directed Algorithm in Economical Choice Behaviour: An Actor-Critic Approach

Authors: Keyvanl Yahya


This paper aims to find a algorithmic structure that affords to predict and explain economic choice behaviour particularly under uncertainty (random policies) by manipulating the prevalent Actor-Critic learning method that complies with the requirements we have been entrusted ever since the field of neuroeconomics dawned on us. Whilst skimming some basics of neuroeconomics that might be relevant to our discussion, we will try to outline some of the important works which have so far been done to simulate choice making processes. Concerning neurological findings that suggest the existence of two specific functions that are executed through Basal Ganglia all the way down to sub-cortical areas, namely 'rewards' and 'beliefs', we will offer a modified version of actor/critic algorithm to shed a light on the relation between these functions and most importantly resolve what is referred to as a challenge for actor-critic algorithms, that is lack of inheritance or hierarchy which avoids the system being evolved in continuous time tasks whence the convergence might not emerge.

Keywords: Decision Making, Reinforcement Learning, neuroeconomics, choice behaviour, actor-critic algorithm

Procedia PDF Downloads 278
12 Intelligent Adaptive Learning in a Changing Environment

Authors: G. Valentis, Q. Berthelot


Nowadays the trend to develop ever more intelligent and autonomous systems often takes its inspiration in the living beings on Earth. Some simple isolated systems are able, once brought together, to form a strong and reliable system. When trying to adapt the idea to man-made systems it is not possible to include in their program everything the system may encounter during its life cycle. It is, thus, necessary to make the system able to take decisions based on other criteria such as its past experience, i.e. to make the system learn on its own. However, at some point the acquired knowledge depends also on environment. So the question is: if system environment is modified, how could the system respond to it quickly and appropriately enough? Here, starting from reinforcement learning to rate its decisions, and using adaptive learning algorithms for gain and loss reward, the system is made able to respond to changing environment and to adapt its knowledge as time passes. Application is made to a robot finding an exit in a labyrinth.

Keywords: Neural Network, Autonomous Systems, Reinforcement Learning, Adaptive Learning, changing environment

Procedia PDF Downloads 281
11 A Novel Exploration/Exploitation Policy Accelerating Learning In Both Stationary And Non Stationary Environment Navigation Tasks

Authors: Wiem Zemzem, Moncef Tagina


In this work, we are addressing the problem of an autonomous mobile robot navigating in a large, unknown and dynamic environment using reinforcement learning abilities. This problem is principally related to the exploration/exploitation dilemma, especially the need to find a solution letting the robot detect the environmental change and also learn in order to adapt to the new environmental form without ignoring knowledge already acquired. Firstly, a new action selection strategy, called ε-greedy-MPA (the ε-greedy policy favoring the most promising actions) is proposed. Unlike existing exploration/exploitation policies (EEPs) such as ε-greedy and Boltzmann, the new EEP doesn’t only rely on the information of the actual state but also uses those of the eventual next states. Secondly, as the environment is large, an exploration favoring least recently visited states is added to the proposed EEP in order to accelerate learning. Finally, various simulations with ball-catching problem have been conducted to evaluate the ε-greedy-MPA policy. The results of simulated experiments show that combining this policy with the Qlearning method is more effective and efficient compared with the ε-greedy policy in stationary environments and the utility-based reinforcement learning approach in non stationary environments.

Keywords: Reinforcement Learning, Dynamic Environment, autonomous mobile robot, exploration/ exploitation policy, large

Procedia PDF Downloads 291
10 Buffer Allocation and Traffic Shaping Policies Implemented in Routers Based on a New Adaptive Intelligent Multi Agent Approach

Authors: M. Taheri Tehrani, H. Ajorloo


In this paper, an intelligent multi-agent framework is developed for each router in which agents have two vital functionalities, traffic shaping and buffer allocation and are positioned in the ports of the routers. With traffic shaping functionality agents shape the traffic forward by dynamic and real time allocation of the rate of generation of tokens in a Token Bucket algorithm and with buffer allocation functionality agents share their buffer capacity between each other based on their need and the conditions of the network. This dynamic and intelligent framework gives this opportunity to some ports to work better under burst and more busy conditions. These agents work intelligently based on Reinforcement Learning (RL) algorithm and will consider effective parameters in their decision process. As RL have limitation considering much parameter in its decision process due to the volume of calculations, we utilize our novel method which invokes Principle Component Analysis (PCA) on the RL and gives a high dimensional ability to this algorithm to consider as much as needed parameters in its decision process. This implementation when is compared to our previous work where traffic shaping was done without any sharing and dynamic allocation of buffer size for each port, the lower packet drop in the whole network specifically in the source routers can be seen. These methods are implemented in our previous proposed intelligent simulation environment to be able to compare better the performance metrics. The results obtained from this simulation environment show an efficient and dynamic utilization of resources in terms of bandwidth and buffer capacities pre allocated to each port.

Keywords: Reinforcement Learning, Principal Component Analysis, buffer allocation, multi- agent systems

Procedia PDF Downloads 341
9 A Reinforcement Learning Approach for Evaluation of Real-Time Disaster Relief Demand and Network Condition

Authors: Ali Nadi, Ali Edrissi


Relief demand and transportation links availability is the essential information that is needed for every natural disaster operation. This information is not in hand once a disaster strikes. Relief demand and network condition has been evaluated based on prediction method in related works. Nevertheless, prediction seems to be over or under estimated due to uncertainties and may lead to a failure operation. Therefore, in this paper a stochastic programming model is proposed to evaluate real-time relief demand and network condition at the onset of a natural disaster. To address the time sensitivity of the emergency response, the proposed model uses reinforcement learning for optimization of the total relief assessment time. The proposed model is tested on a real size network problem. The simulation results indicate that the proposed model performs well in the case of collecting real-time information.

Keywords: Disaster Management, Reinforcement Learning, real-time demand, relief demand

Procedia PDF Downloads 148
8 Distributed Coverage Control by Robot Networks in Unknown Environments Using a Modified EM Algorithm

Authors: Mohammadhosein Hasanbeig, Lacra Pavel


In this paper, we study a distributed control algorithm for the problem of unknown area coverage by a network of robots. The coverage objective is to locate a set of targets in the area and to minimize the robots’ energy consumption. The robots have no prior knowledge about the location and also about the number of the targets in the area. One efficient approach that can be used to relax the robots’ lack of knowledge is to incorporate an auxiliary learning algorithm into the control scheme. A learning algorithm actually allows the robots to explore and study the unknown environment and to eventually overcome their lack of knowledge. The control algorithm itself is modeled based on game theory where the network of the robots use their collective information to play a non-cooperative potential game. The algorithm is tested via simulations to verify its performance and adaptability.

Keywords: Game theory, Reinforcement Learning, Distributed Control, Multi-Agent Learning

Procedia PDF Downloads 314
7 Analysis of Q-Learning on Artificial Neural Networks for Robot Control Using Live Video Feed

Authors: Nihal Murali, Kunal Gupta, Surekha Bhanot


Training of artificial neural networks (ANNs) using reinforcement learning (RL) techniques is being widely discussed in the robot learning literature. The high model complexity of ANNs along with the model-free nature of RL algorithms provides a desirable combination for many robotics applications. There is a huge need for algorithms that generalize using raw sensory inputs, such as vision, without any hand-engineered features or domain heuristics. In this paper, the standard control problem of line following robot was used as a test-bed, and an ANN controller for the robot was trained on images from a live video feed using Q-learning. A virtual agent was first trained in simulation environment and then deployed onto a robot’s hardware. The robot successfully learns to traverse a wide range of curves and displays excellent generalization ability. Qualitative analysis of the evolution of policies, performance and weights of the network provide insights into the nature and convergence of the learning algorithm.

Keywords: Artificial Neural Networks, Reinforcement Learning, Robot Learning, q-learning

Procedia PDF Downloads 236
6 One-Step Time Series Predictions with Recurrent Neural Networks

Authors: Vaidehi Iyer, Konstantin Borozdin


Time series prediction problems have many important practical applications, but are notoriously difficult for statistical modeling. Recently, machine learning methods have been attracted significant interest as a practical tool applied to a variety of problems, even though developments in this field tend to be semi-empirical. This paper explores application of Long Short Term Memory based Recurrent Neural Networks to the one-step prediction of time series for both trend and stochastic components. Two types of data are analyzed - daily stock prices, that are often considered to be a typical example of a random walk, - and weather patterns dominated by seasonal variations. Results from both analyses are compared, and reinforced learning framework is used to select more efficient between Recurrent Neural Networks and more traditional auto regression methods. It is shown that both methods are able to follow long-term trends and seasonal variations closely, but have difficulties with reproducing day-to-day variability. Future research directions and potential real world applications are briefly discussed.

Keywords: Reinforcement Learning, Recurrent Neural Networks, prediction methods, long short term memory

Procedia PDF Downloads 116
5 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak


We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. Translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, there is a number of improvements we introduce. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained recognizing textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use stanford natural language inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning based zero-shot machine translation.

Keywords: Reinforcement Learning, natural language inference, agent-based language learning, low-resource translation, neural machine translation

Procedia PDF Downloads 12
4 Comparative Analysis of Reinforcement Learning Algorithms for Autonomous Driving

Authors: Migena Mana, Ahmed Khalid Syed, Abdul Malik, Nikhil Cherian


In recent years, advancements in deep learning enabled researchers to tackle the problem of self-driving cars. Car companies use huge datasets to train their deep learning models to make autonomous cars a reality. However, this approach has certain drawbacks in that the state space of possible actions for a car is so huge that there cannot be a dataset for every possible road scenario. To overcome this problem, the concept of reinforcement learning (RL) is being investigated in this research. Since the problem of autonomous driving can be modeled in a simulation, it lends itself naturally to the domain of reinforcement learning. The advantage of this approach is that we can model different and complex road scenarios in a simulation without having to deploy in the real world. The autonomous agent can learn to drive by finding the optimal policy. This learned model can then be easily deployed in a real-world setting. In this project, we focus on three RL algorithms: Q-learning, Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). To model the environment, we have used TORCS (The Open Racing Car Simulator), which provides us with a strong foundation to test our model. The inputs to the algorithms are the sensor data provided by the simulator such as velocity, distance from side pavement, etc. The outcome of this research project is a comparative analysis of these algorithms. Based on the comparison, the PPO algorithm gives the best results. When using PPO algorithm, the reward is greater, and the acceleration, steering angle and braking are more stable compared to the other algorithms, which means that the agent learns to drive in a better and more efficient way in this case. Additionally, we have come up with a dataset taken from the training of the agent with DDPG and PPO algorithms. It contains all the steps of the agent during one full training in the form: (all input values, acceleration, steering angle, break, loss, reward). This study can serve as a base for further complex road scenarios. Furthermore, it can be enlarged in the field of computer vision, using the images to find the best policy.

Keywords: Reinforcement Learning, Autonomous Driving, DDPG (deep deterministic policy gradient), PPO (proximal policy optimization)

Procedia PDF Downloads 8
3 Metareasoning Image Optimization Q-Learning

Authors: Mahasa Zahirnia


The purpose of this paper is to explore new and effective ways of optimizing satellite images using artificial intelligence, and the process of implementing reinforcement learning to enhance the quality of data captured within the image. In our implementation of Bellman's Reinforcement Learning equations, associated state diagrams, and multi-stage image processing, we were able to enhance image quality, detect and define objects. Reinforcement learning is the differentiator in the area of artificial intelligence, and Q-Learning relies on trial and error to achieve its goals. The reward system that is embedded in Q-Learning allows the agent to self-evaluate its performance and decide on the best possible course of action based on the current and future environment. Results show that within a simulated environment, built on the images that are commercially available, the rate of detection was 40-90%. Reinforcement learning through Q-Learning algorithm is not just desired but required design criteria for image optimization and enhancements. The proposed methods presented are a cost effective method of resolving uncertainty of the data because reinforcement learning finds ideal policies to manage the process using a smaller sample of images.

Keywords: Reinforcement Learning, Markov decision process, q-learning, image optimization

Procedia PDF Downloads 1
2 Modern Scotland Yard: Improving Surveillance Policies Using Adversarial Agent-Based Modelling and Reinforcement Learning

Authors: Olaf Visker, Arnout De Vries, Lambert Schomaker


Predictive policing refers to the usage of analytical techniques to identify potential criminal activity. It has been widely implemented by various police departments. Being a relatively new area of research, there are, to the author’s knowledge, no absolute tried, and true methods and they still exhibit a variety of potential problems. One of those problems is closely related to the lack of understanding of how acting on these prediction influence crime itself. The goal of law enforcement is ultimately crime reduction. As such, a policy needs to be established that best facilitates this goal. This research aims to find such a policy by using adversarial agent-based modeling in combination with modern reinforcement learning techniques. It is presented here that a baseline model for both law enforcement and criminal agents and compare their performance to their respective reinforcement models. The experiments show that our smart law enforcement model is capable of reducing crime by making more deliberate choices regarding the locations of potential criminal activity. Furthermore, it is shown that the smart criminal model presents behavior consistent with popular crime theories and outperforms the baseline model in terms of crimes committed and time to capture. It does, however, still suffer from the difficulties of capturing long term rewards and learning how to handle multiple opposing goals.

Keywords: Reinforcement Learning, Agent Based Modelling, adversarial, predictive policing

Procedia PDF Downloads 1
1 Mutiple Medical Landmark Detection on X-Ray Scan Using Reinforcement Learning

Authors: Vijaya Yuvaram Singh V M, Kameshwar Rao J V


The challenge with development of neural network based methods for medical is the availability of data. Anatomical landmark detection in the medical domain is a process to find points on the x-ray scan report of the patient. Most of the time this task is done manually by trained professionals as it requires precision and domain knowledge. Traditionally object detection based methods are used for landmark detection. Here, we utilize reinforcement learning and query based method to train a single agent capable of detecting multiple landmarks. A deep Q network agent is trained to detect single and multiple landmarks present on hip and shoulder from x-ray scan of a patient. Here a single agent is trained to find multiple landmark making it superior to having individual agents per landmark. For the initial study, five images of different patients are used as the environment and tested the agents performance on two unseen images.

Keywords: Reinforcement Learning, deep neural network, medical landmark detection, multi target detection

Procedia PDF Downloads 1