Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 7783

Search results for: batch-constrained reinforcement learning

7783 Metareasoning Image Optimization Q-Learning

Abstract:

The purpose of this paper is to explore new and effective ways of optimizing satellite images using artificial intelligence, and the process of implementing reinforcement learning to enhance the quality of data captured within the image. In our implementation of Bellman's Reinforcement Learning equations, associated state diagrams, and multi-stage image processing, we were able to enhance image quality, detect and define objects. Reinforcement learning is the differentiator in the area of artificial intelligence, and Q-Learning relies on trial and error to achieve its goals. The reward system that is embedded in Q-Learning allows the agent to self-evaluate its performance and decide on the best possible course of action based on the current and future environment. Results show that within a simulated environment, built on the images that are commercially available, the rate of detection was 40-90%. Reinforcement learning through Q-Learning algorithm is not just desired but required design criteria for image optimization and enhancements. The proposed methods presented are a cost effective method of resolving uncertainty of the data because reinforcement learning finds ideal policies to manage the process using a smaller sample of images.

Keywords: Q-learning, image optimization, reinforcement learning, Markov decision process

Procedia PDF Downloads 215

7782 Q-Learning of Bee-Like Robots Through Obstacle Avoidance

Authors: Jawairia Rasheed

Abstract:

Modern robots are often used for search and rescue purpose. One of the key areas of interest in such cases is learning complex environments. One of the key methodologies for robots in such cases is reinforcement learning. In reinforcement learning robots learn to move the path to reach the goal while avoiding obstacles. Q-learning, one of the most advancement of reinforcement learning is used for making the robots to learn the path. Robots learn by interacting with the environment to reach the goal. In this paper simulation model of bee-like robots is implemented in NETLOGO. In the start the learning rate was less and it increased with the passage of time. The bees successfully learned to reach the goal while avoiding obstacles through Q-learning technique.

Keywords: reinforlearning of bee like robots for reaching the goalcement learning for randomly placed obstacles, obstacle avoidance through q-learning, q-learning for obstacle avoidance,

Procedia PDF Downloads 101

7781 The AI Arena: A Framework for Distributed Multi-Agent Reinforcement Learning

Authors: Edward W. Staley, Corban G. Rivera, Ashley J. Llorens

Abstract:

Advances in reinforcement learning (RL) have resulted in recent breakthroughs in the application of artificial intelligence (AI) across many different domains. An emerging landscape of development environments is making powerful RL techniques more accessible for a growing community of researchers. However, most existing frameworks do not directly address the problem of learning in complex operating environments, such as dense urban settings or defense-related scenarios, that incorporate distributed, heterogeneous teams of agents. To help enable AI research for this important class of applications, we introduce the AI Arena: a scalable framework with flexible abstractions for distributed multi-agent reinforcement learning. The AI Arena extends the OpenAI Gym interface to allow greater flexibility in learning control policies across multiple agents with heterogeneous learning strategies and localized views of the environment. To illustrate the utility of our framework, we present experimental results that demonstrate performance gains due to a distributed multi-agent learning approach over commonly-used RL techniques in several different learning environments.

Keywords: reinforcement learning, multi-agent, deep learning, artificial intelligence

Procedia PDF Downloads 157

7780 Curriculum-Based Multi-Agent Reinforcement Learning for Robotic Navigation

Authors: Hyeongbok Kim, Lingling Zhao, Xiaohong Su

Abstract:

Deep reinforcement learning has been applied to address various problems in robotics, such as autonomous driving and unmanned aerial vehicle. However, because of the sparse reward penalty for a collision with obstacles during the navigation mission, the agent fails to learn the optimal policy or requires a long time for convergence. Therefore, using obstacles and enemy agents, in this paper, we present a curriculum-based boost learning method to effectively train compound skills during multi-agent reinforcement learning. First, to enable the agents to solve challenging tasks, we gradually increased learning difficulties by adjusting reward shaping instead of constructing different learning environments. Then, in a benchmark environment with static obstacles and moving enemy agents, the experimental results showed that the proposed curriculum learning strategy enhanced cooperative navigation and compound collision avoidance skills in uncertain environments while improving learning efficiency.

Keywords: curriculum learning, hard exploration, multi-agent reinforcement learning, robotic navigation, sparse reward

Procedia PDF Downloads 92

7779 Deep Reinforcement Learning with Leonard-Ornstein Processes Based Recommender System

Authors: Khalil Bachiri, Ali Yahyaouy, Nicoleta Rogovschi

Abstract:

Improved user experience is a goal of contemporary recommender systems. Recommender systems are starting to incorporate reinforcement learning since it easily satisfies this goal of increasing a user’s reward every session. In this paper, we examine the most effective Reinforcement Learning agent tactics on the Movielens (1M) dataset, balancing precision and a variety of recommendations. The absence of variability in final predictions makes simplistic techniques, although able to optimize ranking quality criteria, worthless for consumers of the recommendation system. Utilizing the stochasticity of Leonard-Ornstein processes, our suggested strategy encourages the agent to investigate its surroundings. Research demonstrates that raising the NDCG (Discounted Cumulative Gain) and HR (HitRate) criterion without lowering the Ornstein-Uhlenbeck process drift coefficient enhances the diversity of suggestions.

Keywords: recommender systems, reinforcement learning, deep learning, DDPG, Leonard-Ornstein process

Procedia PDF Downloads 142

7778 Leveraging Deep Q Networks in Portfolio Optimization

Authors: Peng Liu

Abstract:

Deep Q networks (DQNs) represent a significant advancement in reinforcement learning, utilizing neural networks to approximate the optimal Q-value for guiding sequential decision processes. This paper presents a comprehensive introduction to reinforcement learning principles, delves into the mechanics of DQNs, and explores its application in portfolio optimization. By evaluating the performance of DQNs against traditional benchmark portfolios, we demonstrate its potential to enhance investment strategies. Our results underscore the advantages of DQNs in dynamically adjusting asset allocations, offering a robust portfolio management framework.

Keywords: deep reinforcement learning, deep Q networks, portfolio optimization, multi-period optimization

Procedia PDF Downloads 32

7777 Reinforcement Learning for Self Driving Racing Car Games

Authors: Adam Beaunoyer, Cory Beaunoyer, Mohammed Elmorsy, Hanan Saleh

Abstract:

This research aims to create a reinforcement learning agent capable of racing in challenging simulated environments with a low collision count. We present a reinforcement learning agent that can navigate challenging tracks using both a Deep Q-Network (DQN) and a Soft Actor-Critic (SAC) method. A challenging track includes curves, jumps, and varying road widths throughout. Using open-source code on Github, the environment used in this research is based on the 1995 racing game WipeOut. The proposed reinforcement learning agent can navigate challenging tracks rapidly while maintaining low racing completion time and collision count. The results show that the SAC model outperforms the DQN model by a large margin. We also propose an alternative multiple-car model that can navigate the track without colliding with other vehicles on the track. The SAC model is the basis for the multiple-car model, where it can complete the laps quicker than the single-car model but has a higher collision rate with the track wall.

Keywords: reinforcement learning, soft actor-critic, deep q-network, self-driving cars, artificial intelligence, gaming

Procedia PDF Downloads 46

7776 A Fully Interpretable Deep Reinforcement Learning-Based Motion Control for Legged Robots

Authors: Haodong Huang, Zida Zhao, Shilong Sun, Chiyao Li, Wenfu Xu

Abstract:

The control methods for legged robots based on deep reinforcement learning have seen widespread application; however, the inherent black-box nature of neural networks presents challenges in understanding the decision-making motives of the robots. To address this issue, we propose a fully interpretable deep reinforcement learning training method to elucidate the underlying principles of legged robot motion. We incorporate the dynamics of legged robots into the policy, where observations serve as inputs and actions as outputs of the dynamics model. By embedding the dynamics equations within the multi-layer perceptron (MLP) computation process and making the parameters trainable, we enhance interpretability. Additionally, Bayesian optimization is introduced to train these parameters. We validate the proposed fully interpretable motion control algorithm on a legged robot, opening new research avenues for motion control and learning algorithms for legged robots within the deep learning framework.

Keywords: deep reinforcement learning, interpretation, motion control, legged robots

Procedia PDF Downloads 21

7775 A Deep Reinforcement Learning-Based Secure Framework against Adversarial Attacks in Power System

Authors: Arshia Aflaki, Hadis Karimipour, Anik Islam

Abstract:

Generative Adversarial Attacks (GAAs) threaten critical sectors, ranging from fingerprint recognition to industrial control systems. Existing Deep Learning (DL) algorithms are not robust enough against this kind of cyber-attack. As one of the most critical industries in the world, the power grid is not an exception. In this study, a Deep Reinforcement Learning-based (DRL) framework assisting the DL model to improve the robustness of the model against generative adversarial attacks is proposed. Real-world smart grid stability data, as an IIoT dataset, test our method and improves the classification accuracy of a deep learning model from around 57 percent to 96 percent.

Keywords: generative adversarial attack, deep reinforcement learning, deep learning, IIoT, generative adversarial networks, power system

Procedia PDF Downloads 36

7774 Deep Reinforcement Learning Model Using Parameterised Quantum Circuits

Authors: Lokes Parvatha Kumaran S., Sakthi Jay Mahenthar C., Sathyaprakash P., Jayakumar V., Shobanadevi A.

Abstract:

With the evolution of technology, the need to solve complex computational problems like machine learning and deep learning has shot up. But even the most powerful classical supercomputers find it difficult to execute these tasks. With the recent development of quantum computing, researchers and tech-giants strive for new quantum circuits for machine learning tasks, as present works on Quantum Machine Learning (QML) ensure less memory consumption and reduced model parameters. But it is strenuous to simulate classical deep learning models on existing quantum computing platforms due to the inflexibility of deep quantum circuits. As a consequence, it is essential to design viable quantum algorithms for QML for noisy intermediate-scale quantum (NISQ) devices. The proposed work aims to explore Variational Quantum Circuits (VQC) for Deep Reinforcement Learning by remodeling the experience replay and target network into a representation of VQC. In addition, to reduce the number of model parameters, quantum information encoding schemes are used to achieve better results than the classical neural networks. VQCs are employed to approximate the deep Q-value function for decision-making and policy-selection reinforcement learning with experience replay and the target network.

Keywords: quantum computing, quantum machine learning, variational quantum circuit, deep reinforcement learning, quantum information encoding scheme

Procedia PDF Downloads 133

7773 Distributed System Computing Resource Scheduling Algorithm Based on Deep Reinforcement Learning

Authors: Yitao Lei, Xingxiang Zhai, Burra Venkata Durga Kumar

Abstract:

As the quantity and complexity of computing in large-scale software systems increase, distributed system computing becomes increasingly important. The distributed system realizes high-performance computing by collaboration between different computing resources. If there are no efficient resource scheduling resources, the abuse of distributed computing may cause resource waste and high costs. However, resource scheduling is usually an NP-hard problem, so we cannot find a general solution. However, some optimization algorithms exist like genetic algorithm, ant colony optimization, etc. The large scale of distributed systems makes this traditional optimization algorithm challenging to work with. Heuristic and machine learning algorithms are usually applied in this situation to ease the computing load. As a result, we do a review of traditional resource scheduling optimization algorithms and try to introduce a deep reinforcement learning method that utilizes the perceptual ability of neural networks and the decision-making ability of reinforcement learning. Using the machine learning method, we try to find important factors that influence the performance of distributed system computing and help the distributed system do an efficient computing resource scheduling. This paper surveys the application of deep reinforcement learning on distributed system computing resource scheduling proposes a deep reinforcement learning method that uses a recurrent neural network to optimize the resource scheduling, and proposes the challenges and improvement directions for DRL-based resource scheduling algorithms.

Keywords: resource scheduling, deep reinforcement learning, distributed system, artificial intelligence

Procedia PDF Downloads 111

7772 Reinforcement Learning for Classification of Low-Resolution Satellite Images

Authors: Khadija Bouzaachane, El Mahdi El Guarmah

Abstract:

The classification of low-resolution satellite images has been a worthwhile and fertile field that attracts plenty of researchers due to its importance in monitoring geographical areas. It could be used for several purposes such as disaster management, military surveillance, agricultural monitoring. The main objective of this work is to classify efficiently and accurately low-resolution satellite images by using novel technics of deep learning and reinforcement learning. The images include roads, residential areas, industrial areas, rivers, sea lakes, and vegetation. To achieve that goal, we carried out experiments on the sentinel-2 images considering both high accuracy and efficiency classification. Our proposed model achieved a 91% accuracy on the testing dataset besides a good classification for land cover. Focus on the parameter precision; we have obtained 93% for the river, 92% for residential, 97% for residential, 96% for the forest, 87% for annual crop, 84% for herbaceous vegetation, 85% for pasture, 78% highway and 100% for Sea Lake.

Keywords: classification, deep learning, reinforcement learning, satellite imagery

Procedia PDF Downloads 213

7771 A Comparative Study of Mechanisms across Different Online Social Learning Types

Authors: Xinyu Wang

Abstract:

In the context of the rapid development of Internet technology and the increasing prevalence of online social media, this study investigates the impact of digital communication on social learning. Through three behavioral experiments, we explore both affective and cognitive social learning in online environments. Experiment 1 manipulates the content of experimental materials and two forms of feedback, emotional valence, sociability, and repetition, to verify whether individuals can achieve online emotional social learning through reinforcement using two social learning strategies. Results reveal that both social learning strategies can assist individuals in affective, social learning through reinforcement, with feedback-based learning strategies outperforming frequency-dependent strategies. Experiment 2 similarly manipulates the content of experimental materials and two forms of feedback to verify whether individuals can achieve online knowledge social learning through reinforcement using two social learning strategies. Results show that similar to online affective social learning, individuals adopt both social learning strategies to achieve cognitive social learning through reinforcement, with feedback-based learning strategies outperforming frequency-dependent strategies. Experiment 3 simultaneously observes online affective and cognitive social learning by manipulating the content of experimental materials and feedback at different levels of social pressure. Results indicate that online affective social learning exhibits different learning effects under different levels of social pressure, whereas online cognitive social learning remains unaffected by social pressure, demonstrating more stable learning effects. Additionally, to explore the sustained effects of online social learning and differences in duration among different types of online social learning, all three experiments incorporate two test time points. Results reveal significant differences in pre-post-test scores for online social learning in Experiments 2 and 3, whereas differences are less apparent in Experiment 1. To accurately measure the sustained effects of online social learning, the researchers conducted a mini-meta-analysis of all effect sizes of online social learning duration. Results indicate that although the overall effect size is small, the effect of online social learning weakens over time.

Keywords: online social learning, affective social learning, cognitive social learning, social learning strategies, social reinforcement, social pressure, duration

Procedia PDF Downloads 46

7770 Machine Learning Approach for Mutation Testing

Authors: Michael Stewart

Abstract:

Mutation testing is a type of software testing proposed in the 1970s where program statements are deliberately changed to introduce simple errors so that test cases can be validated to determine if they can detect the errors. Test cases are executed against the mutant code to determine if one fails, detects the error and ensures the program is correct. One major issue with this type of testing was it became intensive computationally to generate and test all possible mutations for complex programs. This paper used reinforcement learning and parallel processing within the context of mutation testing for the selection of mutation operators and test cases that reduced the computational cost of testing and improved test suite effectiveness. Experiments were conducted using sample programs to determine how well the reinforcement learning-based algorithm performed with one live mutation, multiple live mutations and no live mutations. The experiments, measured by mutation score, were used to update the algorithm and improved accuracy for predictions. The performance was then evaluated on multiple processor computers. With reinforcement learning, the mutation operators utilized were reduced by 50 – 100%.

Keywords: automated-testing, machine learning, mutation testing, parallel processing, reinforcement learning, software engineering, software testing

Procedia PDF Downloads 198

7769 Personalized Email Marketing Strategy: A Reinforcement Learning Approach

Authors: Lei Zhang, Tingting Xu, Jun He, Zhenyu Yan

Abstract:

Email marketing is one of the most important segments of online marketing. It has been proved to be the most effective way to acquire and retain customers. The email content is vital to customers. Different customers may have different familiarity with a product, so a successful marketing strategy must personalize email content based on individual customers’ product affinity. In this study, we build our personalized email marketing strategy with three types of emails: nurture, promotion, and conversion. Each type of email has a different influence on customers. We investigate this difference by analyzing customers’ open rates, click rates and opt-out rates. Feature importance from response models is also analyzed. The goal of the marketing strategy is to improve the click rate on conversion-type emails. To build the personalized strategy, we formulate the problem as a reinforcement learning problem and adopt a Q-learning algorithm with variations. The simulation results show that our model-based strategy outperforms the current marketer’s strategy.

Keywords: email marketing, email content, reinforcement learning, machine learning, Q-learning

Procedia PDF Downloads 194

7768 Deep Reinforcement Learning and Generative Adversarial Networks Approach to Thwart Intrusions and Adversarial Attacks

Authors: Fabrice Setephin Atedjio, Jean-Pierre Lienou, Frederica F. Nelson, Sachin S. Shetty

Abstract:

Malicious users exploit vulnerabilities in computer systems, significantly disrupting their performance and revealing the inadequacies of existing protective solutions. Even machine learning-based approaches, designed to ensure reliability, can be compromised by adversarial attacks that undermine their robustness. This paper addresses two critical aspects of enhancing model reliability. First, we focus on improving model performance and robustness against adversarial threats. To achieve this, we propose a strategy by harnessing deep reinforcement learning. Second, we introduce an approach leveraging generative adversarial networks to counter adversarial attacks effectively. Our results demonstrate substantial improvements over previous works in the literature, with classifiers exhibiting enhanced accuracy in classification tasks, even in the presence of adversarial perturbations. These findings underscore the efficacy of the proposed model in mitigating intrusions and adversarial attacks within the machine learning landscape.

Keywords: machine learning, reliability, adversarial attacks, deep-reinforcement learning, robustness

Procedia PDF Downloads 8

7767 Deep Reinforcement Learning Model for Autonomous Driving

Authors: Boumaraf Malak

Abstract:

The development of intelligent transportation systems (ITS) and artificial intelligence (AI) are spurring us to pave the way for the widespread adoption of autonomous vehicles (AVs). This is open again opportunities for smart roads, smart traffic safety, and mobility comfort. A highly intelligent decision-making system is essential for autonomous driving around dense, dynamic objects. It must be able to handle complex road geometry and topology, as well as complex multiagent interactions, and closely follow higher-level commands such as routing information. Autonomous vehicles have become a very hot research topic in recent years due to their significant ability to reduce traffic accidents and personal injuries. Using new artificial intelligence-based technologies handles important functions in scene understanding, motion planning, decision making, vehicle control, social behavior, and communication for AV. This paper focuses only on deep reinforcement learning-based methods; it does not include traditional (flat) planar techniques, which have been the subject of extensive research in the past because reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. The DRL algorithm used so far found solutions to the four main problems of autonomous driving; in our paper, we highlight the challenges and point to possible future research directions.

Keywords: deep reinforcement learning, autonomous driving, deep deterministic policy gradient, deep Q-learning

Procedia PDF Downloads 85

7766 Efficient Subgoal Discovery for Hierarchical Reinforcement Learning Using Local Computations

Authors: Adrian Millea

Abstract:

In hierarchical reinforcement learning, one of the main issues encountered is the discovery of subgoal states or options (which are policies reaching subgoal states) by partitioning the environment in a meaningful way. This partitioning usually requires an expensive global clustering operation or eigendecomposition of the Laplacian of the states graph. We propose a local solution to this issue, much more efficient than algorithms using global information, which successfully discovers subgoal states by computing a simple function, which we call heterogeneity for each state as a function of its neighbors. Moreover, we construct a value function using the difference in heterogeneity from one step to the next, as reward, such that we are able to explore the state space much more efficiently than say epsilon-greedy. The same principle can then be applied to higher level of the hierarchy, where now states are subgoals discovered at the level below.

Keywords: exploration, hierarchical reinforcement learning, locality, options, value functions

Procedia PDF Downloads 171

7765 Using Q-Learning to Auto-Tune PID Controller Gains for Online Quadcopter Altitude Stabilization

Authors: Y. Alrubyli

Abstract:

Unmanned Arial Vehicles (UAVs), and more specifically, quadcopters need to be stable during their flights. Altitude stability is usually achieved by using a PID controller that is built into the flight controller software. Furthermore, the PID controller has gains that need to be tuned to reach optimal altitude stabilization during the quadcopter’s flight. For that, control system engineers need to tune those gains by using extensive modeling of the environment, which might change from one environment and condition to another. As quadcopters penetrate more sectors, from the military to the consumer sectors, they have been put into complex and challenging environments more than ever before. Hence, intelligent self-stabilizing quadcopters are needed to maneuver through those complex environments and situations. Here we show that by using online reinforcement learning with minimal background knowledge, the altitude stability of the quadcopter can be achieved using a model-free approach. We found that by using background knowledge instead of letting the online reinforcement learning algorithm wander for a while to tune the PID gains, altitude stabilization can be achieved faster. In addition, using this approach will accelerate development by avoiding extensive simulations before applying the PID gains to the real-world quadcopter. Our results demonstrate the possibility of using the trial and error approach of reinforcement learning combined with background knowledge to achieve faster quadcopter altitude stabilization in different environments and conditions.

Keywords: reinforcement learning, Q-leanring, online learning, PID tuning, unmanned aerial vehicle, quadcopter

Procedia PDF Downloads 173

7764 Umbrella Reinforcement Learning – A Tool for Hard Problems

Authors: Egor E. Nuzhin, Nikolay V. Brilliantov

Abstract:

We propose an approach for addressing Reinforcement Learning (RL) problems. It combines the ideas of umbrella sampling, borrowed from Monte Carlo technique of computational physics and chemistry, with optimal control methods, and is realized on the base of neural networks. This results in a powerful algorithm, designed to solve hard RL problems – the problems, with long-time delayed reward, state-traps sticking and a lack of terminal states. It outperforms the prominent algorithms, such as PPO, RND, iLQR and VI, which are among the most efficient for the hard problems. The new algorithm deals with a continuous ensemble of agents and expected return, that includes the ensemble entropy. This results in a quick and efficient search of the optimal policy in terms of ”exploration-exploitation trade-off” in the state-action space.

Keywords: umbrella sampling, reinforcement learning, policy gradient, dynamic programming

Procedia PDF Downloads 21

7763 Effectiveness of Reinforcement Learning (RL) for Autonomous Energy Management Solutions

Authors: Tesfaye Mengistu

Abstract:

This thesis aims to investigate the effectiveness of Reinforcement Learning (RL) for Autonomous Energy Management solutions. The study explores the potential of Model Free RL approaches, such as Monte Carlo RL and Q-learning, to improve energy management by autonomously adjusting energy management strategies to maximize efficiency. The research investigates the implementation of RL algorithms for optimizing energy consumption in a single-agent environment. The focus is on developing a framework for the implementation of RL algorithms, highlighting the importance of RL for enabling autonomous systems to adapt quickly to changing conditions and make decisions based on previous experiences. Moreover, the paper proposes RL as a novel energy management solution to address nations' CO2 emission goals. Reinforcement learning algorithms are well-suited to solving problems with sequential decision-making patterns and can provide accurate and immediate outputs to ease the planning and decision-making process. This research provides insights into the challenges and opportunities of using RL for energy management solutions and recommends further studies to explore its full potential. In conclusion, this study provides valuable insights into how RL can be used to improve the efficiency of energy management systems and supports the use of RL as a promising approach for developing autonomous energy management solutions in residential buildings.

Keywords: artificial intelligence, reinforcement learning, monte carlo, energy management, CO2 emission

Procedia PDF Downloads 83

7762 Research on Knowledge Graph Inference Technology Based on Proximal Policy Optimization

Authors: Yihao Kuang, Bowen Ding

Abstract:

With the increasing scale and complexity of knowledge graph, modern knowledge graph contains more and more types of entity, relationship, and attribute information. Therefore, in recent years, it has been a trend for knowledge graph inference to use reinforcement learning to deal with large-scale, incomplete, and noisy knowledge graph and improve the inference effect and interpretability. The Proximal Policy Optimization (PPO) algorithm utilizes a near-end strategy optimization approach. This allows for more extensive updates of policy parameters while constraining the update extent to maintain training stability. This characteristic enables PPOs to converge to improve strategies more rapidly, often demonstrating enhanced performance early in the training process. Furthermore, PPO has the advantage of offline learning, effectively utilizing historical experience data for training and enhancing sample utilization. This means that even with limited resources, PPOs can efficiently train for reinforcement learning tasks. Based on these characteristics, this paper aims to obtain better and more efficient inference effect by introducing PPO into knowledge inference technology.

Keywords: reinforcement learning, PPO, knowledge inference, supervised learning

Procedia PDF Downloads 67

7761 Analysis of Q-Learning on Artificial Neural Networks for Robot Control Using Live Video Feed

Authors: Nihal Murali, Kunal Gupta, Surekha Bhanot

Abstract:

Training of artificial neural networks (ANNs) using reinforcement learning (RL) techniques is being widely discussed in the robot learning literature. The high model complexity of ANNs along with the model-free nature of RL algorithms provides a desirable combination for many robotics applications. There is a huge need for algorithms that generalize using raw sensory inputs, such as vision, without any hand-engineered features or domain heuristics. In this paper, the standard control problem of line following robot was used as a test-bed, and an ANN controller for the robot was trained on images from a live video feed using Q-learning. A virtual agent was first trained in simulation environment and then deployed onto a robot’s hardware. The robot successfully learns to traverse a wide range of curves and displays excellent generalization ability. Qualitative analysis of the evolution of policies, performance and weights of the network provide insights into the nature and convergence of the learning algorithm.

Keywords: artificial neural networks, q-learning, reinforcement learning, robot learning

Procedia PDF Downloads 372

7760 Gaits Stability Analysis for a Pneumatic Quadruped Robot Using Reinforcement Learning

Authors: Soofiyan Atar, Adil Shaikh, Sahil Rajpurkar, Pragnesh Bhalala, Aniket Desai, Irfan Siddavatam

Abstract:

Deep reinforcement learning (deep RL) algorithms leverage the symbolic power of complex controllers by automating it by mapping sensory inputs to low-level actions. Deep RL eliminates the complex robot dynamics with minimal engineering. Deep RL provides high-risk involvement by directly implementing it in real-world scenarios and also high sensitivity towards hyperparameters. Tuning of hyperparameters on a pneumatic quadruped robot becomes very expensive through trial-and-error learning. This paper presents an automated learning control for a pneumatic quadruped robot using sample efficient deep Q learning, enabling minimal tuning and very few trials to learn the neural network. Long training hours may degrade the pneumatic cylinder due to jerk actions originated through stochastic weights. We applied this method to the pneumatic quadruped robot, which resulted in a hopping gait. In our process, we eliminated the use of a simulator and acquired a stable gait. This approach evolves so that the resultant gait matures more sturdy towards any stochastic changes in the environment. We further show that our algorithm performed very well as compared to programmed gait using robot dynamics.

Keywords: model-based reinforcement learning, gait stability, supervised learning, pneumatic quadruped

Procedia PDF Downloads 316

7759 Cryptographic Resource Allocation Algorithm Based on Deep Reinforcement Learning

Authors: Xu Jie

Abstract:

As a key network security method, cryptographic services must fully cope with problems such as the wide variety of cryptographic algorithms, high concurrency requirements, random job crossovers, and instantaneous surges in workloads. Its complexity and dynamics also make it difficult for traditional static security policies to cope with the ever-changing situation. Cyber Threats and Environment. Traditional resource scheduling algorithms are inadequate when facing complex decision-making problems in dynamic environments. A network cryptographic resource allocation algorithm based on reinforcement learning is proposed, aiming to optimize task energy consumption, migration cost, and fitness of differentiated services (including user, data, and task security) by modeling the multi-job collaborative cryptographic service scheduling problem as a multi-objective optimized job flow scheduling problem and using a multi-agent reinforcement learning method, efficient scheduling and optimal configuration of cryptographic service resources are achieved. By introducing reinforcement learning, resource allocation strategies can be adjusted in real-time in a dynamic environment, improving resource utilization and achieving load balancing. Experimental results show that this algorithm has significant advantages in path planning length, system delay and network load balancing and effectively solves the problem of complex resource scheduling in cryptographic services.

Keywords: cloud computing, cryptography on-demand service, reinforcement learning, workflow scheduling

Procedia PDF Downloads 12

7758 Adhesion Performance According to Lateral Reinforcement Method of Textile

Authors: Jungbhin You, Taekyun Kim, Jongho Park, Sungnam Hong, Sun-Kyu Park

Abstract:

Reinforced concrete has been mainly used in construction field because of excellent durability. However, it may lead to reduction of durability and safety due to corrosion of reinforcement steels according to damage of concrete surface. Recently, research of textile is ongoing to complement weakness of reinforced concrete. In previous research, only experiment of longitudinal length were performed. Therefore, in order to investigate the adhesion performance according to the lattice shape and the embedded length, the pull-out test was performed on the roving with parameter of the number of lateral reinforcement, the lateral reinforcement length and the lateral reinforcement spacing. As a result, the number of lateral reinforcement and the lateral reinforcement length did not significantly affect the load variation depending on the adhesion performance, and only the load analysis results according to the reinforcement spacing are affected.

Keywords: adhesion performance, lateral reinforcement, pull-out test, textile

Procedia PDF Downloads 358

7757 Robot Movement Using the Trust Region Policy Optimization

Authors: Romisaa Ali

Abstract:

The Policy Gradient approach is one of the deep reinforcement learning families that combines deep neural networks (DNN) with reinforcement learning RL to discover the optimum of the control problem through experience gained from the interaction between the robot and its surroundings. In contrast to earlier policy gradient algorithms, which were unable to handle these two types of error because of over-or under-estimation introduced by the deep neural network model, this article will discuss the state-of-the-art SOTA policy gradient technique, trust region policy optimization (TRPO), by applying this method in various environments compared to another policy gradient method, the Proximal Policy Optimization (PPO), to explain their robust optimization, using this SOTA to gather experience data during various training phases after observing the impact of hyper-parameters on neural network performance.

Keywords: deep neural networks, deep reinforcement learning, proximal policy optimization, state-of-the-art, trust region policy optimization

Procedia PDF Downloads 169

7756 High-Frequency Cryptocurrency Portfolio Management Using Multi-Agent System Based on Federated Reinforcement Learning

Authors: Sirapop Nuannimnoi, Hojjat Baghban, Ching-Yao Huang

Abstract:

Over the past decade, with the fast development of blockchain technology since the birth of Bitcoin, there has been a massive increase in the usage of Cryptocurrencies. Cryptocurrencies are not seen as an investment opportunity due to the market’s erratic behavior and high price volatility. With the recent success of deep reinforcement learning (DRL), portfolio management can be modeled and automated. In this paper, we propose a novel DRL-based multi-agent system to automatically make proper trading decisions on multiple cryptocurrencies and gain profits in the highly volatile cryptocurrency market. We also extend this multi-agent system with horizontal federated transfer learning for better adapting to the inclusion of new cryptocurrencies in our portfolio; therefore, we can, through the concept of diversification, maximize our profits and minimize the trading risks. Experimental results through multiple simulation scenarios reveal that this proposed algorithmic trading system can offer three promising key advantages over other systems, including maximized profits, minimized risks, and adaptability.

Keywords: cryptocurrency portfolio management, algorithmic trading, federated learning, multi-agent reinforcement learning

Procedia PDF Downloads 119

7755 Research on Knowledge Graph Inference Technology Based on Proximal Policy Optimization

Authors: Yihao Kuang, Bowen Ding

Abstract:

With the increasing scale and complexity of knowledge graph, modern knowledge graph contains more and more types of entity, relationship, and attribute information. Therefore, in recent years, it has been a trend for knowledge graph inference to use reinforcement learning to deal with large-scale, incomplete, and noisy knowledge graphs and improve the inference effect and interpretability. The Proximal Policy Optimization (PPO) algorithm utilizes a near-end strategy optimization approach. This allows for more extensive updates of policy parameters while constraining the update extent to maintain training stability. This characteristic enables PPOs to converge to improved strategies more rapidly, often demonstrating enhanced performance early in the training process. Furthermore, PPO has the advantage of offline learning, effectively utilizing historical experience data for training and enhancing sample utilization. This means that even with limited resources, PPOs can efficiently train for reinforcement learning tasks. Based on these characteristics, this paper aims to obtain a better and more efficient inference effect by introducing PPO into knowledge inference technology.

Keywords: reinforcement learning, PPO, knowledge inference

Procedia PDF Downloads 242

7754 Using Personalized Spiking Neural Networks, Distinct Techniques for Self-Governing

Authors: Brwa Abdulrahman Abubaker

Abstract:

Recently, there has been a lot of interest in the difficult task of applying reinforcement learning to autonomous mobile robots. Conventional reinforcement learning (TRL) techniques have many drawbacks, such as lengthy computation times, intricate control frameworks, a great deal of trial and error searching, and sluggish convergence. In this paper, a modified Spiking Neural Network (SNN) is used to offer a distinct method for autonomous mobile robot learning and control in unexpected surroundings. As a learning algorithm, the suggested model combines dopamine modulation with spike-timing-dependent plasticity (STDP). In order to create more computationally efficient, biologically inspired control systems that are adaptable to changing settings, this work uses the effective and physiologically credible Izhikevich neuron model. This study is primarily focused on creating an algorithm for target tracking in the presence of obstacles. Results show that the SNN trained with three obstacles yielded an impressive 96% success rate for our proposal, with collisions happening in about 4% of the 214 simulated seconds.

Keywords: spiking neural network, spike-timing-dependent plasticity, dopamine modulation, reinforcement learning

Procedia PDF Downloads 21