Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 7380

Search results for: reinforcement learning

7350 ROOP: Translating Sequential Code Fragments to Distributed Code Fragments Using Deep Reinforcement Learning

Abstract:

Every second, massive amounts of data are generated, and Data Intensive Scalable Computing (DISC) frameworks have evolved into effective tools for analyzing such data. Since the underlying architecture of these distributed computing platforms is often new to users, building a DISC application can be time-consuming and prone to errors. The automated conversion of a sequential program to a DISC program would consequently improve productivity significantly. However, synthesizing a user’s intended program from an input specification is complex, with several important applications, such as distributed program synthesis and code refactoring. Existing works such as Tyro and Casper rely entirely on deductive synthesis techniques or similar program synthesis approaches. Our approach is to develop a data-driven synthesis technique to identify sequential components and translate them to equivalent distributed operations. We emphasize using reinforcement learning and unit testing as feedback mechanisms to achieve our objectives.

Keywords: program synthesis, distributed computing, reinforcement learning, unit testing, DISC

Procedia PDF Downloads 70

7349 Multiple Medical Landmark Detection on X-Ray Scan Using Reinforcement Learning

Authors: Vijaya Yuvaram Singh V M, Kameshwar Rao J V

Abstract:

A key challenge in developing neural-network-based methods for medical applications is the limited availability of data. Anatomical landmark detection in the medical domain is the process of finding points on a patient's x-ray scan. This task is usually done manually by trained professionals, as it requires precision and domain knowledge. Traditionally, object-detection-based methods are used for landmark detection. Here, we use reinforcement learning and a query-based method to train a single agent capable of detecting multiple landmarks. A deep Q-network agent is trained to detect single and multiple landmarks on the hip and shoulder in a patient's x-ray scan. Training a single agent to find multiple landmarks makes the approach superior to having an individual agent per landmark. For the initial study, five images of different patients are used as the environment, and the agent's performance is tested on two unseen images.

Keywords: reinforcement learning, medical landmark detection, multi target detection, deep neural network

Procedia PDF Downloads 115

7348 The Effect of Geogrid Reinforcement Pre-Stressing on the Performance of Sand Bed Supporting a Strip Foundation

Authors: Ahmed M. Eltohamy

Abstract:

In this paper, an experimental and numerical study was carried out to investigate the effect of geogrid soil reinforcement pre-stressing on the pressure-settlement relation of a sand bed supporting a strip foundation. The studied parameters include foundation depth and pre-stress ratio for the cases of one and two pre-stressed reinforcement layers. The study showed that pre-stressing the soil reinforcement resulted in a marked enhancement in the stiffness of the reinforced bed soil compared to reinforced soil without pre-stress. The greatest benefit of pre-stressing the reinforcement was obtained as the overburden pressure and pre-straining ratio increase. Pre-stressing the two topmost reinforcement layers results in further enhancement of the stress-strain relation of the bed soil.

Keywords: geogrid reinforcement, prestress, strip footing, bearing capacity

Procedia PDF Downloads 269

7347 Examination of the Reinforcement Forces Generated in Pseudo-Static and Dynamic Status in Retaining Walls

Authors: K. Passbakhsh

Abstract:

Determining reinforcement forces is one of the most important topics in the design of retaining walls. Determining these forces accurately avoids overly conservative designs. Reinforcement forces can be calculated by numerically modeling reinforced soil retaining walls under dynamic loading. In this study, we compare the forces obtained by the pseudo-static method according to the FHWA code with the forces obtained from numerical modeling by the finite element method, selecting the seismic horizontal coefficient for different wall heights. PLAXIS software was used for the numerical analysis. The effect of reinforcement stiffness and soil type on the reinforcement forces is then examined.

Keywords: reinforced soil, PLAXIS, reinforcement forces, retaining walls

Procedia PDF Downloads 328

7346 Reinforcement Learning Optimization: Unraveling Trends and Advancements in Metaheuristic Algorithms

Authors: Rahul Paul, Kedar Nath Das

Abstract:

The field of machine learning (ML) is experiencing rapid development, resulting in a multitude of theoretical advancements and extensive practical implementations across various disciplines. The objective of ML is to facilitate the ability of machines to perform cognitive tasks by leveraging knowledge gained from prior experiences and effectively addressing complex problems, even in situations that deviate from previously encountered instances. Reinforcement Learning (RL) has emerged as a prominent subfield within ML and has gained considerable attention in recent times from researchers. This surge in interest can be attributed to the practical applications of RL, the increasing availability of data, and the rapid advancements in computing power. At the same time, optimization algorithms play a pivotal role in the field of ML and have attracted considerable interest from researchers. A multitude of proposals have been put forth to address optimization problems or improve optimization techniques within the domain of ML. The necessity of a thorough examination and implementation of optimization algorithms within the context of ML is of utmost importance in order to provide guidance for the advancement of research in both optimization and ML. This article provides a comprehensive overview of the application of metaheuristic evolutionary optimization algorithms in conjunction with RL to address a diverse range of scientific challenges. Furthermore, this article delves into the various challenges and unresolved issues pertaining to the optimization of RL models.

Keywords: machine learning, reinforcement learning, loss function, evolutionary optimization techniques

Procedia PDF Downloads 45

7345 Lane-Change Path Planning of Autonomous Driving Using Model-Based Optimization, Deep Reinforcement Learning and 5G Vehicle-to-Vehicle Communications

Authors: William Li

Abstract:

Lane-change path planning is a crucial yet complex task in autonomous driving. The traditional path planning approach, based on a system of carefully crafted rules to cover various driving scenarios, becomes unwieldy as more and more rules are added to deal with exceptions and corner cases. This paper proposes dividing the entire path planning into two stages. In the first stage, the ego vehicle travels longitudinally in the source lane to reach a safe state. In the second stage, the ego vehicle makes a lateral lane-change maneuver to the target lane. The paper derives the safe state conditions from a calculation of the lateral lane-change maneuver to ensure that the second stage is collision-free. To determine the acceleration sequence that minimizes the time to reach a safe state in the first stage, the paper proposes three schemes, namely, kinetic model based optimization, deep reinforcement learning, and 5G vehicle-to-vehicle (V2V) communications. The paper investigates these schemes via simulation. The model-based optimization is sensitive to the model assumptions. Deep reinforcement learning is more flexible in handling scenarios beyond the model assumed by the optimization. 5G V2V eliminates uncertainty in predicting the future behavior of surrounding vehicles by sharing driving intents and enabling cooperative driving.

Keywords: lane change, path planning, autonomous driving, deep reinforcement learning, 5G, V2V communications, connected vehicles

Procedia PDF Downloads 168

7344 Reinforcement Learning for Robust Missile Autopilot Design: TRPO Enhanced by Schedule Experience Replay

Authors: Bernardo Cortez, Florian Peter, Thomas Lausenhammer, Paulo Oliveira

Abstract:

Designing missile autopilot controllers has been a complex task, given the extensive flight envelope and the nonlinear flight dynamics. A solution that can excel both in nominal performance and in robustness to uncertainties is still to be found. While Control Theory often leads to parameter-scheduling procedures, Reinforcement Learning has presented interesting results in ever more complex tasks, going from videogames to robotic tasks with continuous action domains. However, it still lacks clearer insights on how to find adequate reward functions and exploration strategies. To the best of our knowledge, this work is a pioneer in proposing Reinforcement Learning as a framework for flight control. In fact, it aims at training a model-free agent that can control the longitudinal non-linear flight dynamics of a missile, achieving the target performance and robustness to uncertainties. To that end, under the methodology of Trust Region Policy Optimization (TRPO), the collected experience is augmented according to Hindsight Experience Replay (HER), stored in a replay buffer and sampled according to its significance. Not only does this work enhance the concept of prioritized experience replay into BPER, but it also reformulates HER, activating them both only when the training progress converges to suboptimal policies, in what is proposed as the Schedule Experience Replay (SER) methodology. The results show that it is possible both to achieve the target performance and to improve the agent’s robustness to uncertainties (with low damage on nominal performance) by further training it in non-nominal environments, therefore validating the proposed approach and encouraging future research in this field.
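To make the idea of sampling stored experience "according to its significance" more concrete, the following is a minimal sketch of generic proportional prioritized replay. It is not the BPER or SER method of the abstract; the class name, the `alpha` exponent, and the way priorities are supplied are all assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative only;
    not the BPER/SER scheme described in the abstract)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # assumed knob: 0 = uniform, 1 = fully proportional
        self.items = []
        self.priorities = []

    def add(self, transition, priority=1.0):
        # Drop the oldest transition once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def probabilities(self):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        # Draw with replacement, weighted by the probabilities above.
        return rng.choices(self.items, weights=self.probabilities(), k=batch_size)
```

In practice the priority would typically come from a learning signal such as the magnitude of the temporal-difference error, so that more informative transitions are replayed more often.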

Keywords: Reinforcement Learning, flight control, HER, missile autopilot, TRPO

Procedia PDF Downloads 223

7343 Integrating Distributed Architectures in Highly Modular Reinforcement Learning Libraries

Authors: Albert Bou, Sebastian Dittert, Gianni de Fabritiis

Abstract:

Advancing reinforcement learning (RL) requires tools that are flexible enough to easily prototype new methods while avoiding impractically slow experimental turnaround times. To match the first requirement, the most popular RL libraries advocate for highly modular agent composability, which facilitates experimentation and development. To solve challenging environments within reasonable time frames, scaling RL to large sampling and computing resources has proved a successful strategy. However, this capability has been so far difficult to combine with modularity. In this work, we explore design choices to allow agent composability both at a local and distributed level of execution. We propose a versatile approach that allows the definition of RL agents at different scales through independent, reusable components. We demonstrate experimentally that our design choices allow us to reproduce classical benchmarks, explore multiple distributed architectures, and solve novel and complex environments while giving full control to the user in the agent definition and training scheme definition. We believe this work can provide useful insights to the next generation of RL libraries.

Keywords: deep reinforcement learning, Python, PyTorch, distributed training, modularity, library

Procedia PDF Downloads 53

7342 Trajectory Design and Power Allocation for Energy-Efficient UAV Communication Based on Deep Reinforcement Learning

Authors: Yuling Cui, Danhao Deng, Chaowei Wang, Weidong Wang

Abstract:

In recent years, unmanned aerial vehicles (UAVs) have been widely used in wireless communication, attracting more and more attention from researchers. UAVs can serve not only as relays for auxiliary communication but also as aerial base stations for ground users (GUs). However, their limited energy means that they cannot operate continuously and can cover only a limited service range. In this paper, we investigate 2D UAV trajectory design and power allocation in order to maximize the UAV's service time and downlink throughput. Based on deep reinforcement learning, we propose a deep deterministic policy gradient algorithm for trajectory design and power allocation (TDPA-DDPG) to solve the energy-efficiency and communication service quality problem. The simulation results show that TDPA-DDPG can extend the service time of the UAV as much as possible, improve the communication service quality, and maximize downlink throughput, a significant improvement over existing methods.
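One ingredient commonly found in DDPG-style algorithms is a slowly moving target network that is blended toward the online network after each update. The sketch below shows only that generic soft update on a flat list of weights; the function name and the value of `tau` are assumptions, and nothing here is specific to TDPA-DDPG.

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Blend target weights toward online weights:
    target <- tau * online + (1 - tau) * target."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_weights, online_weights)
    ]
```

A small `tau` keeps the target values changing slowly, which is the usual motivation for this kind of update.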

Keywords: UAV trajectory design, power allocation, energy efficient, downlink throughput, deep reinforcement learning, DDPG

Procedia PDF Downloads 106

7341 Conscious Intention-based Processes Impact the Neural Activities Prior to Voluntary Action on Reinforcement Learning Schedules

Authors: Xiaosheng Chen, Jingjing Chen, Phil Reed, Dan Zhang

Abstract:

Conscious intention can be a promising entry point for grasping consciousness and orienting voluntary action. The current study adopted a random ratio (RR), yoked random interval (RI) reinforcement learning schedule instead of the previous highly repeatable, single-decision-point paradigms, aiming to induce voluntary action with a conscious intention that evolves from the interaction between short-range intention and long-range intention. Readiness potential (RP)-like EEG amplitude and inter-trial EEG variability decreased significantly prior to voluntary action compared to cued action for inter-trial EEG variability, mainly during the earlier stage of neural activity. Notably, RP-like EEG amplitudes decreased significantly prior to responses with higher RI reward rates, in which participants formed a higher plane of conscious intention. The present study suggests a possible contribution of conscious intention-based processes to neural activity from the earlier stage prior to voluntary action on a reinforcement learning schedule.

Keywords: reinforcement learning schedule, voluntary action, EEG, conscious intention, readiness potential

Procedia PDF Downloads 46

7340 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, its learning procedure lacks psychological plausibility: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for the classifier’s performance on determining entailment between sentences translated by the translator into the disciple’s native language. The translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation have been proposed before, we introduce a number of improvements. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained on recognizing textual entailment produce high-quality general-purpose sentence embeddings transferable to other tasks. We use the Stanford Natural Language Inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of the translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning-based zero-shot machine translation.
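As a rough illustration of what a policy-gradient formulation of this communication game could look like, the standard REINFORCE estimator is written out below. The notation is an assumption and does not come from the abstract: x denotes the source-language premise and hypothesis, y the translations sampled from the translator's policy with parameters theta, and R(y) the reward derived from the classifier's entailment prediction on y.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
    \left[ R(y) \, \nabla_\theta \log \pi_\theta(y \mid x) \right]
```

Under this reading, translations that help the classifier decide entailment correctly receive a higher reward, and their log-probability is pushed up accordingly.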

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

Procedia PDF Downloads 95

7339 Nonlinear Finite Element Modeling of Unbonded Steel Reinforced Concrete Beams

Authors: Fares Jnaid, Riyad Aboutaha

Abstract:

In this paper, a nonlinear Finite Element Analysis (FEA) was carried out using ANSYS software to build a model capable of predicting the behavior of Reinforced Concrete (RC) beams with unbonded reinforcement. The FEA model was compared to existing experimental data by other researchers. The existing experimental data consisted of 16 beams that varied from structurally sound beams to beams with unbonded reinforcement with different unbonded lengths and reinforcement ratios. The model was able to predict the ultimate flexural strength, load-deflection curve, and crack pattern of concrete beams with unbonded reinforcement. It was concluded that when the unbonded length is less than 45% of the span, there will be no decrease in the ultimate flexural strength due to the loss of bond between the steel reinforcement and the surrounding concrete, regardless of the reinforcement ratio. Moreover, when the reinforcement ratio is relatively low, there will be no decrease in ultimate flexural strength regardless of the unbonded length.

Keywords: FEA, ANSYS, unbond, strain

Procedia PDF Downloads 221

7338 Screening of Commonly Used Reinforcement Materials for Tomb Murals

Authors: Liping Qiu, Xiaofeng Zhang

Abstract:

Over their long history, precious tomb murals have suffered from various diseases due to natural and man-made destruction. The key to protecting tomb murals is reinforcing them, so that their lifespan is maximized and their artistic, historic, and scientific value is preserved. In this paper, four kinds of traditional reinforcement materials (silicone acrylic lotion, pure acrylic lotion, polyvinyl acetate lotion, and B72) are selected to reinforce the ground support layer of tomb murals. The reinforcement effect of each material on the ground support layer is compared and analyzed, and the best protection material is identified.

Keywords: mural, destruction cycle, reinforcement material, disease

Procedia PDF Downloads 90

7337 Comparative Study of Deep Reinforcement Learning Algorithm Against Evolutionary Algorithms for Finding the Optimal Values in a Simulated Environment Space

Authors: Akshay Paranjape, Nils Plettenberg, Robert Schmitt

Abstract:

Traditional optimization methods like evolutionary algorithms are widely used in production processes to find an optimal or near-optimal solution of control parameters based on the simulated environment space of a process. These algorithms are computationally intensive and therefore do not provide the opportunity for real-time optimization. This paper utilizes the Deep Reinforcement Learning (DRL) framework to find an optimal or near-optimal solution for control parameters. A model based on maximum a posteriori policy optimization (Hybrid-MPO) that can handle both numerical and categorical parameters is used as a benchmark for comparison. A comparative study shows that DRL can find solutions of similar quality to those of evolutionary algorithms while requiring significantly less time, making it preferable for real-time optimization. The results are confirmed in a large-scale validation study on datasets from production and other fields. A trained XGBoost model is used as a surrogate for process simulation. Finally, multiple ways to improve the model are discussed.

Keywords: reinforcement learning, evolutionary algorithms, production process optimization, real-time optimization, hybrid-MPO

Procedia PDF Downloads 74

7336 Safe and Efficient Deep Reinforcement Learning Control Model: A Hydroponics Case Study

Authors: Almutasim Billa A. Alanazi, Hal S. Tharp

Abstract:

Safe performance and efficient energy consumption are essential factors for designing a control system. This paper presents a reinforcement learning (RL) model that can be applied to control applications to improve safety and reduce energy consumption. As hardware constraints and environmental disturbances are imprecise and unpredictable, conventional control methods may not always be effective in optimizing control designs. However, RL has demonstrated its value in several artificial intelligence (AI) applications, especially in the field of control systems. The proposed model intelligently monitors a system's success by observing the rewards from the environment, with positive rewards counting as a success when the controlled reference is within the desired operating zone. Thus, the model can determine whether the system is safe to continue operating based on the designer/user specifications, which can be adjusted as needed. Additionally, the controller keeps track of energy consumption to improve energy efficiency by enabling the idle mode when the controlled reference is within the desired operating zone, thus reducing the system energy consumption during the controlling operation. Water temperature control for a hydroponic system is taken as a case study for the RL model, adjusting the variance of disturbances to show the model’s robustness and efficiency. On average, the model improved safety by up to 15% and energy efficiency by 35%-40% compared to a traditional RL model.
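The monitoring logic described above (positive reward inside the desired zone, idle mode while inside it, and a safety decision based on a designer-specified threshold) can be sketched as follows. This is only a rule-based illustration of that bookkeeping, not the RL model of the paper; the reward values, the function names, and the `min_success_rate` parameter are assumptions.

```python
def step_reward(temperature, low, high):
    """Assumed reward shape: +1 inside the desired zone, -1 outside."""
    return 1 if low <= temperature <= high else -1


def supervise(temperatures, low, high, min_success_rate):
    """Return (modes, safe) for a sequence of temperature readings.

    modes lists the actuator mode chosen at each step; safe is True when
    the fraction of successful steps meets the designer's threshold.
    """
    modes = []
    successes = 0
    for temperature in temperatures:
        if step_reward(temperature, low, high) > 0:
            successes += 1
            modes.append("idle")    # inside the zone: save energy
        else:
            modes.append("active")  # outside the zone: keep controlling
    rate = successes / len(temperatures) if temperatures else 0.0
    return modes, rate >= min_success_rate
```

For example, with a desired zone of 21 to 25 degrees and a threshold of 0.5, the readings 20.0, 22.5, 26.0, 23.0 would yield the modes active, idle, active, idle and be judged safe.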

Keywords: control system, hydroponics, machine learning, reinforcement learning

Procedia PDF Downloads 124

7335 Obstacle Avoidance Using Image-Based Visual Servoing Based on Deep Reinforcement Learning

Authors: Tong He, Long Chen, Irag Mantegh, Wen-Fang Xie

Abstract:

This paper proposes an image-based obstacle avoidance and tracking target identification strategy for an Unmanned Aerial Vehicle (UAV) in GPS-degraded or GPS-denied environments. The traditional force algorithm for obstacle avoidance can produce local minima, in which the UAV cannot get away from the obstacle effectively. To eliminate them, an artificial potential approach based on harmonic potential is proposed to guide the UAV around the obstacle by using the vision system. An image-based visual servoing (IBVS) scheme has been adopted to implement the proposed obstacle avoidance approach. In IBVS, pixel accuracy is a key factor in achieving obstacle avoidance. In this paper, a deep reinforcement learning framework is applied to reduce pixel errors through constant interaction between the environment and the agent. In addition, a neural-network-based combination of OpenTLD and TensorFlow is used to identify the type of tracking target. Numerical simulations in MATLAB and ROS Gazebo show satisfactory results in target identification and obstacle avoidance.

Keywords: image-based visual servoing, obstacle avoidance, tracking target identification, deep reinforcement learning, artificial potential approach, neural network

Procedia PDF Downloads 107

7334 A Reinforcement Learning Based Method for Heating, Ventilation, and Air Conditioning Demand Response Optimization Considering Few-Shot Personalized Thermal Comfort

Authors: Xiaohua Zou, Yongxin Su

Abstract:

The reasonable operation of heating, ventilation, and air conditioning (HVAC) is of great significance in improving the security, stability, and economy of power system operation. However, the uncertainty of the operating environment, user-dependent thermal comfort, and the need for rapid decision-making pose challenges for HVAC demand response (DR) optimization. In this regard, this paper proposes a reinforcement learning-based method for HVAC demand response optimization considering few-shot personalized thermal comfort (PTC). First, an HVAC DR optimization framework based on a few-shot PTC model and deep reinforcement learning (DRL) is designed, in which the output of the few-shot PTC model is regarded as the input of DRL. Then, a few-shot PTC model that distinguishes between awake and asleep states is established, which has excellent engineering usability. Next, based on soft actor-critic, an HVAC DR optimization algorithm considering the user’s PTC is designed to deal with uncertainty and make decisions rapidly. Experimental results show that the proposed method can efficiently obtain the user’s PTC temperature, reduce energy cost while ensuring the user’s PTC, and achieve rapid decision-making under uncertainty.

Keywords: HVAC, few-shot personalized thermal comfort, deep reinforcement learning, demand response

Procedia PDF Downloads 32

7333 Preventing the Drought of Lakes by Using Deep Reinforcement Learning in France

Authors: Farzaneh Sarbandi Farahani

Abstract:

Drought and the decline in lake levels in recent years, caused by global warming and excessive use of the water resources feeding lakes, are of great importance, and this research provides a structure to investigate this issue. First, the information required for simulating lake drought is provided, with strong references and the necessary assumptions. An Entity-Component-System (ECS) structure is used for the simulation, which allows assumptions to be considered flexibly. Three major users (i.e., industrial, agricultural, and domestic users) consume water from groundwater and surface water (i.e., streams, rivers, and lakes). Lake Mead has been considered for simulation, and the information necessary to investigate its drought has also been provided. The results are presented in the form of a scenario-based design and optimal strategy selection. For optimal strategy selection, a deep reinforcement learning algorithm is developed to select the best set of strategies among all possible projects. These results can provide a better view of how to plan to prevent lake drought.

Keywords: drought simulation, Mead lake, entity component system programming, deep reinforcement learning

Procedia PDF Downloads 59

7332 Adaption of the Design Thinking Method for Production Planning in the Meat Industry Using Machine Learning Algorithms

Authors: Alica Höpken, Hergen Pargmann

Abstract:

Resource-efficient planning of the complex production processes in the meat industry and the reduction of food waste are a permanent challenge. The complexity of the production planning process occurs in every part of the supply chain, from agriculture to the end consumer. It arises from long and uncertain planning phases. Uncertainties such as stochastic yields, fluctuations in demand, and resource variability are part of this process. In the meat industry, waste mainly relates to incorrect storage, technical causes in production, or overproduction. The high amount of food waste along the complex supply chain in the meat industry could not be reduced by simple solutions until now. Therefore, resource-efficient production planning by conventional methods is currently only partially feasible. The realization of intelligent, automated production planning is possible in principle through the application of machine learning algorithms, such as those of reinforcement learning. By applying the adapted design thinking method, machine learning methods (especially reinforcement learning algorithms) are used for the complex production planning process in the meat industry. This method is tailored to the application area. A resource-efficient production planning process is made available by adapting the design thinking method. In addition, the complex processes can be planned efficiently by using this method, since this standardized approach offers new possibilities for tackling the complexity and the high time consumption. It represents a tool to support efficient production planning in the meat industry. This paper shows an adaptation of the design thinking method to apply the reinforcement learning method for a resource-efficient production planning process in the meat industry. Subsequently, the steps that are necessary to introduce machine learning algorithms into the production planning of the food industry are determined. This is achieved based on a case study which is part of the research project “REIF - Resource Efficient, Economic and Intelligent Food Chain” supported by the German Federal Ministry for Economic Affairs and Climate Action and the German Aerospace Center. Through this structured approach, significantly better planning results are achieved, which would be too complex or very time-consuming to obtain using conventional methods.

Keywords: change management, design thinking method, machine learning, meat industry, reinforcement learning, resource-efficient production planning

Procedia PDF Downloads 99

7331 A Framework of Dynamic Rule Selection Method for Dynamic Flexible Job Shop Problem by Reinforcement Learning Method

Authors: Rui Wu

Abstract:

In the volatile modern manufacturing environment, new orders occur randomly at any time, and pre-emptive methods are infeasible. This calls for a real-time scheduling method that can produce a reasonably good schedule quickly. The dynamic Flexible Job Shop problem is an NP-hard scheduling problem that combines the dynamic Job Shop problem with the Parallel Machine problem. A Flexible Job Shop contains different work centres. Each work centre contains parallel machines that can process certain operations. Many algorithms, such as genetic algorithms or simulated annealing, have been proposed to solve static Flexible Job Shop problems. However, the time efficiency of these methods is low, and they are not feasible in a dynamic scheduling problem. Therefore, a dynamic rule selection scheduling system based on the reinforcement learning method is proposed in this research, in which the dynamic Flexible Job Shop problem is divided into several parallel machine problems to decrease its complexity. Firstly, the features of jobs, machines, work centres, and flexible job shops are selected to describe the status of the dynamic Flexible Job Shop problem at each decision point in each work centre. Secondly, a reinforcement learning framework using a double-layer deep Q-learning network is applied to select proper composite dispatching rules based on the status of each work centre. Then, based on the selected composite dispatching rule, an available operation is selected from the waiting buffer and assigned to an available machine in each work centre. Finally, the proposed algorithm is compared with well-known dispatching rules on the objectives of mean tardiness, mean flow time, mean waiting time, or mean percentage of waiting time in the real-time Flexible Job Shop problem. The simulation results show that the proposed framework has reasonable performance and time efficiency.
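The two decisions described above (first choose a dispatching rule for a work centre, then use that rule to pick an operation from the waiting buffer) can be sketched as follows. This is a simplified illustration, not the proposed framework: a plain dictionary of value estimates stands in for the deep Q-network, only two textbook rules (SPT and EDD) are shown instead of composite rules, and all names and the job tuple layout are assumptions.

```python
import random

# Assumed job layout in the waiting buffer: (job_id, processing_time, due_date)
RULES = {
    "SPT": lambda job: job[1],  # shortest processing time first
    "EDD": lambda job: job[2],  # earliest due date first
}


def select_rule(q_values, epsilon, rng):
    """Epsilon-greedy choice of a rule name.

    q_values maps rule name -> estimated value for the current state
    (a stand-in for the output of a Q-network).
    """
    names = sorted(q_values)
    if rng.random() < epsilon:
        return rng.choice(names)       # explore
    return max(names, key=q_values.get)  # exploit


def dispatch(buffer, rule_name):
    """Pick the next job from the waiting buffer using the chosen rule."""
    return min(buffer, key=RULES[rule_name])
```

With a buffer of `("J1", 5, 10)`, `("J2", 2, 30)`, and `("J3", 4, 8)`, SPT would pick J2 and EDD would pick J3, which shows how the chosen rule changes the schedule.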

Keywords: dynamic scheduling problem, flexible job shop, dispatching rules, deep reinforcement learning

Procedia PDF Downloads 71

7330 Comparative Analysis of Reinforcement Learning Algorithms for Autonomous Driving

Authors: Migena Mana, Ahmed Khalid Syed, Abdul Malik, Nikhil Cherian

Abstract:

In recent years, advancements in deep learning have enabled researchers to tackle the problem of self-driving cars. Car companies use huge datasets to train their deep learning models to make autonomous cars a reality. However, this approach has a drawback: the space of possible states and actions for a car is so large that there cannot be a dataset for every possible road scenario. To overcome this problem, reinforcement learning (RL) is investigated in this research. Since the problem of autonomous driving can be modeled in a simulation, it lends itself naturally to the domain of reinforcement learning. The advantage of this approach is that we can model different and complex road scenarios in a simulation without having to deploy in the real world. The autonomous agent can learn to drive by finding the optimal policy. This learned model can then be deployed in a real-world setting. In this project, we focus on three RL algorithms: Q-learning, Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). To model the environment, we have used TORCS (The Open Racing Car Simulator), which provides us with a strong foundation to test our model. The inputs to the algorithms are the sensor data provided by the simulator, such as velocity, distance from side pavement, etc. The outcome of this research project is a comparative analysis of these algorithms. Based on the comparison, the PPO algorithm gives the best results. With the PPO algorithm, the reward is greater, and the acceleration, steering angle and braking are more stable than with the other algorithms, which means that the agent learns to drive in a better and more efficient way in this case. Additionally, we have produced a dataset taken from the training of the agent with the DDPG and PPO algorithms. It contains all the steps of the agent during one full training run in the form: (all input values, acceleration, steering angle, brake, loss, reward).
This study can serve as a base for further complex road scenarios. Furthermore, it can be extended to the field of computer vision, using images to find the best policy.

Keywords: autonomous driving, DDPG (deep deterministic policy gradient), PPO (proximal policy optimization), reinforcement learning

Procedia PDF Downloads 118

7329 Off-Policy Q-learning Technique for Intrusion Response in Network Security

Authors: Zheni S. Stefanova, Kandethody M. Ramachandran

Abstract:

With the increasing dependency on our computer devices, we face the necessity of adequate, efficient and effective mechanisms for protecting our network. There are two main problems that Intrusion Detection Systems (IDS) attempt to solve: 1) detecting the attack by analyzing the incoming traffic and inspecting the network (intrusion detection), and 2) producing a prompt response when the attack occurs (intrusion prevention). It is critical to create an intrusion detection model that detects a breach in the system in time, and challenging to make it respond automatically and with an acceptable delay at every stage of the monitoring process. We cannot afford to adopt security measures that demand high computational power, and we cannot accept a mechanism that reacts with a delay. In this paper, we propose an intrusion response mechanism based on artificial intelligence, more precisely on reinforcement learning techniques (RLT). The RLT help us create a decision agent that controls the process of interacting with the undetermined environment. The goal is to find an optimal policy, which represents the intrusion response, and thus to solve the reinforcement learning problem using a Q-learning approach. Our agent produces an optimal immediate response in the process of evaluating the network traffic. This Q-learning approach establishes the balance between exploration and exploitation and provides a unique, self-learning and strategic artificial intelligence response mechanism for IDS.
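
For readers unfamiliar with Q-learning, the standard tabular update and an epsilon-greedy action choice can be sketched as below. This is a generic textbook form, not the authors' implementation: the state labels, the two response actions, the reward and the hyperparameters are all hypothetical placeholders.

```python
import random
from collections import defaultdict

ACTIONS = ["allow", "block"]  # hypothetical response actions


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
rng = random.Random(0)

# One hand-made transition: blocking suspicious traffic earns reward 1.
q_update(q, "suspicious", "block", reward=1.0, next_state="normal")
greedy = choose_action(q, "suspicious", epsilon=0.0, rng=rng)
print(q[("suspicious", "block")], greedy)
```

The update is off-policy because the target uses the maximum over next actions regardless of which action the epsilon-greedy behaviour policy actually takes next; `epsilon` controls the balance between exploration and exploitation.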

Keywords: cyber security, intrusion prevention, optimal policy, Q-learning

Procedia PDF Downloads 201

7328 Development of AA2024 Matrix Composites Reinforced with Micro Yttrium through Cold Compaction with Superior Mechanical Properties

Authors: C. H. S. Vidyasagar, D. B. Karunakar

Abstract:

In the present work, five different composite samples with AA2024 as matrix and varying amounts of yttrium (0.1-0.5 wt.%) as reinforcement are developed through cold compaction. The microstructures of the developed composite samples revealed that the yttrium reinforcement caused grain refinement up to 0.3 wt.%, beyond which the refinement is not effective. The microstructure revealed Al2Cu precipitation, which strengthened the composite up to 0.3 wt.% yttrium reinforcement. Upon further increase in yttrium reinforcement, the intermetallics and the precipitates coarsen and their corresponding strengthening effect decreases. The mechanical characterization revealed that the composite sample reinforced with 0.3 wt.% yttrium showed the highest mechanical properties: a hardness of 82 HV, an Ultimate Tensile Strength (UTS) of 276 MPa, a Yield Strength (YS) of 229 MPa and an elongation (EL) of 18.9%. However, the relative density of the developed composites decreased with the increase in yttrium reinforcement.

Keywords: mechanical properties, AA 2024 matrix, yttrium reinforcement, cold compaction, precipitation

Procedia PDF Downloads 115

7327 Optimal Dynamic Regime for CO Oxidation Reaction Discovered by Policy-Gradient Reinforcement Learning Algorithm

Authors: Lifar M. S., Tereshchenko A. A., Bulgakov A. N., Guda S. A., Guda A. A., Soldatov A. V.

Abstract:

Metal nanoparticles are widely used as heterogeneous catalysts to activate adsorbed molecules and reduce the energy barrier of the reaction. Reaction product yield depends on the interplay between elementary processes: adsorption, activation, reaction, and desorption. These processes, in turn, depend on the inlet feed concentrations, temperature, and pressure. At stationary conditions, the active surface sites may be poisoned by reaction byproducts or blocked by thermodynamically adsorbed gaseous reagents. Thus, the yield of reaction products can significantly drop. In contrast, dynamic control accounts for the changes in the surface properties and adjusts reaction parameters accordingly. Therefore, dynamic control may be more efficient than stationary control. In this work, a reinforcement learning algorithm has been applied to control the simulation of CO oxidation on a catalyst. The policy gradient algorithm is trained to maximize the CO₂ production rate based on the CO and O₂ flows at a given time step. Nonstationary solutions were found for the regime with surface deactivation. The maximal product yield was achieved for periodic variations of the gas flows, ensuring a balance between available adsorption sites and the concentration of activated intermediates. This methodology opens a perspective for the optimization of catalytic reactions under nonstationary conditions.

Keywords: artificial intelligence, catalyst, CO oxidation, reinforcement learning, dynamic control

Procedia PDF Downloads 80

7326 Deep Reinforcement Learning for Advanced Pressure Management in Water Distribution Networks

Authors: Ahmed Negm, George Aggidis, Xiandong Ma

Abstract:

With the diverse nature of urban cities, customer demand patterns, landscape topologies or even seasonal weather trends, managing our water distribution networks (WDNs) has proved a complex task. These unpredictable circumstances manifest as pipe failures, intermittent supply and burst events, thus adding to water loss, energy waste and increased carbon emissions. Whilst these events are unavoidable, advanced pressure management has proved an effective tool to control and mitigate them. However, water utilities have struggled with developing a real-time control method that is resilient when confronting the challenges of water distribution. In this paper, we use deep reinforcement learning (DRL) algorithms as a novel pressure control strategy to minimise pressure violations and leakage under both burst and background leakage conditions. Agents based on advantage actor-critic (A2C) and recurrent proximal policy optimisation (Recurrent PPO) were trained and compared to benchmarked optimisation algorithms (differential evolution, particle swarm optimisation). A2C manages to minimise leakage by 32.48% under burst conditions and 67.17% under background conditions, which was the highest performance among the DRL algorithms. A2C and Recurrent PPO performed well in comparison to the benchmarks, with higher processing speed and lower computational effort.

Keywords: deep reinforcement learning, pressure management, water distribution networks, leakage management

Procedia PDF Downloads 37

7325 Reinforcement Learning the Born Rule from Photon Detection

Authors: Rodrigo S. Piera, Jailson Sales Araújo, Gabriela B. Lemos, Matthew B. Weiss, John B. DeBrota, Gabriel H. Aguilar, Jacques L. Pienaar

Abstract:

The Born rule was historically viewed as an independent axiom of quantum mechanics until Gleason derived it in 1957 by assuming the Hilbert space structure of quantum measurements [1]. In subsequent decades there have been diverse proposals to derive the Born rule starting from even more basic assumptions [2]. In this work, we demonstrate that a simple reinforcement-learning algorithm, having no pre-programmed assumptions about quantum theory, will nevertheless converge to a behaviour pattern that accords with the Born rule, when tasked with predicting the output of a quantum optical implementation of a symmetric informationally-complete measurement (SIC). Our findings support a hypothesis due to QBism (the subjective Bayesian approach to quantum theory), which states that the Born rule can be thought of as a normative rule for making decisions in a quantum world [3].
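
To make concrete what "accords with the Born rule" means for a SIC, the sketch below computes the Born-rule outcome probabilities in the simplest case, a single qubit, where the four SIC elements correspond to the vertices of a regular tetrahedron on the Bloch sphere and p_i = (1 + r · n_i) / 4 for a state with Bloch vector r. The qubit dimension is an assumption made for illustration; the abstract does not state the dimension used in the experiment, and this is the target distribution, not the learning algorithm itself.

```python
import math

# Bloch vectors of a qubit SIC: the four vertices of a regular tetrahedron.
S = 1 / math.sqrt(3)
SIC_VECTORS = [(S, S, S), (S, -S, -S), (-S, S, -S), (-S, -S, S)]


def born_probabilities(bloch):
    """Born rule for a qubit SIC: p_i = (1 + r . n_i) / 4."""
    return [
        (1 + sum(r * n for r, n in zip(bloch, vec))) / 4
        for vec in SIC_VECTORS
    ]


# Pure state aligned with the first SIC element.
probs = born_probabilities(SIC_VECTORS[0])
print([round(p, 4) for p in probs])

# Maximally mixed state (zero Bloch vector): all outcomes equally likely.
uniform = born_probabilities((0.0, 0.0, 0.0))
print(uniform)
```

For the aligned pure state, the matching outcome has probability one half and each of the other three has one sixth; an agent whose predictions converge to such frequencies is behaving in accordance with the Born rule for this measurement.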

Keywords: quantum Bayesianism, quantum theory, quantum information, quantum measurement

Procedia PDF Downloads 48

7324 Deep Reinforcement Learning Approach for Trading Automation in The Stock Market

Authors: Taylan Kabbani, Ekrem Duman

Abstract:

The design of adaptive systems that take advantage of financial markets while reducing the risk can bring more stagnant wealth into the global market. However, most efforts made to generate successful deals in trading financial assets rely on Supervised Learning (SL), which suffers from various limitations. Deep Reinforcement Learning (DRL) offers to solve these drawbacks of SL approaches by combining the financial asset price "prediction" step and the portfolio "allocation" step in one unified process to produce fully autonomous systems capable of interacting with their environment to make optimal decisions through trial and error. In this paper, a continuous action space approach is adopted to give the trading agent the ability to gradually adjust the portfolio's positions with each time step (dynamically re-allocate investments), resulting in better agent-environment interaction and faster convergence of the learning process. In addition, the approach supports managing a portfolio with several assets instead of a single one. This work presents a novel DRL model to generate profitable trades in the stock market, effectively overcoming the limitations of supervised learning approaches. We formulate the trading problem, also referred to as the agent environment, as a Partially Observable Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market, such as liquidity and transaction costs. More specifically, we design an environment that simulates the real-world trading process by augmenting the state representation with ten different technical indicators and sentiment analysis of news articles for each stock. We then solve the formulated POMDP problem using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, which can learn policies in high-dimensional and continuous action spaces like those typically found in the stock market environment.
From the point of view of stock market forecasting and the intelligent decision-making mechanism, this paper demonstrates the superiority of deep reinforcement learning in financial markets over other types of machine learning, such as supervised learning, and demonstrates its credibility and advantages for strategic decision-making.

Keywords: the stock market, deep reinforcement learning, MDP, twin delayed deep deterministic policy gradient, sentiment analysis, technical indicators, autonomous agent

Procedia PDF Downloads 148

7323 The Student Care: The Influence of Family’s Attention toward the Student of Junior High Schools in Physics Learning Achievements

Authors: Siti Rossidatul Munawaroh, Siti Khusnul Khowatim

Abstract:

This study aims to determine how family attention influences students by providing guidance for their learning. Students' learning motivation can be increased in various ways, one of which is social guidance through their relationship with the family. The family not only provides materials and time for learning but also supervises the learning time and guides children in overcoming learning difficulties. The nature of the physics subject within science at junior high schools demands that students be able to think symbolically and understand things in a meaningful manner. Therefore, reinforcement of physics learning motivation is clearly necessary, not only from the school but also from the family environment and society. The role of the family, which includes maintenance, parenting, coaching, and educating both physically and spiritually, is expected to give encouragement in studying physics in order to increase students' learning achievements.

Keywords: physics subject, the influence of family attention, learning motivation, the student care

Procedia PDF Downloads 397

7322 Applications of Evolutionary Optimization Methods in Reinforcement Learning

Authors: Rahul Paul, Kedar Nath Das

Abstract:

The paradigm of Reinforcement Learning (RL) has become prominent in training intelligent agents to make decisions in environments that are both dynamic and uncertain. The primary objective of RL is to optimize the policy of an agent in order to maximize the cumulative reward it receives throughout a given period. Nevertheless, the process of optimization presents notable difficulties as a result of the inherent trade-off between exploration and exploitation, the presence of extensive state-action spaces, and the intricate nature of the dynamics involved. Evolutionary Optimization Methods (EOMs) have garnered considerable attention as a supplementary approach to tackle these challenges, providing distinct capabilities for optimizing RL policies and value functions. The ongoing advancement of research in both RL and EOMs presents an opportunity for significant advancements in autonomous decision-making systems. The convergence of these two fields has the potential to have a transformative impact on various domains of artificial intelligence (AI) applications. This article highlights the considerable influence of EOMs in enhancing the capabilities of RL. Taking advantage of evolutionary principles enables RL algorithms to effectively traverse extensive action spaces and discover optimal solutions within intricate environments. Moreover, this paper emphasizes the practical implementations of EOMs in the field of RL, specifically in areas such as robotic control, autonomous systems, inventory problems, and multi-agent scenarios. The article highlights the utilization of EOMs in facilitating RL agents to effectively adapt, evolve, and uncover proficient strategies for complex tasks that may pose challenges for conventional RL approaches.
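
As a rough illustration of how an evolutionary method can optimize policy parameters without gradients, the sketch below runs a simple (1+λ) evolution strategy. It is a minimal example under stated assumptions and not a method taken from the article: `episode_return` is a toy stand-in for the return of an RL rollout, and the population size, mutation scale and generation count are arbitrary.

```python
import random


def episode_return(params):
    """Toy stand-in for an RL rollout: higher is better, peak at (1, -2)."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)


def evolve(generations=200, offspring=8, sigma=0.3, seed=0):
    """(1+lambda) evolution strategy with Gaussian mutation and elitism."""
    rng = random.Random(seed)
    parent = [0.0, 0.0]
    best = episode_return(parent)
    for _ in range(generations):
        # Mutate the current parent into several candidate parameter vectors.
        children = [
            [p + rng.gauss(0.0, sigma) for p in parent]
            for _ in range(offspring)
        ]
        candidate = max(children, key=episode_return)
        candidate_score = episode_return(candidate)
        # Elitism: replace the parent only if the best child improves on it.
        if candidate_score > best:
            parent, best = candidate, candidate_score
    return parent, best


params, score = evolve()
print(params, score)
```

Because selection only compares returns, the same loop works when the objective is noisy, non-differentiable or produced by a black-box simulator, which is one reason such methods are used alongside conventional RL updates.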

Keywords: machine learning, reinforcement learning, loss function, optimization techniques, evolutionary optimization methods

Procedia PDF Downloads 44

7321 Experimental and Analytical Study to Investigate the Effect of Tension Reinforcement on Behavior of Reinforced Concrete Short Beams

Authors: Hakan Ozturk, Aydin Demir, Kemal Edip, Marta Stojmanovska, Julijana Bojadjieva

Abstract:

There are many factors that affect the behavior of reinforced concrete beams. These include concrete compressive strength, reinforcement yield strength, the amount of tension, compression and confinement bars, and strain hardening of the reinforcement. In the study, the support condition of the short beams is selected as statically indeterminate to the first degree. Experimental and numerical analyses are carried out for reinforced concrete (RC) short beams. The cross-section dimensions are selected as 250 mm in width and 500 mm in height. The length of the RC short beams is 2250 mm, and these values are constant in all beams. After the finite element model is verified, a numerical parametric study is performed with varied diameters of tension reinforcement. The effect of the change in diameter on the behavior of RC short beams is investigated. As a result of the study, ductility ratios and failure modes are determined, and load-displacement graphs are obtained in order to understand the behavior of short beams. It is deduced that the diameter of tension reinforcement plays a very important role in the behavior of RC short beams in terms of ductility and brittleness.

Keywords: short beam, reinforced concrete, finite element analysis, longitudinal reinforcement

Procedia PDF Downloads 180