Search results for: model-based reinforcement learning
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2274

2274 A Computer Model of Language Acquisition – Syllable Learning – Based on Hebbian Cell Assemblies and Reinforcement Learning

Authors: Sepideh Fazeli, Fariba Bahrami

Abstract:

Investigating language acquisition is one of the most challenging problems in the study of language. Syllable learning, as a level of language acquisition, is of considerable significance since it plays an important role in language acquisition. Because language acquisition cannot be studied directly with children, especially in its developmental phases, computer models are useful for examining it. In this paper, a computer model of early language learning for syllable learning is proposed. It is guided by a conceptual model of syllable learning named the Directions Into Velocities of Articulators (DIVA) model. The computer model uses simple associational and reinforcement learning rules within a neural network architecture inspired by neuroscience. Our simulation results verify the ability of the proposed computer model to produce phonemes during babbling and early speech. It also provides a framework for examining the neural basis of language learning and communication disorders.
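
The associational-plus-reinforcement scheme described above can be illustrated with a reward-modulated Hebbian update, in which the reward signal gates the Hebbian weight change. The following is a minimal sketch, not the paper's DIVA-based model; the assembly sizes, learning rate, similarity threshold, and "target syllable" are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 20, 10            # sizes of the two cell assemblies (assumed)
W = rng.normal(0.0, 0.1, (n_post, n_pre))
eta = 0.05                        # learning rate (assumed)
target = rng.normal(size=n_post)  # stand-in for a desired syllable pattern

for t in range(1000):             # "babbling": random articulator activations
    x = rng.normal(size=n_pre)
    y = np.tanh(W @ x)
    # Reward 1 when the produced pattern resembles the target syllable.
    sim = y @ target / (np.linalg.norm(y) * np.linalg.norm(target))
    r = 1.0 if sim > 0.5 else 0.0
    W += eta * r * np.outer(y, x)  # Hebbian update gated by the reward signal
```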

Keywords: Brain modeling, computer models, language acquisition, reinforcement learning.

2273 Distributed System Computing Resource Scheduling Algorithm Based on Deep Reinforcement Learning

Authors: Yitao Lei, Xingxiang Zhai, Burra Venkata Durga Kumar

Abstract:

As the quantity and complexity of computing in large-scale software systems increase, distributed computing becomes increasingly important. A distributed system achieves high-performance computing through collaboration between different computing resources. Without efficient resource scheduling, distributed computing may cause resource waste and high costs. Resource scheduling is usually an NP-hard problem, so no general solution exists; optimization algorithms such as genetic algorithms and ant colony optimization can be applied, but the large scale of distributed systems makes these traditional optimization algorithms difficult to use. Heuristic and machine learning algorithms are usually applied in this situation to ease the computing load. We therefore review traditional resource scheduling optimization algorithms and introduce a deep reinforcement learning method that combines the perceptual ability of neural networks with the decision-making ability of reinforcement learning. Using machine learning, we try to identify the factors that most influence the performance of distributed computing and to help the distributed system schedule its computing resources efficiently. This paper surveys the application of deep reinforcement learning to distributed computing resource scheduling and proposes a deep reinforcement learning method that uses a recurrent neural network to optimize resource scheduling. The paper concludes with the challenges and improvement directions for deep reinforcement learning-based resource scheduling algorithms.
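
As an illustration of the recurrent-network scheduler the survey proposes, the sketch below uses an LSTM to read a sequence of task descriptors and score candidate resources, trained with a simple policy gradient. The feature dimensions, network sizes, and REINFORCE-style update are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class RNNScheduler(nn.Module):
    """Recurrent policy: a task-feature sequence -> a distribution over resources."""
    def __init__(self, task_dim=8, hidden=64, n_resources=16):
        super().__init__()
        self.lstm = nn.LSTM(task_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_resources)

    def forward(self, tasks):                    # tasks: (batch, seq_len, task_dim)
        out, _ = self.lstm(tasks)
        return self.head(out)                    # logits: (batch, seq_len, n_resources)

policy = RNNScheduler()
tasks = torch.randn(4, 10, 8)                    # four toy workloads of ten tasks each
dist = torch.distributions.Categorical(logits=policy(tasks))
actions = dist.sample()                          # resource assigned to every task
reward = torch.randn(4, 10)                      # e.g. negative cost or makespan (assumed)
loss = -(dist.log_prob(actions) * reward).mean() # REINFORCE-style policy gradient
loss.backward()
```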

Keywords: Resource scheduling, deep reinforcement learning, distributed system, artificial intelligence.

2272 A Learning Agent for Knowledge Extraction from an Active Semantic Network

Authors: Simon Thiel, Stavros Dalakakis, Dieter Roller

Abstract:

This paper outlines the development of a learning retrieval agent. The task of this agent is to extract knowledge from the Active Semantic Network in response to user requests. Based on a reinforcement learning approach, the agent learns to interpret the user's intention. In particular, the learning algorithm focuses on the retrieval of complex long-distance relations. By increasing its learned knowledge with every request-result-evaluation sequence, the agent enhances its capability to find the intended information.

Keywords: Reinforcement learning, learning retrieval agent, search in semantic networks.

2271 Robot Exploration and Navigation in Unseen Environments Using Deep Reinforcement Learning

Authors: Romisaa Ali

Abstract:

This paper presents a comparison between the Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC) reinforcement learning algorithms in the context of training robust navigation policies for Jackal robots. By leveraging an open-source framework and custom motion control environments, the study evaluates the performance, robustness, and transferability of the trained policies across a range of scenarios. The primary focus of the experiments is to assess the training process, the adaptability of the algorithms, and the robot's ability to navigate in previously unseen environments. Moreover, the paper examines the influence of varying environment complexity on the learning process and the generalization capabilities of the resulting policies. The results of this study aim to inform and guide the development of more efficient and practical reinforcement learning-based navigation policies for Jackal robots in real-world scenarios.
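
A comparison of this kind can be set up in outline with the open-source stable-baselines3 library; the sketch below swaps in a standard Gymnasium task as a stand-in for the authors' custom Jackal motion-control environments, and the training budget is illustrative.

```python
import gymnasium as gym
from stable_baselines3 import SAC, TD3

env_id = "Pendulum-v1"   # stand-in for the custom Jackal motion-control environments

for algo in (TD3, SAC):
    model = algo("MlpPolicy", gym.make(env_id), verbose=0)
    model.learn(total_timesteps=50_000)             # illustrative training budget

    # Roll out the trained policy on a fresh instance of the task.
    env = gym.make(env_id)
    obs, _ = env.reset(seed=0)
    total = 0.0
    for _ in range(200):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    print(algo.__name__, total)
```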

Keywords: Jackal robot environments, reinforcement learning, TD3, SAC, robust navigation, transferability, custom environments.

2270 A Modular On-line Profit Sharing Approach in Multiagent Domains

Authors: Pucheng Zhou, Bingrong Hong

Abstract:

How to coordinate the behaviors of agents through learning is a challenging problem in multi-agent domains. Because of its complexity, recent work has focused on how coordinated strategies can be learned. Here we are interested in using reinforcement learning techniques to learn the coordinated actions of a group of agents without requiring explicit communication among them. However, traditional reinforcement learning methods are based on the assumption that the environment can be modeled as a Markov Decision Process, which usually cannot be satisfied when multiple agents coexist in the same environment. Moreover, to effectively coordinate each agent's behavior so as to achieve the goal, it is necessary to augment the state of each agent with information about the other agents. Yet as the number of agents in a multi-agent environment increases, the state space of each agent grows exponentially, causing a combinatorial explosion. Profit sharing is a reinforcement learning method that allows agents to learn effective behaviors from their experiences even in non-Markovian environments. In this paper, to remedy the drawback of the original profit sharing approach, which needs substantial memory to store each state-action pair during the learning process, we first present an on-line rational profit sharing algorithm. We then integrate the advantages of a modular learning architecture with the on-line rational profit sharing algorithm and propose a new modular reinforcement learning model. The effectiveness of the technique is demonstrated on the pursuit problem.
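
Profit sharing assigns credit by reinforcing every state-action pair on the episode trace once a reward arrives, with credit decaying geometrically toward earlier steps. A minimal tabular sketch follows; the decay rate and toy trace are illustrative (a ratio below 1/L for L candidate actions is a common choice for satisfying the rationality condition).

```python
from collections import defaultdict

Q = defaultdict(float)    # weights of state-action pairs
GAMMA = 0.2               # geometric credit-decay rate (assumed, below 1/L for L=4 actions)

def reinforce_trace(trace, reward):
    """Profit sharing: distribute the episode reward backwards along the trace."""
    credit = reward
    for state, action in reversed(trace):
        Q[(state, action)] += credit
        credit *= GAMMA

episode = [("s0", "right"), ("s1", "up"), ("s2", "up")]   # one recorded toy trace
reinforce_trace(episode, reward=10.0)
```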

Keywords: Multi-agent learning, reinforcement learning, rational profit sharing, modular architecture.

2269 A Cognitive Robot Collaborative Reinforcement Learning Algorithm

Authors: Amit Gil, Helman Stern, Yael Edan

Abstract:

A cognitive collaborative reinforcement learning algorithm (CCRL) that incorporates an advisor into the learning process is developed to improve supervised learning. An autonomous learner is endowed with a self-awareness cognitive skill to decide when to solicit instructions from the advisor. The learner can also assess the value of advice, and accept or reject it. The method is evaluated for robotic motion planning using simulation. Tests are conducted for advisors with skill levels from expert to novice. The CCRL algorithm, and a combined method integrating its logic with Clouse's Introspection Approach, outperformed a baseline fully autonomous learner and demonstrated robust performance across advisor skill levels, learning to accept advice received from an expert while rejecting that of less skilled collaborators. Although the CCRL algorithm is based on RL, it fits other machine learning methods, since the advisor's actions are only added to the outer layer.
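
One way to realize the self-awareness trigger described above is to solicit advice only when the learner's own Q-values give no clear preference, and to vet the advice against those Q-values before accepting it. The sketch below uses the gap between the top two Q-values as the confidence measure; this is an illustrative mechanism, not the paper's exact criterion.

```python
import numpy as np

def choose_action(q_row, advisor=None, confidence_gap=0.1):
    """Ask the advisor only when the top two Q-values are too close to call."""
    ranked = np.sort(q_row)
    best, second = ranked[-1], ranked[-2]
    if advisor is not None and best - second < confidence_gap:
        advice = advisor(q_row)                  # advisor suggests an action index
        if q_row[advice] >= second:              # accept only if not clearly worse
            return int(advice)
    return int(np.argmax(q_row))

rng = np.random.default_rng(1)
novice = lambda q: rng.integers(len(q))          # a novice advisor answering at random
print(choose_action(np.array([0.50, 0.48, 0.10]), advisor=novice))
```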

Keywords: Robot learning, human-robot collaboration, motion planning, reinforcement learning.

2268 Biologically Inspired Controller for the Autonomous Navigation of a Mobile Robot in an Evasion Task

Authors: Dejanira Araiza-Illan, Tony J. Dodd

Abstract:

A novel biologically inspired controller for the autonomous navigation of a mobile robot in an evasion task is proposed. The controller takes advantage of the environment by calculating a measure of danger and subsequently choosing the parameters of a reinforcement learning based decision process. Two reinforcement learning algorithms were used: Q-learning and Sarsa(λ). Simulations show that selecting dynamic parameters reduces the time needed to execute the decision-making process, so the robot can obtain a policy that succeeds in the evasion task in realistic time.
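
For reference, one Sarsa(λ) backup with accumulating eligibility traces, the second of the two algorithms used, looks as follows; the state/action space sizes and parameters are illustrative.

```python
import numpy as np

n_states, n_actions = 100, 4            # discretized danger states / moves (assumed)
Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)                    # eligibility traces
alpha, gamma, lam = 0.1, 0.95, 0.8      # illustrative parameters

def sarsa_lambda_step(s, a, r, s2, a2):
    """One Sarsa(lambda) backup with accumulating traces."""
    delta = r + gamma * Q[s2, a2] - Q[s, a]
    E[s, a] += 1.0                      # accumulate trace for the visited pair
    Q[:] += alpha * delta * E           # credit all recently visited pairs
    E[:] *= gamma * lam                 # decay every trace
```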

Keywords: Autonomous navigation, mobile robots, reinforcement learning.

2267 Personalized Email Marketing Strategy: A Reinforcement Learning Approach

Authors: Lei Zhang, Tingting Xu, Jun He, Zhenyu Yan, Roger Brooks

Abstract:

Email marketing is one of the most important segments of online marketing. Email content is vital to customers. Different customers may have different familiarity with a product, so a successful marketing strategy must personalize email content based on individual customers’ product affinity. In this study, we build our personalized email marketing strategy with three types of emails: nurture, promotion, and conversion. Each type of emails has a different influence on customers. We investigate this difference by analyzing customers’ open rates, click rates and opt-out rates. Feature importance from response models is also analyzed. The goal of the marketing strategy is to improve the click rate on conversion-type emails. To build the personalized strategy, we formulate the problem as a reinforcement learning problem and adopt a Q-learning algorithm with variations. The simulation results show that our model-based strategy outperforms the current marketer’s strategy.

Keywords: Email marketing, email content, reinforcement learning, machine learning, Q-learning.

2266 Adaptive PID Controller based on Reinforcement Learning for Wind Turbine Control

Authors: M. Sedighizadeh, A. Rezazadeh

Abstract:

A self-tuning PID control strategy using reinforcement learning is proposed in this paper for the control of wind energy conversion systems (WECS). Actor-Critic learning is used to tune the PID parameters adaptively, taking advantage of the model-free and on-line learning properties of reinforcement learning. In order to reduce storage requirements and improve learning efficiency, a single RBF neural network is used to approximate both the policy function of the Actor and the value function of the Critic. The inputs of the RBF network are the system error and the first- and second-order differences of the error. The Actor realizes the mapping from the system state to the PID parameters, while the Critic evaluates the outputs of the Actor and produces the TD error. Based on the TD error performance index and the gradient descent method, updating rules for the RBF kernel functions and network weights are given. Simulation results show that the proposed controller is efficient for WECS, highly adaptable, and strongly robust, outperforming a conventional PID controller.
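
A sketch of the shared-RBF actor-critic structure described above: Gaussian basis functions over (e, Δe, Δ²e) feed both an actor head that outputs the PID gains and a critic head that estimates value, and the TD error drives both updates. Dimensions and learning rates are illustrative, and the actor update is simplified (a full implementation would weight it by the exploration perturbation).

```python
import numpy as np

rng = np.random.default_rng(0)
n_rbf, sigma = 25, 0.3
centers = rng.uniform(-1, 1, (n_rbf, 3))        # centers over (e, Δe, Δ²e)
w_actor = rng.normal(0, 0.01, (n_rbf, 3))       # one output per gain: Kp, Ki, Kd
w_critic = np.zeros(n_rbf)
alpha_a, alpha_c, gamma = 1e-3, 1e-2, 0.95

def phi(x):
    """Gaussian RBF features shared by the Actor and the Critic."""
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * sigma ** 2))

def update(x, r, x_next):
    """One step: the TD error trains the Critic and nudges the Actor's gains."""
    f, f2 = phi(x), phi(x_next)
    td = r + gamma * (w_critic @ f2) - (w_critic @ f)
    w_critic[:] += alpha_c * td * f
    w_actor[:] += alpha_a * td * f[:, None]     # simplified actor update
    return np.maximum(w_actor.T @ f, 0.0)       # current (Kp, Ki, Kd), kept non-negative

x = np.array([0.2, -0.05, 0.01])                # (error, Δerror, Δ²error)
kp, ki, kd = update(x, r=-x[0] ** 2, x_next=0.9 * x)
```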

Keywords: Wind energy conversion systems, reinforcement learning, Actor-Critic learning, adaptive PID control, RBF network.

2265 Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules

Authors: Suraiya Jabin, Kamal K. Bharadwaj

Abstract:

This research presents a system for post-processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using the Learning Classifier System approach. A Learning Classifier System (LCS) is a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning, and heuristics to produce adaptive systems. An LCS learns by interacting with an environment from which it receives feedback in the form of a numerical reward, and learning is achieved by trying to maximize the reward received. A crisp description of a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System, the initial population is constructed as a random collection of HPR trees (related production rules), and crisp/fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system, and based on a Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation. To implement reinforcement, a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.

Keywords: Hierarchical production rules, data mining, learning classifier systems, fuzzy subsumption relation, subsumption matrix, reinforcement learning.

2264 On Dialogue Systems Based on Deep Learning

Authors: Yifan Fan, Xudong Luo, Pingping Lin

Abstract:

Nowadays, dialogue systems have increasingly become the way for humans to access many computer systems, allowing humans to interact with computers in natural language. A dialogue system consists of three parts: understanding what humans say in natural language, managing the dialogue, and generating responses in natural language. In this paper, we survey deep learning based methods for dialogue management, response generation, and dialogue evaluation. Specifically, these methods are based on neural networks, long short-term memory networks, deep reinforcement learning, pre-training, and generative adversarial networks. We compare these methods and point out further research directions.

Keywords: Dialogue management, response generation, reinforcement learning, deep learning, evaluation.

2263 Gaits Stability Analysis for a Pneumatic Quadruped Robot Using Reinforcement Learning

Authors: Soofiyan Atar, Adil Shaikh, Sahil Rajpurkar, Pragnesh Bhalala, Aniket Desai, Irfan Siddavatam

Abstract:

Deep reinforcement learning (deep RL) algorithms automate the design of complex controllers by mapping sensory inputs to low-level actions, eliminating complex robot dynamics modeling with minimal engineering. However, deep RL carries high risk when implemented directly in real-world scenarios and is highly sensitive to hyperparameters, and tuning hyperparameters on a pneumatic quadruped robot becomes very expensive through trial-and-error learning. This paper presents automated learning control for a pneumatic quadruped robot using sample-efficient deep Q-learning, enabling the neural network to be learned with minimal tuning and very few trials. Long training hours may degrade the pneumatic cylinders due to jerky actions originating from stochastic weights. We applied this method to the pneumatic quadruped robot, which resulted in a hopping gait. In our process, we eliminated the use of a simulator and acquired a stable gait. The approach evolves so that the resulting gait becomes more robust to stochastic changes in the environment. We further show that our algorithm performed very well compared to a gait programmed using robot dynamics.
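
The sample-efficient deep Q-learning loop applied here can be outlined as a standard DQN update with a replay buffer and a target network; the observation size, the 8-way action discretization, and the toy buffer contents are assumptions, not the paper's configuration.

```python
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions = 12, 8              # sensor vector / discrete valve actions (assumed)
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)           # replay buffer of (s, a, r, s2, done) tensors
gamma = 0.99

def dqn_update(batch_size=32):
    """One gradient step toward the TD target, sampled from replay."""
    batch = random.sample(list(buffer), batch_size)
    s, a, r, s2, done = map(torch.stack, zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(dim=1).values * (1 - done)
    loss = nn.functional.smooth_l1_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()

# Toy transitions stand in for experience gathered on the robot.
for _ in range(64):
    buffer.append((torch.randn(obs_dim), torch.tensor(3.0), torch.tensor(1.0),
                   torch.randn(obs_dim), torch.tensor(0.0)))
dqn_update()
```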

Keywords: Model-based reinforcement learning, gait stability, supervised learning, pneumatic quadruped.

2262 Adhesion Performance According to Lateral Reinforcement Method of Textile

Authors: Jungbhin You, Taekyun Kim, Jongho Park, Sungnam Hong, Sun-Kyu Park

Abstract:

Reinforced concrete is widely used in the construction field because of its excellent durability. However, corrosion of the reinforcing steel following damage to the concrete surface may reduce durability and safety. Recently, research on textile reinforcement has been ongoing to address this weakness of reinforced concrete. Previous research performed experiments only on the longitudinal length. Therefore, in order to investigate the adhesion performance according to the lattice shape and the embedded length, pull-out tests were performed on the roving with the number of lateral reinforcements, the lateral reinforcement length, and the lateral reinforcement spacing as parameters. As a result, the number of lateral reinforcements and the lateral reinforcement length did not significantly affect the load, and only the lateral reinforcement spacing affected the adhesion performance.

Keywords: Adhesion performance, lateral reinforcement, pull-out test, textile.

2261 A Reinforcement Learning Approach for Evaluation of Real-Time Disaster Relief Demand and Network Condition

Authors: Ali Nadi, Ali Edrissi

Abstract:

Relief demand and transportation link availability are the essential information needed for every natural disaster operation, yet this information is not at hand once a disaster strikes. In related work, relief demand and network condition have been evaluated using prediction methods. Nevertheless, predictions tend to over- or underestimate due to uncertainties and may lead to a failed operation. Therefore, in this paper a stochastic programming model is proposed to evaluate real-time relief demand and network condition at the onset of a natural disaster. To address the time sensitivity of the emergency response, the proposed model uses reinforcement learning to optimize the total relief assessment time. The proposed model is tested on a real-size network problem. The simulation results indicate that the proposed model performs well in collecting real-time information.

Keywords: Disaster management, real-time demand, reinforcement learning, relief demand.

2260 A Probabilistic Reinforcement-Based Approach to Conceptualization

Authors: Hadi Firouzi, Majid Nili Ahmadabadi, Babak N. Araabi

Abstract:

Conceptualization strengthens intelligent systems in generalization, effective knowledge representation, real-time inference, and the management of uncertain and indefinite situations, in addition to facilitating knowledge communication for learning agents situated in the real world. Concept learning introduces a form of abstraction by which the continuous state space is formed into entities called concepts, which are connected to the action space and thereby represent the complex action space. Among computational concept learning approaches, action-based conceptualization is favored because of its simplicity and its mirror neuron foundations in neuroscience. In this paper, a new biologically inspired concept learning approach based on a probabilistic framework is proposed. This approach exploits and extends the mirror neuron's role in conceptualization for a reinforcement learning agent in non-deterministic environments. In the proposed method, instead of building a huge body of numerical knowledge, the concepts are learned gradually from rewards through interaction with the environment. Moreover, the probabilistic formation of the concepts is employed to deal with the uncertain and dynamic nature of real problems, in addition to providing the ability to generalize. These characteristics as a whole distinguish the proposed learning algorithm from both pure classification algorithms and typical reinforcement learning. Simulation results show the advantages of the proposed framework in terms of convergence speed as well as generalization and asymptotic behavior, thanks to utilizing both successful and failed attempts through the received rewards. Experimental results, on the other hand, show the applicability and effectiveness of the proposed method in continuous and noisy environments for a real robotic task such as maze navigation, as well as the benefits of implementing an incremental learning scenario in artificial agents.

Keywords: Concept learning, probabilistic decision making, reinforcement learning.

2259 Q-Learning with Eligibility Traces to Solve Non-Convex Economic Dispatch Problems

Authors: Mohammed I. Abouheaf, Sofie Haesaert, Wei-Jen Lee, Frank L. Lewis

Abstract:

Economic Dispatch is one of the most important power system management tools. It is used to allocate an amount of power generation to the generating units to meet the load demand. The Economic Dispatch problem is a large-scale nonlinear constrained optimization problem, and heuristic optimization techniques are generally used to solve its non-convex form. In this paper, ideas from reinforcement learning are proposed to solve the non-convex Economic Dispatch problem. Q-Learning is a reinforcement learning technique in which each generating unit learns the optimal schedule of generated power that minimizes the generation cost function. Eligibility traces are used to speed up the Q-Learning process. Q-Learning with eligibility traces is used to solve Economic Dispatch problems with the valve-point loading effect, multiple fuel options, and power transmission losses.
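
The non-convexity mentioned above comes chiefly from the valve-point loading effect, which adds a rectified sinusoid to the usual quadratic fuel cost. A sketch of that standard cost model, with illustrative coefficients:

```python
import math

def fuel_cost(p, a, b, c, e, f, p_min):
    """Quadratic fuel cost plus valve-point loading effect (non-convex, non-smooth)."""
    return a + b * p + c * p ** 2 + abs(e * math.sin(f * (p_min - p)))

# Illustrative coefficients for one generating unit.
cost = fuel_cost(p=300.0, a=240.0, b=7.0, c=0.007, e=300.0, f=0.032, p_min=100.0)
```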

Keywords: Economic Dispatch, non-convex cost functions, valve-point loading effect, Q-Learning, eligibility traces.

2258 Deep Reinforcement Learning for Optimal Decision-making in Supply Chains

Authors: Nitin Singh, Meng Ling, Talha Ahmed, Tianxia Zhao, Reinier van de Pol

Abstract:

We propose the use of Reinforcement Learning (RL) as a viable alternative for optimizing supply chain management, particularly in scenarios with stochasticity in product demands. RL's adaptability to changing conditions and its demonstrated success in diverse fields of sequential decision-making make it a promising candidate for addressing supply chain problems. We investigate the impact of demand fluctuations in a multi-product supply chain system and develop RL agents with learned generalizable policies. We provide experimentation details for training the RL agents and a statistical analysis of the results. We study the generalization ability of the RL agents under different demand uncertainty scenarios and observe superior performance compared to agents trained with fixed demand curves. The proposed methodology has the potential to lead to cost reduction and increased profit for companies dealing with frequent inventory movement between supply and demand nodes.

Keywords: Inventory Management, Reinforcement Learning, Supply Chain Optimization, Uncertainty.

2257 Robot Movement Using the Trust Region Policy Optimization

Authors: Romisaa Ali

Abstract:

The policy gradient approach is a subset of Deep Reinforcement Learning (DRL), which combines Deep Neural Networks (DNNs) with Reinforcement Learning (RL). This approach finds the optimal policy of robot movement based on the experience the robot gains from interaction with its environment. Unlike previous policy gradient algorithms, which were unable to handle the error variance and bias introduced by the DNN model through over- or underestimation, this algorithm is capable of handling both. This article discusses the state-of-the-art (SOTA) policy gradient technique of trust region policy optimization (TRPO), applying it in various environments and comparing it to another policy gradient method, Proximal Policy Optimization (PPO), to explain their robust optimization. Both methods are used to gather experience data during the various training phases, and the impact of hyperparameters on neural network performance is observed.
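
For contrast with TRPO's constrained update, PPO optimizes a clipped surrogate objective that can be written in a few lines; the sketch below shows the standard form of the loss, with toy tensors standing in for rollout data.

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate: limits how far the new policy moves from the old one."""
    ratio = torch.exp(log_probs - old_log_probs)       # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()       # maximize surrogate -> minimize negative

# Toy batch: in practice these come from rollouts of the robot policy.
lp = torch.randn(64, requires_grad=True)
loss = ppo_clip_loss(lp, lp.detach() + 0.1 * torch.randn(64), torch.randn(64))
loss.backward()
```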

Keywords: Deep neural networks, deep reinforcement learning, Proximal Policy Optimization, state-of-the-art, trust region policy optimization.

2256 The Effect of Geogrid Reinforcement Pre-Stressing on the Performance of Sand Bed Supporting a Strip Foundation

Authors: Ahmed M. Eltohamy

Abstract:

In this paper, an experimental and numerical study was conducted to investigate the effect of geogrid soil reinforcement pre-stressing on the pressure-settlement relation of a sand bed supporting a strip foundation. The studied parameters include the foundation depth and the pre-stress ratio for the cases of one and two pre-stressed reinforcement layers. The study showed that pre-stressing the soil reinforcement resulted in a marked enhancement in the stiffness of the reinforced bed soil compared to reinforced soil without pre-stress. The benefit of pre-stressing the reinforcement increased as the overburden pressure and pre-straining ratio increased. Pre-stressing the two topmost reinforcement layers resulted in further enhancement of the stress-strain relation of the bed soil.

Keywords: Geogrid reinforcement, strip footing, pre-stress, bearing capacity.

2255 Optimizing Dialogue Strategy Learning Using Learning Automata

Authors: G. Kumaravelan, R. Sivakumar

Abstract:

Modeling the behavior of dialogue management in the design of a spoken dialogue system using statistical methodologies is currently a growing research area. This paper presents work on developing an adaptive learning approach to optimize dialogue strategy. At the core of our system is a method formalizing dialogue management as sequential decision making under uncertainty whose underlying probabilistic structure is a Markov chain. Researchers have mostly focused on model-free algorithms for automating the design of dialogue management using machine learning techniques such as reinforcement learning, but model-free algorithms face a dilemma between exploration and exploitation. Hence we present a model-based online policy learning algorithm using interconnected learning automata for optimizing dialogue strategy. The proposed algorithm is capable of deriving an optimal policy that prescribes what action should be taken in various states of the conversation so as to maximize the expected total reward and attain the goal, and it incorporates good exploration and exploitation in its updates to improve the naturalness of human-computer interaction. We test the proposed approach using the PARADISE evaluation framework on access to a railway information system.
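
The interconnected learning automata mentioned above adjust action probabilities directly from the environment's feedback. A minimal linear reward-inaction (L_R-I) automaton, one standard update scheme, is sketched below with an illustrative learning rate.

```python
import numpy as np

class LinearRewardInaction:
    """L_R-I automaton: move probability mass toward rewarded actions only."""
    def __init__(self, n_actions, a=0.05):
        self.p = np.full(n_actions, 1.0 / n_actions)
        self.a = a
        self.rng = np.random.default_rng(0)

    def act(self):
        return self.rng.choice(len(self.p), p=self.p)

    def update(self, action, reward):
        if reward > 0:                       # on penalty, L_R-I leaves p unchanged
            updated = self.p * (1 - self.a)
            updated[action] = self.p[action] + self.a * (1 - self.p[action])
            self.p = updated

automaton = LinearRewardInaction(n_actions=4)
a = automaton.act()
automaton.update(a, reward=1.0)              # environment judged the action good
```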

Keywords: Dialogue management, learning automata, reinforcement learning, spoken dialogue systems.

2254 Reinforcement Learning-Based Coexistence Interference Management in Wireless Body Area Networks

Authors: Izaz Ahmad, Farhatullah, Shahbaz Ali, Farhad Ali, Faiza, Hazrat Junaid, Farhan Zaid

Abstract:

Current trends in remote health monitoring that monetize Internet of Things applications have raised the need for efficient and interference-free communication in the Wireless Body Area Network (WBAN) scenario. Coexistence interference in WBANs aggravates the over-congested radio bands, thereby requiring efficient Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) strategies and improved interference management. Existing solutions approach interference problems with simplistic heuristics. The scope of this research article is to investigate reinforcement learning for efficient interference management under coexistence scenarios, with an emphasis on homogeneous interference. This paper suggests a smart CSMA/CA mechanism based on reinforcement learning, called QIM-MAC, that effectively uses sense slots with minimal interference. Simulation results show that the proposed approach maximized the Average Network Throughput and Packet Delivery Ratio and minimized the Packet Loss Ratio, Energy Consumption, and Average Delay.

Keywords: WBAN, IEEE 802.15.4 Standard, CAP Super-frame, Q-Learning.

2253 Deep Reinforcement Learning Approach for Trading Automation in the Stock Market

Authors: Taylan Kabbani, Ekrem Duman

Abstract:

Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. The automation of profit generation in the stock market is possible using DRL by combining the financial asset price "prediction" step and the portfolio "allocation" step in one unified process, producing fully autonomous systems capable of interacting with their environment to make optimal decisions through trial and error. This work presents a DRL model to generate profitable trades in the stock market, effectively overcoming the limitations of supervised learning approaches. We formulate the trading problem as a Partially Observable Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market, such as liquidity and transaction costs. We then solve the formulated POMDP problem using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and achieve a 2.68 Sharpe ratio on the test dataset. From the point of view of stock market forecasting and intelligent decision-making mechanisms, this paper demonstrates the superiority of DRL over other types of machine learning in financial markets and proves its credibility and advantages for strategic decision-making.
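
The reported 2.68 Sharpe ratio is the conventional risk-adjusted performance measure; a sketch of the usual annualized estimate from per-period returns (the zero risk-free rate and 252 trading days are the customary assumptions):

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a series of per-period returns."""
    excess = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Toy usage with simulated daily returns.
rng = np.random.default_rng(0)
print(sharpe_ratio(rng.normal(0.001, 0.01, size=252)))
```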

Keywords: Autonomous agent, deep reinforcement learning, MDP, sentiment analysis, stock market, technical indicators, twin delayed deep deterministic policy gradient.

2252 Nonlinear Finite Element Modeling of Unbonded Steel Reinforced Concrete Beams

Authors: Fares Jnaid, Riyad Aboutaha

Abstract:

In this paper, a nonlinear Finite Element Analysis (FEA) was carried out using the ANSYS software to build a model capable of predicting the behavior of Reinforced Concrete (RC) beams with unbonded reinforcement. The FEA model was compared to existing experimental data from other researchers, consisting of 16 beams that varied from structurally sound beams to beams with unbonded reinforcement of different unbonded lengths and reinforcement ratios. The model was able to predict the ultimate flexural strength, load-deflection curve, and crack pattern of concrete beams with unbonded reinforcement. It was concluded that when the unbonded length is less than 45% of the span, there is no decrease in the ultimate flexural strength due to the loss of bond between the steel reinforcement and the surrounding concrete, regardless of the reinforcement ratio. Moreover, when the reinforcement ratio is relatively low, there is no decrease in ultimate flexural strength regardless of the unbonded length.

Keywords: FEA, ANSYS, unbonded reinforcement, strain.

2251 Acquiring Contour Following Behaviour in Robotics through Q-Learning and Image-based States

Authors: Carlos V. Regueiro, Jose E. Domenech, Roberto Iglesias, Jose L. Correa

Abstract:

In this work, a visual and reactive contour following behaviour is learned by reinforcement. With artificial vision, the environment is perceived in 3D, and it is possible to avoid obstacles that are invisible to the sensors more common in mobile robotics. Reinforcement learning reduces the need for intervention in behaviour design and simplifies its adjustment to the environment, the robot, and the task. In order to facilitate generalisation to other behaviours and to reduce the role of the designer, we propose a regular image-based codification of states. Even though this is much more difficult, our implementation converges and is robust. Results are presented with a Pioneer 2 AT robot on a Gazebo 3D simulator.

Keywords: Image-based state codification, mobile robotics, reinforcement learning, visual behaviour.

2250 Safe and Efficient Deep Reinforcement Learning Control Model: A Hydroponics Case Study

Authors: Almutasim Billa A. Alanazi, Hal S. Tharp

Abstract:

Safe performance and efficient energy consumption are essential factors in designing a control system. This paper presents a reinforcement learning (RL) model that can be applied to control applications to improve safety and reduce energy consumption. As hardware constraints and environmental disturbances are imprecise and unpredictable, conventional control methods may not always be effective in optimizing control designs. However, RL has demonstrated its value in several artificial intelligence (AI) applications, especially in the field of control systems. The proposed model intelligently monitors a system's success by observing the rewards from the environment, with positive rewards counting as a success when the controlled reference is within the desired operating zone. Thus, the model can determine whether the system is safe to continue operating based on the designer/user specifications, which can be adjusted as needed. Additionally, the controller keeps track of energy consumption and improves energy efficiency by enabling an idle mode when the controlled reference is within the desired operating zone, thus reducing the system's energy consumption during operation. Water temperature control for a hydroponic system is taken as a case study for the RL model, with the variance of disturbances adjusted to show the model's robustness and efficiency. On average, the model improved safety by up to 15% and energy efficiency by 35%-40% compared to a traditional RL model.
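
The safety and idle-mode logic described above amounts to a reward that credits time spent inside the desired operating zone and an actuator gate that idles inside it. A sketch with hypothetical water-temperature bounds (the zone limits and penalty shape are assumptions, not the paper's values):

```python
def reward_and_mode(temperature, low=18.0, high=24.0):
    """Hypothetical hydroponics water-temperature logic: positive reward and
    idle mode (actuator off) inside the safe zone; penalty proportional to
    the deviation outside it."""
    if low <= temperature <= high:
        return 1.0, "idle"                    # safe: reward success, save energy
    deviation = min(abs(temperature - low), abs(temperature - high))
    return -deviation, "active"               # unsafe: penalize and actuate

r, mode = reward_and_mode(26.5)               # -> (-2.5, "active")
```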

Keywords: Control system, hydroponics, machine learning, reinforcement learning.

2249 A Control Model for Improving Safety and Efficiency of Navigation System Based on Reinforcement Learning

Authors: Almutasim Billa A. Alanazi, Hal S. Tharp

Abstract:

Artificial Intelligence (AI), specifically Reinforcement Learning (RL), has proven helpful in many control and path planning technologies, such as navigation systems, by maximizing and enhancing their performance. RL learns from experience by interacting with the environment to determine the optimal policy, which takes the best action in a particular state while accounting for long-term rewards. Most navigation systems focus primarily on "arriving faster," overlooking safety and efficiency while estimating the optimum path, even though safety and efficiency are essential factors when planning a long-distance journey. This paper presents an RL control model that proposes a control mechanism for improving navigation systems. The model could also be applied to other control and path planning applications because it is adjustable and can accept different properties and parameters; however, the navigation system application has been taken as the case and evaluation study for the proposed model. The model utilizes a Q-learning algorithm for training and updating the policy, allowing the agent to analyze the quality of an action taken in the environment in order to maximize rewards. The model can update rewards regularly based on safety and efficiency assessments, allowing the policy to consider the desired safety and efficiency benefits while making decisions, which improves the quality of the decisions taken for path planning compared to conventional RL approaches.

Keywords: Artificial intelligence, control system, navigation systems, reinforcement learning.

2248 Development of AA2024 Matrix Composites Reinforced with Micro Yttrium through Cold Compaction with Superior Mechanical Properties

Authors: C. H. S. Vidyasagar, D. B. Karunakar

Abstract:

In the present work, five composite samples with AA2024 as the matrix and varying amounts of yttrium (0.1-0.5 wt.%) as reinforcement are developed through cold compaction. The microstructures of the developed composite samples revealed that the yttrium reinforcement caused grain refinement up to 0.3 wt.%, beyond which the refinement is no longer effective. The microstructure revealed Al2Cu precipitation, which strengthened the composite up to 0.3 wt.% yttrium reinforcement; upon further increase in yttrium, the intermetallics and the precipitates coarsen and their strengthening effect decreases. The mechanical characterization revealed that the composite sample reinforced with 0.3 wt.% yttrium showed the highest mechanical properties: a hardness of 82 HV, an Ultimate Tensile Strength (UTS) of 276 MPa, a Yield Strength (YS) of 229 MPa, and an elongation (EL) of 18.9%. However, the relative density of the developed composites decreased with increasing yttrium reinforcement.

Keywords: Mechanical properties, AA 2024 matrix, yttrium reinforcement, cold compaction, precipitation.

2247 Off-Policy Q-learning Technique for Intrusion Response in Network Security

Authors: Zheni S. Stefanova, Kandethody M. Ramachandran

Abstract:

With the increasing dependency on our computer devices, we face the necessity of adequate, efficient, and effective mechanisms for protecting our networks. Intrusion Detection Systems (IDS) attempt to solve two main problems: 1) detecting an attack by analyzing the incoming traffic and inspecting the network (intrusion detection), and 2) producing a prompt response when the attack occurs (intrusion prevention). It is critical to create an intrusion detection model that detects a breach in the system in time, and it is challenging to make it provide an automatic response with acceptable delay at every stage of the monitoring process. We cannot afford to adopt security measures that demand high computational power, nor can we accept a mechanism that reacts with a delay. In this paper, we propose an intrusion response mechanism based on artificial intelligence, and more precisely, reinforcement learning techniques (RLT). The RLT helps us create a decision agent that controls the process of interacting with the undetermined environment. The goal is to find an optimal policy representing the intrusion response; we therefore solve the reinforcement learning problem using a Q-learning approach. Our agent produces an optimal immediate response while evaluating the network traffic. This Q-learning approach establishes the balance between exploration and exploitation and provides a unique, self-learning, and strategic artificial intelligence response mechanism for IDS.
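
The off-policy Q-learning update for such a response agent, with an ε-greedy behavior policy balancing exploration and exploitation, can be sketched as follows; the state discretization and number of response actions are placeholders.

```python
import numpy as np

n_states, n_actions = 50, 5          # e.g. traffic-pattern clusters x response actions (assumed)
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def select(s):
    """Epsilon-greedy: explore with probability eps, otherwise exploit current Q."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s2):
    """Off-policy backup: the target uses the greedy action regardless of behavior."""
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
```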

Keywords: Intrusion prevention, network security, optimal policy, Q-learning.

2246 Behavioral Analysis of Team Members in Virtual Organization based on Trust Dimension and Learning

Authors: Indiramma M., K. R. Anandakumar

Abstract:

Trust management and reputation models are becoming an integral part of Internet-based applications such as CSCW, e-commerce, and grid computing. The trust dimension is also a significant social structure and a key to social relations within a collaborative community. Collaborative Decision Making (CDM) is a difficult task in a distributed environment (with information across different geographical locations) where multidisciplinary decisions are involved, such as in a Virtual Organization (VO). To aid team decision making in a VO, Decision Support Systems and social network analysis approaches are integrated. In such situations, social learning helps an organization in terms of relationships, team formation, partner selection, etc. In this paper we focus on trust learning, an important activity for information exchange, negotiation, collaboration, and trust assessment in cooperation among virtual team members. We propose a reinforcement learning approach that enhances the trust decision-making capability of interacting agents during collaboration in a problem-solving activity. The trust computational model with learning that we present is adapted for selecting the best alternative for a new project in the organization. We verify our model in a multi-agent simulation where the agents in the community learn to identify trustworthy members and to recognize inconsistent and conflicting behavior of agents.

Keywords: Collaborative decision making, trust, Multi-Agent System (MAS), Bayesian network, reinforcement learning.

2245 Agent-based Simulation for Blood Glucose Control in Diabetic Patients

Authors: Sh. Yasini, M. B. Naghibi-Sistani, A. Karimpour

Abstract:

This paper employs a new approach to regulate the blood glucose level of a type I diabetic patient under intensive insulin treatment. The closed-loop control scheme incorporates expert knowledge about treatment by using reinforcement learning theory to maintain the normoglycemic average of 80 mg/dl and a normal free plasma insulin concentration from a severe initial state. The insulin delivery rate is obtained off-line using the Q-learning algorithm, without requiring an explicit model of the environment dynamics. The implementation of the insulin delivery rate therefore requires simple function evaluation and minimal online computation. Controller performance is assessed in terms of its ability to reject the effect of meal disturbance and to overcome the variability in the glucose-insulin dynamics from patient to patient. Computer simulations are used to evaluate the effectiveness of the proposed technique and to show its superiority in controlling hyperglycemia over other existing algorithms.

Keywords: Insulin delivery rate, Q-learning algorithm, reinforcement learning, type I diabetes.
