Search results for: tree algorithms
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2743

2143 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract:

Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.
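The two quantities this abstract centers on, the Gini index and the best CART segmentation point, can be illustrated with a minimal serial sketch. It is not the paper's S-SP or OGini method, which is parallel and not specified in the abstract; the function names `gini` and `best_split` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split value <= threshold.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (None, inf) when no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

This serial version rebuilds the left and right label lists for every candidate threshold, which is the kind of repeated work a parallel or incremental scheme would aim to avoid.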

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 115
2142 A Bi-Objective Model to Optimize the Total Time and Idle Probability for Facility Location Problem Behaving as M/M/1/K Queues

Authors: Amirhossein Chambari

Abstract:

This article proposes a bi-objective model for the facility location problem subject to congestion (overcrowding). It is motivated by applications such as locating servers in internet mirror sites, communication networks, and one-server systems. The model considers situations in which immobile (or fixed) service facilities are congested (or queued) by stochastic demand and behave as M/M/1/K queues. We consider two simultaneous perspectives on this problem: (1) customers, who wish to limit the time spent accessing and waiting for service, and (2) the service provider, who wishes to limit average facility idle time. A bi-objective model is set up for the facility location problem with two objective functions: (1) minimizing the sum of expected total traveling and waiting time (customers) and (2) minimizing the average facility idle-time percentage (service provider). The proposed model belongs to the class of mixed-integer nonlinear programming models and to the class of NP-hard problems. To solve the model, a controlled elitist non-dominated sorting genetic algorithm (Controlled NSGA-II) and a controlled elitist non-dominated ranking genetic algorithm (NRGA-I) are proposed. Furthermore, the two proposed metaheuristic algorithms are evaluated using standard multiobjective metrics. Finally, the results are analyzed and some conclusions are given.
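For a single facility, both objectives rest on standard M/M/1/K results, which the following sketch computes. It assumes that K is the total system capacity including the customer in service; the function name `mm1k_metrics` and the returned keys are hypothetical, and the paper's full location model (travel times, facility assignment) is not reproduced here.

```python
def mm1k_metrics(arrival_rate, service_rate, capacity):
    """Steady-state metrics of an M/M/1/K queue (capacity = K, including service)."""
    rho = arrival_rate / service_rate
    if abs(rho - 1.0) < 1e-12:
        # Degenerate case rho == 1: all K + 1 states are equally likely.
        probs = [1.0 / (capacity + 1)] * (capacity + 1)
    else:
        p0 = (1.0 - rho) / (1.0 - rho ** (capacity + 1))
        probs = [p0 * rho ** n for n in range(capacity + 1)]
    blocking = probs[capacity]                        # arriving customer is rejected
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    effective_rate = arrival_rate * (1.0 - blocking)  # rate of admitted customers
    return {
        "idle_probability": probs[0],                 # facility is empty
        "blocking_probability": blocking,
        "mean_time_in_system": mean_in_system / effective_rate,  # Little's law
    }
```

For example, `mm1k_metrics(2.0, 3.0, 5)["idle_probability"]` gives the fraction of time a facility with these rates sits idle, which is the quantity the service-provider objective aggregates over facilities.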

Keywords: bi-objective, facility location, queueing, controlled NSGA-II, NRGA-I

Procedia PDF Downloads 570
2141 Response of Six Organic Soil Media on the Germination, Seedling Vigor Performance of Jack Fruit Seeds in Chitwan Nepal

Authors: Birendra Kumar Bhattachan

Abstract:

Organic soil media play an important role in seed germination, growth, and the production of organic jack fruits, which are a food source of vitamins A and C and other nutrients for human health. An experiment was conducted to identify the appropriate organic soil media for inducing germination and seedling vigor of jack fruit seeds at the farm of Agriculture and Forestry University (AFU), Chitwan, Nepal, from June 2022 to October 2022. The organic soil media used as treatments were: 1. soil collected under the Moringa tree; 2. soil, farm yard manure (FYM) and rice husk (RH) (2:1:1); 3. soil and FYM (1:1); 4. sand, FYM and RH (2:1:1); 5. sand, soil, FYM and RH (1:1:1:1); and 6. sand, soil and RH (1:2:1), under a Completely Randomized Design (CRD) with four replications. The significantly highest germination of 88% was induced by the soil medium, followed by the medium of soil and FYM (1:1), i.e. 63%, and the medium of soil, FYM and RH (2:1:1); the poorest medium was sand, soil, FYM and RH (1:1:1:1), which induced germination of 28%. The significantly highest seedling length of 73 cm was produced by the soil medium, followed by the medium of soil, sand and RH (1:2:1), i.e. 72 cm, and the medium of soil, sand, FYM and RH (1:1:1:1); the poorest medium was soil, FYM and RH (2:1:1), which produced a seedling length of 62 cm. Similarly, the significantly highest seedling vigor of 6257 was produced by the soil medium, followed by the medium of soil and FYM (1:1), i.e. 4253, and the lowest was the medium of sand, soil, FYM and RH (1:1:1:1), which produced a seedling vigor of 1916. Based on this experiment, it was concluded that soil collected under the Moringa tree induced the highest germination capacity of jack fruit seeds, followed by the highest seedling vigor.

Keywords: jack fruit seed, soil media, farm yard manure, sand media, rice husk

Procedia PDF Downloads 184
2140 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, including medical information and detection. Machine learning approaches are now being used to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID-19 identification, and so on. Parkinson's disease mainly affects senior citizens in Bangladesh. Its symptoms are often progressive and get worse with time. As the condition advances, affected people have trouble walking and communicating. Patients can also experience psychological and social changes, sleep problems, hopelessness, memory loss, and weariness. Parkinson's disease can occur in both men and women, although the two are not affected in equal proportion. In this research, we aim to identify the most accurate ML algorithm for detecting the disease on a dataset by comparing the models of several machine learning classifiers. Nine ML classifiers are used in this study: Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest Classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 122
2139 Deep Reinforcement Learning for Advanced Pressure Management in Water Distribution Networks

Authors: Ahmed Negm, George Aggidis, Xiandong Ma

Abstract:

Given the diverse nature of urban cities, customer demand patterns, landscape topologies and even seasonal weather trends, managing our water distribution networks (WDNs) has proved a complex task. These unpredictable circumstances manifest as pipe failures, intermittent supply and burst events, thus adding to water loss, energy waste and increased carbon emissions. Whilst these events are unavoidable, advanced pressure management has proved an effective tool to control and mitigate them. However, water utilities have struggled to develop a real-time control method that is resilient when confronting the challenges of water distribution. In this paper we use deep reinforcement learning (DRL) algorithms as a novel pressure control strategy to minimise pressure violations and leakage under both burst and background leakage conditions. Agents based on advantage actor critic (A2C) and recurrent proximal policy optimisation (Recurrent PPO) were trained and compared to benchmark optimisation algorithms (differential evolution, particle swarm optimisation). A2C reduced leakage by 32.48% under burst conditions and 67.17% under background conditions, which was the highest performance among the DRL algorithms. A2C and Recurrent PPO performed well in comparison to the benchmarks, with higher processing speed and lower computational effort.

Keywords: deep reinforcement learning, pressure management, water distribution networks, leakage management

Procedia PDF Downloads 74
2138 A Dynamic Solution Approach for Heart Disease Prediction

Authors: Walid Moudani

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor: there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered by applying data mining techniques in healthcare systems. This study presents a methodology for extracting significant patterns from coronary heart disease data warehouses for heart attack prediction; heart attack unfortunately continues to be a leading cause of mortality worldwide. For this purpose, we propose to dynamically enumerate the optimal subsets of the reduced features of high interest by using the rough sets technique combined with dynamic programming. We then propose to validate the classification using a Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profiles of patients. Moreover, the experts’ knowledge in this field has been taken into consideration in order to define the disease and its risk factors and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated against a set of benchmark techniques applied to this classification problem.

Keywords: multi-classifier decisions tree, features reduction, dynamic programming, rough sets

Procedia PDF Downloads 401
2137 Advanced Combinatorial Method for Solving Complex Fault Trees

Authors: José de Jesús Rivero Oliva, Jesús Salomón Llanes, Manuel Perdomo Ojeda, Antonio Torres Valle

Abstract:

Combinatorial explosion is a common problem to both predominant methods for solving fault trees: Minimal Cut Set (MCS) approach and Binary Decision Diagram (BDD). High memory consumption impedes the complete solution of very complex fault trees. Only approximated non-conservative solutions are possible in these cases using truncation or other simplification techniques. The paper proposes a method (CSolv+) for solving complex fault trees, without any possibility of combinatorial explosion. Each individual MCS is immediately discarded after its contribution to the basic events importance measures and the Top gate Upper Bound Probability (TUBP) has been accounted. An estimation of the Top gate Exact Probability (TEP) is also provided. Therefore, running in a computer cluster, CSolv+ will guarantee the complete solution of complex fault trees. It was successfully applied to 40 fault trees from the Aralia fault trees database, performing the evaluation of the top gate probability, the 1000 Significant MCSs (SMCS), and the Fussell-Vesely, RRW and RAW importance measures for all basic events. The high complexity fault tree nus9601 was solved with truncation probabilities from 10⁻²¹ to 10⁻²⁷ just to limit the execution time. The solution corresponding to 10⁻²⁷ evaluated 3,530,592,796 MCSs in 3 hours and 15 minutes.
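The idea of discarding each MCS once its contribution has been accumulated can be sketched as a streaming computation. This is not CSolv+ itself, whose formulas the abstract does not give. The sketch assumes independent basic events and uses the standard min-cut upper bound, 1 minus the product of (1 - P(MCS)), for the top gate, together with a Fussell-Vesely estimate built on the same bound. The function name `accumulate_mcs` is hypothetical.

```python
from math import prod


def accumulate_mcs(mcs_stream, event_prob):
    """Stream minimal cut sets and accumulate top-gate and importance results.

    mcs_stream: iterable of cut sets, each an iterable of basic-event names.
    event_prob: dict mapping basic-event name to its probability.
    Returns (top_upper_bound, fussell_vesely) where fussell_vesely is a dict.
    Only running products are stored, never the cut sets themselves.
    """
    top_complement = 1.0    # running product of (1 - P(MCS)) over all cut sets
    event_complement = {}   # same product, restricted to cut sets containing the event
    for mcs in mcs_stream:
        events = list(mcs)
        q = prod(event_prob[e] for e in events)  # P(MCS) under independence
        top_complement *= 1.0 - q
        for e in events:
            event_complement[e] = event_complement.get(e, 1.0) * (1.0 - q)
    top = 1.0 - top_complement
    if top == 0.0:
        return 0.0, {}
    fussell_vesely = {e: (1.0 - c) / top for e, c in event_complement.items()}
    return top, fussell_vesely
```

Because `mcs_stream` can be a generator, memory use grows with the number of basic events rather than with the number of cut sets, which is the property the abstract emphasizes.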

Keywords: system reliability analysis, probabilistic risk assessment, fault tree analysis, basic events importance measures

Procedia PDF Downloads 31
2136 Predicting Provider Service Time in Outpatient Clinics Using Artificial Intelligence-Based Models

Authors: Haya Salah, Srinivas Sharan

Abstract:

Healthcare facilities use appointment systems to schedule their appointments and to manage access to their medical services. With the growing demand for outpatient care, it is now imperative to manage physicians' time effectively. However, high variation in consultation duration affects the clinical scheduler's ability to estimate the appointment duration and allocate provider time appropriately. Underestimating consultation times can lead to physician burnout, misdiagnosis, and patient dissatisfaction. On the other hand, appointment durations that are longer than required lead to doctor idle time and fewer patient visits. Therefore, a good estimation of consultation duration has the potential to improve timely access to care, resource utilization, quality of care, and patient satisfaction. Although the literature on factors influencing consultation length abounds, little work has been done to predict it using data-driven approaches. Therefore, this study aims to predict consultation duration using supervised machine learning (ML) algorithms, which predict an outcome variable (e.g., consultation duration) based on potential features that influence the outcome. In particular, ML algorithms learn from a historical dataset without being explicitly programmed and uncover the relationship between the features and the outcome variable. A subset of the data used in this study has been obtained from the electronic medical records (EMR) of four different outpatient clinics located in central Pennsylvania, USA. Also, publicly available information on doctors' characteristics such as gender and experience has been extracted from online sources. This research develops three popular ML algorithms (deep learning, random forest, gradient boosting machine) to predict the treatment time required for a patient and conducts a comparative analysis of these algorithms with respect to predictive performance.
The findings of this study indicate that ML algorithms have the potential to predict the provider service time with superior accuracy. While the current approach of experience-based appointment duration estimation adopted by the clinic resulted in a mean absolute percentage error (MAPE) of 25.8%, the deep learning algorithm developed in this study yielded the best performance with a MAPE of 12.24%, followed by gradient boosting machine (13.26%) and random forests (14.71%). In addition, this research identified the critical variables affecting consultation duration to be patient type (new vs. established), doctor's experience, zip code, appointment day, and doctor's specialty. Moreover, several practical insights are obtained based on the comparative analysis of the ML algorithms. The machine learning approach presented in this study can serve as a decision support tool and could be integrated into the appointment system for effectively managing patient scheduling.
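The comparison above is stated in terms of mean absolute percentage error. A minimal sketch of that metric follows; the function name `mape` is hypothetical, and the example durations in the usage note are made up for illustration, not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    actual values must be non-zero, since each error is divided by the actual value.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

With actual consultation durations of 20 and 40 minutes and estimates of 25 and 30 minutes, each estimate is off by a quarter of the actual value, so `mape([20, 40], [25, 30])` evaluates to 25.0.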

Keywords: clinical decision support system, machine learning algorithms, patient scheduling, prediction models, provider service time

Procedia PDF Downloads 115
2135 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly, which leads to the growth of various industries. Nowadays, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to the complexities of manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on learning patterns from previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Several studies have been carried out on this basis, but achieving desirable defect prediction performance is still a challenging task. To address this issue, we present a machine learning based hybrid technique for software defect prediction. First, a Genetic Algorithm (GA) is presented in which an improved fitness function is used for better optimization of features in data sets. These features are then processed through a Decision Tree (DT) classification model. Finally, an experimental study is presented in which results from the proposed GA-DT based hybrid approach are compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: decision tree, genetic algorithm, machine learning, software defect prediction

Procedia PDF Downloads 324
2134 Artificial Intelligence Models for Detecting Spatiotemporal Crop Water Stress in Automating Irrigation Scheduling: A Review

Authors: Elham Koohi, Silvio Jose Gumiere, Hossein Bonakdari, Saeid Homayouni

Abstract:

Water used in agricultural crops can be managed by irrigation scheduling based on soil moisture levels and plant water stress thresholds. Automated irrigation scheduling limits crop physiological damage and yield reduction. Knowledge of crop water stress monitoring approaches can be effective in optimizing the use of agricultural water. Understanding the physiological mechanisms by which crops respond and adapt to water deficit ensures sustainable agricultural management and food supply. This aim could be achieved by analyzing and diagnosing crop characteristics and their interlinkage with the surrounding environment, including assessments of plant functional types (e.g., leaf area and structure, tree height, rate of evapotranspiration, rate of photosynthesis), controlling changes, and irrigated areas mapping. Calculating thresholds of soil water content parameters, crop water use efficiency, and nitrogen status makes irrigation scheduling decisions more accurate by preventing water limitations between irrigations. Combining Remote Sensing (RS), the Internet of Things (IoT), Artificial Intelligence (AI), and Machine Learning Algorithms (MLAs) can improve measurement accuracies and automate irrigation scheduling. This paper is a review that surveys about 100 recent research studies to analyze varied approaches in terms of providing high spatial and temporal resolution mapping, sensor-based Variable Rate Application (VRA) mapping, and the relation between spectral and thermal reflectance and different features of crop and soil. The other objective is to assess RS indices formed by choosing specific reflectance bands and identifying the correct spectral band to optimize classification techniques, and to analyze Proximal Optical Sensors (POSs) to control changes.
The innovation of this paper lies in categorizing evaluation methodologies of precision irrigation (applying the right practice, at the right place, at the right time, with the right quantity), controlled by soil moisture levels and the sensitivity of crops to water stress, into pre-processing, processing (retrieval algorithms), and post-processing parts. The main idea of this research is then to analyze the error sources and/or values reported by recent studies when different approaches are employed in the three proposed parts. Additionally, as an overall conclusion, the paper decomposes the different approaches into optimizing indices, calibration methods for the sensors, thresholding and prediction models prone to errors, and improvements in classification accuracy for mapping changes.

Keywords: agricultural crops, crop water stress detection, irrigation scheduling, precision agriculture, remote sensing

Procedia PDF Downloads 66
2133 An Approach to Autonomous Drones Using Deep Reinforcement Learning and Object Detection

Authors: K. R. Roopesh Bharatwaj, Avinash Maharana, Favour Tobi Aborisade, Roger Young

Abstract:

Presently, there are few cases of complete automation of drones and their allied intelligence capabilities. In essence, the potential of the drone has not yet been fully utilized. This paper presents feasible methods to build an intelligent drone with smart capabilities such as self-driving and obstacle avoidance. It does this through advanced reinforcement learning techniques and performs object detection using recent algorithms that are capable of processing lightweight models with fast training in real-time instances. For the scope of this paper, after researching and comparing the various algorithms, we implemented the Deep-Q-Networks (DQN) algorithm in the AirSim simulator. In future work, we plan to implement further advanced self-driving and object detection algorithms. We also plan to implement voice-based speech recognition for the entire drone operation, which would provide an option of speech communication between users (people) and the drone in unavoidable circumstances, thus making drones an interactive, intelligent Robotic Voice Enabled Service Assistant. The proposed drone has a wide scope of usability and is applicable in scenarios such as disaster management, air transport of essentials, agriculture, manufacturing, monitoring people's movements in public areas, and defense. Also discussed is drone communication based on satellite broadband Internet technology for faster computation and seamless, uninterrupted network service during disasters and remote location operations. This paper explains the feasible algorithms required to achieve this goal and is intended as a reference paper for future researchers going down this path.

Keywords: convolution neural network, natural language processing, obstacle avoidance, satellite broadband technology, self-driving

Procedia PDF Downloads 239
2132 Equity Risk Premiums and Risk Free Rates in Modelling and Prediction of Financial Markets

Authors: Mohammad Ghavami, Reza S. Dilmaghani

Abstract:

This paper presents an adaptive framework for modelling financial markets using equity risk premiums, risk free rates and volatilities. The recorded economic factors are initially used to train four adaptive filters for a certain limited period of time in the past. Once the systems are trained, the adjusted coefficients are used for modelling and prediction of an important financial market index. Two different approaches based on least mean squares (LMS) and recursive least squares (RLS) algorithms are investigated. Performance analysis of each method in terms of the mean squared error (MSE) is presented and the results are discussed. Computer simulations carried out using recorded data show MSEs of 4% and 3.4% for the next month prediction using LMS and RLS adaptive algorithms, respectively. In terms of twelve months prediction, RLS method shows a better tendency estimation compared to the LMS algorithm.
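The LMS update named in the abstract can be sketched as a one-step-ahead predictor on a single series. This is a simplified illustration only: the paper trains four adaptive filters on several economic factors, which are not reproduced here, and the function name `lms_predict` with its `order` and `step` parameters is hypothetical.

```python
def lms_predict(signal, order=2, step=0.01):
    """One-step-ahead prediction with a least mean squares (LMS) adaptive filter.

    At each time n the filter predicts signal[n] from the previous `order`
    samples, then moves its weights along the error gradient:
        w <- w + step * error * x
    Returns (predictions, final_weights).
    """
    weights = [0.0] * order
    predictions = []
    for n in range(order, len(signal)):
        x = signal[n - order:n][::-1]  # most recent sample first
        y = sum(w * xi for w, xi in zip(weights, x))
        error = signal[n] - y
        weights = [w + step * error * xi for w, xi in zip(weights, x)]
        predictions.append(y)
    return predictions, weights
```

The step size trades convergence speed against stability. RLS replaces the fixed step with a recursively updated inverse correlation matrix, which typically converges faster at a higher cost per update.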

Keywords: adaptive methods, LSE, MSE, prediction of financial Markets

Procedia PDF Downloads 327
2131 Unified Coordinate System Approach for Swarm Search Algorithms in Global Information Deficit Environments

Authors: Rohit Dey, Sailendra Karra

Abstract:

This paper aims at solving the problem of multi-target searching in a Global Positioning System (GPS) denied environment using swarm robots with limited sensing and communication abilities. Typically, existing swarm-based search algorithms rely on the presence of a global coordinate system (such as GPS) that is shared by the entire swarm, which, in turn, limits their application in real-world scenarios. This is because robots in a swarm need to share information among themselves regarding their location and the signal from targets to decide their future course of action, but this information is only meaningful when they all share the same coordinate frame. The paper addresses this issue by eliminating the dependency of a search algorithm on a predetermined global coordinate frame: the relative coordinates of individual robots are unified when they are within communication range, making the system more robust in real scenarios. Our algorithm assumes that all the robots in the swarm are equipped with range and bearing sensors and have limited sensing range and communication abilities. Initially, every robot maintains its own relative coordinate frame and follows a Levy walk random exploration until it comes within range of other robots. When two or more robots are within communication range, they share sensor information and their locations with respect to their own coordinate frames, based on which we unify their coordinate frames. They can then share information about the areas that were already explored, information about the surroundings, and the target signal from their location to make decisions about their future movement based on the search algorithm.
During the process of exploration, there can be several small groups of robots having their own coordinate systems, but eventually all the robots are expected to be under one global coordinate frame in which they can communicate information on the exploration area following swarm search techniques. Using the proposed method, swarm-based search algorithms can work in a real-world scenario without GPS and without any initial information about the size and shape of the environment. Initial simulation results show that, running our modified Particle Swarm Optimization (PSO) without global information, we can still achieve results comparable to basic PSO working with GPS. In the full paper, we plan to conduct a comparison study of different strategies for unifying the coordinate system and to implement them on other bio-inspired algorithms so that they work in GPS denied environments.
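One way the frame unification step could work for two robots is sketched below. The abstract does not specify the procedure, so this is an assumption-laden illustration: it supposes that each robot knows its own pose (x, y, heading) in its own frame, that robot A measures the range and bearing to B, and that B measures the bearing to A, with bearings taken relative to each robot's heading. The function name `frame_b_to_a` is hypothetical.

```python
import math


def frame_b_to_a(pose_a, pose_b, rng, bearing_ab, bearing_ba):
    """Build a function mapping points from robot B's frame into robot A's frame.

    pose_a: (x, y, heading) of A in A's own frame.
    pose_b: (x, y, heading) of B in B's own frame.
    rng: measured distance between the robots.
    bearing_ab: bearing of B as seen from A, relative to A's heading (radians).
    bearing_ba: bearing of A as seen from B, relative to B's heading (radians).
    """
    xa, ya, heading_a = pose_a
    xb, yb, heading_b = pose_b
    # B's position expressed in A's frame.
    bx = xa + rng * math.cos(heading_a + bearing_ab)
    by = ya + rng * math.sin(heading_a + bearing_ab)
    # The direction from B back to A is the A-to-B direction plus pi;
    # subtracting B's measured bearing gives B's heading in A's frame.
    heading_b_in_a = heading_a + bearing_ab + math.pi - bearing_ba
    rotation = heading_b_in_a - heading_b
    c, s = math.cos(rotation), math.sin(rotation)
    # Translation chosen so that B's own position maps onto (bx, by).
    tx = bx - (c * xb - s * yb)
    ty = by - (s * xb + c * yb)

    def transform(point):
        px, py = point
        return (c * px - s * py + tx, s * px + c * py + ty)

    return transform
```

Once such a transform exists, explored areas and target signals recorded in B's frame can be re-expressed in A's frame, and chaining pairwise transforms would let a group converge on a single shared frame.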

Keywords: bio-inspired search algorithms, decentralized control, GPS denied environment, swarm robotics, target searching, unifying coordinate systems

Procedia PDF Downloads 128
2130 Analysis of Biomarkers Intractable Epileptogenic Brain Networks with Independent Component Analysis and Deep Learning Algorithms: A Comprehensive Framework for Scalable Seizure Prediction with Unimodal Neuroimaging Data in Pediatric Patients

Authors: Bliss Singhal

Abstract:

Epilepsy is a prevalent neurological disorder affecting approximately 50 million individuals worldwide and 1.2 million Americans. There exist millions of pediatric patients with intractable epilepsy, a condition in which seizures fail to come under control. The occurrence of seizures can result in physical injury, disorientation, unconsciousness, and additional symptoms that could impede children's ability to participate in everyday tasks. Predicting seizures can help parents and healthcare providers take precautions, prevent risky situations, and mentally prepare children to minimize anxiety and nervousness associated with the uncertainty of a seizure. This research proposes a comprehensive framework to predict seizures in pediatric patients by evaluating machine learning algorithms on unimodal neuroimaging data consisting of electroencephalogram signals. Bandpass filtering and independent component analysis proved to be effective in reducing the noise and artifacts in the dataset. The performance of various machine learning algorithms is evaluated on important metrics such as accuracy, precision, specificity, sensitivity, F1 score and Matthews correlation coefficient (MCC). The results show that the deep learning algorithms are more successful in predicting seizures than logistic regression and k-nearest neighbors. The recurrent neural network (RNN) gave the highest precision and F1 score, long short-term memory (LSTM) outperformed RNN in accuracy, and the convolutional neural network (CNN) resulted in the highest specificity. This research has significant implications for healthcare providers in proactively managing seizure occurrence in pediatric patients, potentially transforming clinical practices, and improving pediatric care.
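The evaluation metrics listed in the abstract all derive from a binary confusion matrix, as the following sketch shows. The function name `binary_metrics` is hypothetical, and returning 0.0 for undefined ratios is an assumption made here for convenience.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity, specificity, F1 and MCC from counts."""

    def ratio(num, den):
        return num / den if den else 0.0  # undefined ratios reported as 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

Because seizure windows are usually far rarer than non-seizure windows, accuracy alone can look high for a model that rarely predicts a seizure, which is one reason to report sensitivity, specificity and MCC alongside it.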

Keywords: intractable epilepsy, seizure, deep learning, prediction, electroencephalogram channels

Procedia PDF Downloads 74
2129 Face Recognition Using Body-Worn Camera: Dataset and Baseline Algorithms

Authors: Ali Almadan, Anoop Krishnan, Ajita Rattani

Abstract:

Facial recognition is a widely adopted technology in surveillance, border control, healthcare, banking services, and lately, in mobile user authentication, with Apple introducing the “Face ID” moniker with the iPhone X. A lot of research has been conducted in the area of face recognition on datasets captured by surveillance cameras, DSLRs, and mobile devices. Recently, face recognition technology has also been deployed on body-worn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic so far, and no public datasets with a sufficient sample size are available. This paper aims to advance research in the area of face recognition using body-worn cameras. To this end, the contribution of this work is two-fold: (1) collection of a dataset consisting of a total of 136,939 facial images of 102 subjects captured using body-worn cameras in indoor and daylight conditions and (2) evaluation of various deep-learning architectures for face identification on the collected dataset. Experimental results suggest a maximum True Positive Rate (TPR) of 99.86% at a False Positive Rate (FPR) of 0.000, obtained by a SphereFace-based deep learning architecture in daylight conditions. The collected dataset and the baseline algorithms will promote further research and development. A download link for the dataset and the algorithms is available by contacting the authors.

Keywords: face recognition, body-worn cameras, deep learning, person identification

Procedia PDF Downloads 157
2128 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles

Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi

Abstract:

Fuel consumption (FC) is one of the key factors in determining expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is unfeasible to measure the FC, since this would first require the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses vehicle-specific and operational environmental condition information, such as road slopes and driver profiles, for around 40,000 vehicles. All vehicles have diesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of the machine learning algorithms linear regression (LR), K-nearest neighbor (KNN) and artificial neural networks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.

Keywords: artificial neural networks, fuel consumption, friedman test, machine learning, statistical hypothesis testing

Procedia PDF Downloads 171
2127 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there has been a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data is clear, and real-time stream data management has become a hot topic. The sliding window model and frequent itemset mining over dynamic data are among the most important problems in the context of data mining. The sliding window model for frequent itemset mining is widely used for data stream mining due to its emphasis on recent data and its bounded memory requirement. Existing methods use the traditional transaction-based sliding window model, where the window size is based on a fixed number of transactions. This model assumes that transactions arrive at a constant rate, which is not suited to real-time applications, and its use in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequent and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.
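The difference between a transaction-based and a timestamp-based window can be illustrated with a minimal sketch. It is not the authors' algorithm: it counts single items rather than itemsets, it omits the tree the abstract mentions, and the class name `TimestampWindow` and its parameters `span` and `min_support` are hypothetical names chosen for illustration.

```python
from collections import Counter, deque


class TimestampWindow:
    """Keeps item counts for transactions seen in the last `span` time units.

    Hypothetical illustration: the window is bounded by time, not by a
    fixed number of transactions.
    """

    def __init__(self, span):
        self.span = span
        self.transactions = deque()  # (timestamp, items) in arrival order
        self.counts = Counter()

    def add(self, timestamp, items):
        """Insert a transaction, then evict everything older than the window."""
        items = frozenset(items)
        self.transactions.append((timestamp, items))
        self.counts.update(items)
        cutoff = timestamp - self.span
        while self.transactions and self.transactions[0][0] <= cutoff:
            _, old_items = self.transactions.popleft()
            self.counts.subtract(old_items)

    def frequent_items(self, min_support):
        """Items whose count is at least min_support times the window size."""
        threshold = min_support * len(self.transactions)
        return {item for item, n in self.counts.items() if n > 0 and n >= threshold}


window = TimestampWindow(span=10)
window.add(0, ["a", "b"])
window.add(5, ["a"])
window.add(12, ["b", "c"])  # the transaction at time 0 falls out of the window
```

Because eviction depends on timestamps, the number of transactions in the window grows and shrinks with the arrival rate, which is the property the abstract argues for.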

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 149
2126 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is either chosen ad hoc or calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language processing, in genomics, especially in genome sequence processing, there is no notion of a “word”; rather, there are sequence substrings of length k called k-mers. K-mer sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and the embedding dimension. This is done by studying the scaling capability of three embedding algorithms, namely Latent Semantic Analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurately computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.
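To make the notion of a k-mer concrete, the following sketch splits a sequence into overlapping substrings of length k, which then play the role of "words" for an embedding algorithm. The function name `kmers` and the example sequence are illustrative assumptions, not taken from the paper.

```python
def kmers(sequence, k):
    """Return all overlapping substrings of length k, in order of position."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


tokens = kmers("ATGCGA", 3)  # a made-up six-base sequence
```

For a four-letter DNA alphabet the vocabulary can contain up to 4^k distinct k-mers, which is why the vocabulary grows quickly with the k-mer size.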

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 130
2125 Application of Adaptive Neural Network Algorithms for Determination of Salt Composition of Waters Using Laser Spectroscopy

Authors: Tatiana A. Dolenko, Sergey A. Burikov, Alexander O. Efitorov, Sergey A. Dolenko

Abstract:

In this study, a comparative analysis of the approaches associated with the use of neural network algorithms for effective solution of a complex inverse problem – the problem of identifying and determining the individual concentrations of inorganic salts in multicomponent aqueous solutions by the spectra of Raman scattering of light – is performed. It is shown that application of artificial neural networks provides the average accuracy of determination of concentration of each salt no worse than 0.025 M. The results of comparative analysis of input data compression methods are presented. It is demonstrated that use of uniform aggregation of input features allows decreasing the error of determination of individual concentrations of components by 16-18% on the average.

Keywords: inverse problems, multi-component solutions, neural networks, Raman spectroscopy

Procedia PDF Downloads 519
2124 Documents Emotions Classification Model Based on TF-IDF Weighting Measure

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Emotion classification of text documents is applied to reveal whether a document expresses a particular emotion of its writer. As different supervised methods have previously been used for classifying emotion in documents, in this research we present a novel model that supports the classification algorithms in producing more accurate results by means of the TF-IDF measure. Different experiments have been conducted to reveal the applicability of the proposed model. The model succeeds in raising accuracy according to the chosen metrics (precision, recall, and f-measure) by refining the lexicon, integrating lexicons from different perspectives, and applying the TF-IDF weighting measure to the classification features. The proposed model has also been compared with other research to demonstrate its competence in raising the accuracy of the results.
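As a rough illustration of TF-IDF weighting, the sketch below uses one common variant: term frequency as a relative count and inverse document frequency as log(N / df). The abstract does not state which variant the authors use, so this choice, the function name `tf_idf`, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["happy", "day"], ["sad", "day"]]  # toy, already tokenized documents
weights = tf_idf(docs)
```

A term that occurs in every document, such as "day" here, gets weight zero, while terms that distinguish a document keep a positive weight.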

Keywords: emotion detection, TF-IDF, WEKA tool, classification algorithms

Procedia PDF Downloads 469
2123 Vibroacoustic Modulation with Chirp Signal

Authors: Dong Liu

Abstract:

By sending a high-frequency probe wave and a low-frequency pump wave to a specimen, the vibroacoustic modulation (VAM) method evaluates a defect’s severity according to the modulation index of the received signal. Many studies have experimentally proved the significant sensitivity of the modulation index to tiny contact-type defects. However, it has also been found that the modulation index is highly affected by the frequency of the probe or pump waves. Therefore, the chirp signal has been introduced to the VAM method, since it can assess multiple frequencies in a relatively short time, so that the robustness of the VAM method can be enhanced. Consequently, the signal processing method needs to be modified accordingly. Various studies have utilized different algorithms or combinations of algorithms for processing the VAM signal obtained with chirp excitation. These signal processing methods were compared and used for processing a VAM signal acquired from steel samples.

Keywords: vibroacoustic modulation, nonlinear acoustic modulation, nonlinear acoustic NDT&E, signal processing, structural health monitoring

Procedia PDF Downloads 91
2122 Business and Psychological Principles Integrated into Automated Capital Investment Systems through Mathematical Algorithms

Authors: Cristian Pauna

Abstract:

With 2020 only a few steps away, investing in financial markets is a common activity. In the electronic trading environment, automated investment software has become a major part of the business intelligence system of any modern financial company. Investment decisions are today assisted and/or made automatically by computers using mathematical algorithms. The complexity of these algorithms requires computer assistance in the investment process. This paper presents several investment strategies that can be automated with algorithmic trading for the Deutscher Aktienindex DAX30. It was found that, based on several price action mathematical models used for high-frequency trading, some investment strategies can be optimized and improved for automated investments with good results. The paper presents the way to automate these investment decisions. Automated signals are built using all of these strategies. Three major types of investment strategies were found in this study, separated by the target length and by the exit strategy used. The exit decisions are also automated, and the paper presents the specifics of each investment type. A comparative study is also included in order to reveal the differences between strategies. Based on these results, the profit and the capital exposure are compared and analyzed in order to qualify the investment methodologies presented and to compare them with any other investment system. In conclusion, some major investment strategies are revealed and compared in order to be considered for inclusion in any automated investment system.

Keywords: Algorithmic trading, automated investment systems, limit conditions, trading principles, trading strategies

Procedia PDF Downloads 187
2121 Intellectual Property in Digital Environment

Authors: Balamurugan L.

Abstract:

Artificial intelligence (AI) and its applications in Intellectual Property Rights (IPR) have been growing significantly in recent years. In the last couple of years, AI tools for Patent Research and Patent Analytics have become well established in terms of accuracy of references and representation of identified patent insights. However, AI tools for Patent Prosecution and Patent Litigation are still at a nascent stage, and there may be significant potential if this market is explored further. Our research is primarily focused on identifying potential whitespaces and schematic algorithms to automate the Patent Prosecution and Patent Litigation processes of Intellectual Property. The schematic algorithms may assist leading AI tool developers in exploring such opportunities in the field of Intellectual Property. Our research is also focused on identifying the pitfalls of AI, for example, Information Security and its impact on IPR, and potential remediations to sustain IPR in the digital environment.

Keywords: artificial intelligence, patent analytics, patent drafting, patent litigation, patent prosecution, patent research

Procedia PDF Downloads 58
2120 Production and Characterization of Biochars from Torrefaction of Biomass

Authors: Serdar Yaman, Hanzade Haykiri-Acma

Abstract:

Biomass is a CO₂-neutral fuel that is renewable and sustainable and has a very large global potential. Efficient use of biomass in power generation and production of biomass-based biofuels can mitigate greenhouse gas (GHG) emissions and reduce dependency on fossil fuels. There are also other beneficial effects of biomass energy use, such as employment creation and pollutant reduction. However, most biomass materials cannot compete with fossil fuels in terms of energy content. The high moisture content and high volatile matter yield of biomass make it a low-calorific fuel, which is a significant disadvantage relative to fossil fuels. Besides, the density of biomass is generally low, which makes transportation and storage difficult. These negative aspects of biomass can be overcome by thermal pretreatments that upgrade its fuel properties. Torrefaction is such a thermal process, in which biomass is heated up to 300ºC under non-oxidizing conditions to avoid burning of the material. The treated biomass is called biochar and has considerably lower contents of moisture, volatile matter, and oxygen than the parent biomass. Accordingly, the carbon content and calorific value of biochar increase to a level comparable with that of coal. Moreover, the hydrophilic nature of untreated biomass, which leads to decay in the structure, is mostly eliminated, and the surface of the biochar becomes hydrophobic upon torrefaction. In order to investigate the effect of the torrefaction process on biomass properties, several biomass species such as olive milling residue (OMR), Rhododendron (small shrubby tree with bell-shaped flowers), and ash tree (timber tree) were chosen. The fuel properties of these biomasses were analyzed through proximate and ultimate analyses as well as higher heating value (HHV) determination. For this, samples were first chopped and ground to a particle size lower than 250 µm.
Then, samples were subjected to torrefaction in a horizontal tube furnace by heating from ambient temperature up to 200, 250, and 300ºC at a heating rate of 10ºC/min. The biochars obtained from this process were also tested by the methods applied to the parent biomass species, and the improvement in fuel properties was interpreted. Increasing the torrefaction temperature led to regular increases in the HHV of OMR, and the highest HHV (6065 kcal/kg) was obtained at 300ºC. In contrast, torrefaction at 250ºC was found to be optimal for Rhododendron and ash tree, since torrefaction at 300ºC had a detrimental effect on HHV. In addition, an increase in carbon content and a reduction in oxygen content were determined. Burning characteristics of the biochars were also studied using thermal analysis. For this purpose, a TA Instruments SDT Q600 thermal analyzer was used, and the thermogravimetric analysis (TGA), derivative thermogravimetry (DTG), differential scanning calorimetry (DSC), and differential thermal analysis (DTA) curves were compared and interpreted. It was concluded that torrefaction is an efficient method to upgrade the fuel properties of biomass and that the resulting biochars have superior characteristics compared to the parent biomasses.

Keywords: biochar, biomass, fuel upgrade, torrefaction

Procedia PDF Downloads 365
2119 An Algorithm to Depreciate the Energy Utilization Using a Bio-Inspired Method in Wireless Sensor Network

Authors: Navdeep Singh Randhawa, Shally Sharma

Abstract:

Wireless Sensor Network (WSN) is an autonomous technology that is emerging at a fast pace. This technology faces a number of challenges, and energy management is one of them, with a huge impact on network lifetime. To conserve energy, different types of routing protocols have been developed. The classical routing protocols are no longer adequate in complicated environments. Hence, in the field of routing, intelligent algorithms based on natural systems are a turning point for Wireless Sensor Networks. These nature-based algorithms are quite efficient at handling the challenges of the WSN, as they are capable of achieving local and global best optimization solutions in complex environments. So, the main aim of this paper is to develop a routing algorithm based on a swarm intelligence technique to enhance the performance of the Wireless Sensor Network.

Keywords: wireless sensor network, routing, swarm intelligence, MPRSO

Procedia PDF Downloads 347
2118 Hybrid Hierarchical Clustering Approach for Community Detection in Social Network

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

Social networks generally present a hierarchy of communities. To determine these communities and the relationships between them, detection algorithms should be applied. Most of the existing algorithms proposed for hierarchical community identification are based on either agglomerative clustering or divisive clustering. In this paper, we present a hybrid hierarchical clustering approach for community detection based on both bottom-up and top-down clustering. Our approach provides a more relevant community structure than hierarchical methods that consider only divisive or agglomerative clustering to identify communities. Moreover, we performed some comparative experiments to enhance the quality of the clustering results and to show the effectiveness of our algorithm.

Keywords: agglomerative hierarchical clustering, community structure, divisive hierarchical clustering, hybrid hierarchical clustering, opinion mining, social network, social network analysis

Procedia PDF Downloads 353
2117 Reconstruction of Age-Related Generations of Siberian Larch to Quantify the Climatogenic Dynamics of Woody Vegetation Close the Upper Limit of Its Growth

Authors: A. P. Mikhailovich, V. V. Fomin, E. M. Agapitov, V. E. Rogachev, E. A. Kostousova, E. S. Perekhodova

Abstract:

Woody vegetation near the upper limit of its habitat is a sensitive indicator of biota reaction to regional climate changes. Quantitative assessment of temporal and spatial changes in the distribution of trees and plant biocenoses calls for the development of new modeling approaches based upon selected data from measurements on the ground level and ultra-high-resolution aerial photography. Statistical models were developed for the study area located in the Polar Urals. These models allow obtaining probabilistic estimates for placing Siberian Larch trees into one of three age intervals, namely 1-10, 11-40 and over 40 years, based on the Weibull distribution of the maximum horizontal crown projection. The authors developed a distribution map for larch trees with crown diameters exceeding twenty centimeters by deciphering aerial photographs taken by a UAV from an altitude of fifty meters. The total number of larches was 88608, distributed across the abovementioned intervals as follows: 16980, 51740, and 19889 trees. The results demonstrate that two processes can be observed in the course of recent decades: the first is the intensive forestation of previously barren or lightly wooded fragments of the study area located within the patches of wood, woodlands, and sparse stand, and the second is expansion into mountain tundra. The current expansion of the Siberian Larch in the region replaced the depopulation process that occurred in the course of the Little Ice Age from the late 13ᵗʰ to the end of the 20ᵗʰ century. Using data from field measurements of Siberian larch specimen biometric parameters (including height, diameter at root collar and at 1.3 meters, and maximum projection of the crown in two orthogonal directions) and data on tree ages obtained at nine circular test sites, the authors developed an artificial neural network model with two layers of three and two neurons, respectively. 
The model allows quantitative assessment of a specimen's age based on height and maximum crown projection values. Tree height and crown diameters can be quantitatively assessed using data from aerial photographs and lidar scans. The resulting model can be used to assess the age of all Siberian larch trees. The proposed approach, after validation, can be applied to assessing the age of other tree species growing near the upper tree boundaries in other mountainous regions. This research was collaboratively funded by the Russian Ministry for Science and Education (project No. FEUG-2023-0002) and Russian Science Foundation (project No. 24-24-00235) in the field of data modeling on the basis of artificial neural networks.

Keywords: treeline, dynamic, climate, modeling

Procedia PDF Downloads 52
2116 Designing Floor Planning in 2D and 3D with an Efficient Topological Structure

Authors: V. Nagammai

Abstract:

Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining thousands of transistors into a single chip. Advances in technology increase the complexity of IC manufacturing, which may affect power consumption and increase size and latency. Topology defines the connections within a network. In this project, a network-on-chip (NoC) topology is generated using the atlas tool, which increases performance and in turn makes the determination of constraints effective. Routing is performed by the XY routing algorithm with wormhole flow control. In NoC topology generation, the values of power, area and latency are predetermined. In previous work, placement, routing and shortest path evaluation were performed using an algorithm called floor planning with cluster reconstruction and path allocation algorithm (FCRPA), using four 3x3 switches, six 4x4 switches, and two 5x5 switches. The use of the 4x4 and 5x5 switches increases the power consumption and area of the block. To avoid this problem, this paper uses one 8x8 switch and four 3x3 switches. This paper uses IPRCA, which consists of three steps: placement, clustering, and shortest path evaluation. Placement is performed using min-cut placement, and clustering is performed using an algorithm called cluster generation. The shortest path is evaluated using Dijkstra's algorithm. The power consumption of each block is determined. The experimental results show that the area, power, and wire length improved simultaneously.
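The shortest path step named in the abstract can be sketched with a textbook version of Dijkstra's algorithm. This is a generic illustration, not the paper's implementation: the graph, its node names, and the edge weights (which could stand for wire lengths between switches) are made up.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative weights.

    graph maps each node to a dict of {neighbor: edge_weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical four-node network with made-up link costs.
network = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
distances = dijkstra(network, "A")
```

In this toy network the cheapest route from A to D goes through B and C, with a total cost of 4, rather than over the direct B to D link.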

Keywords: application specific noc, b* tree representation, floor planning, t tree representation

Procedia PDF Downloads 387
2115 Internet of Things: Route Search Optimization Applying Ant Colony Algorithm and Theory of Computer Science

Authors: Tushar Bhardwaj

Abstract:

Internet of Things (IoT) possesses a dynamic network where the network nodes (mobile devices) are added and removed constantly and randomly; hence the traffic distribution in the network is quite variable and irregular. A basic but very important part of any network is route searching. There are many conventional route searching algorithms, such as link-state and distance-vector algorithms, but they are restricted to static point-to-point network topologies. In this paper, we propose a model that uses the Ant Colony Algorithm for route searching. It is dynamic in nature and has a positive feedback mechanism that suits route searching. We have also embedded the concept of Non-Deterministic Finite Automata [NDFA] minimization to reduce the network and thereby increase performance. Results show that the Ant Colony Algorithm gives the shortest path from the source to the destination node and that NDFA minimization reduces the broadcast storm effectively.

Keywords: routing, ant colony algorithm, NDFA, IoT

Procedia PDF Downloads 435
2114 The Cell Viability Study of Extracts of Bark, Flowers, Leaves and Seeds of Indian Dhak Tree, Flame of Forest

Authors: Madhavi S. Apte, Milind Bhitre

Abstract:

Medicinal plants play important roles in pharmaceutical research and new drug development. The Indian dhak tree, belonging to the family Fabaceae, has been widely used in the traditional Indian medical system of ‘Ayurveda’ for the treatment of a variety of ailments. Hence, a cell viability study was undertaken to evaluate and compare the activity of extracts of various parts (flower, bark, leaf, and seed) using the MTT assay, along with other pharmacognostical studies. The methanolic extracts of bark, flowers, leaves, and seeds were used for the study. The cell viability MTT assay was performed using the standard operating procedures. The extracts were dissolved in DMSO and serially diluted with complete medium to obtain the range of test concentrations. The DMSO concentration was kept < 0.1% in all the samples. HUVEC cells maintained in appropriate conditions were seeded in 96-well plates, treated with different concentrations of the test samples, and incubated at 37°C, 5% CO₂ for 96 hours. MTT reagent was added to the wells and incubated for 4 hours; the dark blue formazan product formed by the cells was dissolved in DMSO under a safety cabinet and read at 550 nm. Percentage inhibitions were calculated and plotted against the concentrations used in order to calculate the IC50 values. The bark, flower, leaf, and seed extracts showed cytotoxic activity and can be further studied for antiangiogenesis activity.
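The last calculation step can be illustrated with a small sketch. It assumes that background-corrected absorbances are available and estimates IC50 by linear interpolation between the two tested concentrations that bracket 50% inhibition. The abstract does not say how the authors derived IC50 (dose-response curve fitting is also common), and the function names and numbers below are hypothetical.

```python
def percent_inhibition(absorbance_treated, absorbance_control):
    """Inhibition relative to the untreated control, in percent."""
    return (1.0 - absorbance_treated / absorbance_control) * 100.0


def ic50_linear(concentrations, inhibitions):
    """Estimate IC50 by linear interpolation; returns None if 50% is not bracketed.

    Assumption for illustration only, not the method stated in the abstract.
    """
    points = sorted(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi != i_lo:
            return c_lo + (50.0 - i_lo) * (c_hi - c_lo) / (i_hi - i_lo)
    return None


# Made-up example values, not data from the study.
inhibition = percent_inhibition(0.25, 1.0)
estimate = ic50_linear([1, 10, 100], [20, 40, 80])
```

With these made-up values, 50% inhibition falls between the concentrations 10 and 100, and the interpolated estimate is 32.5 in the same concentration units.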

Keywords: pharmacognosy, cell viability, MTT assay, anti-angiogenesis

Procedia PDF Downloads 285