Search results for: ensemble model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16347

16287 Melanoma and Non-Melanoma, Skin Lesion Classification, Using a Deep Learning Model

Authors: Shaira L. Kee, Michael Aaron G. Sy, Myles Joshua T. Tan, Hezerul Abdul Karim, Nouar AlDahoul

Abstract:

Skin diseases are considered the fourth most common disease, with melanoma and non-melanoma skin cancer the most common type of cancer in Caucasians. The alarming increase in skin cancer cases shows an urgent need for further research to improve diagnostic methods, as early diagnosis can significantly improve the 5-year survival rate. Machine learning algorithms for image pattern analysis in diagnosing skin lesions can dramatically increase the accuracy of detection and decrease possible human errors. Several studies have shown that computer algorithms outperformed dermatologists in diagnostic performance. However, existing methods still need improvement to reduce diagnostic errors and generate efficient and accurate results. Our paper proposes an ensemble method to classify dermoscopic images into benign and malignant skin lesions. The experiments were conducted using the International Skin Imaging Collaboration (ISIC) image samples. The dataset contains 3,297 dermoscopic images in benign and malignant categories. The results show improved performance, with an accuracy of 88% and an F1 score of 87%, outperforming other existing models such as support vector machine (SVM), residual network (ResNet50), EfficientNetB0, EfficientNetB4, and VGG16.
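
The abstract does not say how the ensemble combines its member networks. One common option, assumed here purely for illustration, is soft voting: average each model's predicted probability of malignancy and threshold the mean. The sketch below also shows how accuracy and the F1 score, the two metrics reported in the abstract, are computed. The function names, the 0.5 threshold, and the toy probabilities are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model malignant probabilities and threshold the mean.

    prob_lists: one list of probabilities per model, all of equal length.
    Returns 1 (malignant) or 0 (benign) for each sample.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    labels = []
    for i in range(n_samples):
        mean_p = sum(probs[i] for probs in prob_lists) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical outputs of three models on four images (not real data).
model_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.4, 0.3, 0.7],
    [0.7, 0.1, 0.5, 0.2],
]
predicted = soft_vote(model_probs)
print(predicted, accuracy_and_f1([1, 0, 1, 0], predicted))
```

Averaging probabilities lets a confident model outweigh two hesitant ones, which hard majority voting cannot do; which rule the authors actually used is not stated.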

Keywords: deep learning, VGG16, EfficientNet, CNN, ensemble, dermoscopic images, melanoma

Procedia PDF Downloads 51
16286 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets and collections are becoming important assets in their own right and can now be accepted as a primary intellectual output of research. The quality and usage of a dataset depend mainly on the context in which it has been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. Manually mapping (classifying) this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection was produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and the Ensemble of Pruned Sets achieved encouraging performance compared to the other multi-label classification methods tested. The Classifier Chains method showed the worst performance. To recap, the benchmark achieved promising results by utilizing the preliminary exploratory data analysis performed on the collection, proposing new trends for research, and providing a baseline for future studies.
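
Three of the measurements named in the abstract (Hamming Loss, Micro-F, Macro-F) have standard definitions for multi-label data, which the sketch below spells out. It assumes each example is a row of 0/1 label indicators; the function names and the toy label matrices are illustrative and do not come from the paper's dataset.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        1
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
        if t != p
    )
    return wrong / total


def _label_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one label column."""
    tp = fp = fn = 0
    for row_t, row_p in zip(y_true, y_pred):
        t, p = row_t[label], row_p[label]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
    return tp, fp, fn


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Micro-F pools counts over all labels; Macro-F averages per-label F1."""
    n_labels = len(y_true[0])
    counts = [_label_counts(y_true, y_pred, j) for j in range(n_labels)]
    macro = sum(_f1(*c) for c in counts) / n_labels
    micro = _f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Toy example: 3 objectives, 3 candidate outcomes each (hypothetical).
truth = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
guess = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
print(hamming_loss(truth, guess), micro_macro_f1(truth, guess))
```

Macro-F weights every label equally, so rarely used labels pull it down more than they pull down Micro-F, which is why the two are usually reported together.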

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

Procedia PDF Downloads 142
16285 Predicting Aggregation Propensity from Low-Temperature Conformational Fluctuations

Authors: Hamza Javar Magnier, Robin Curtis

Abstract:

There have been rapid advances in the upstream processing of protein therapeutics, which have shifted the bottleneck to downstream purification and formulation. Finding liquid formulations with shelf lives of up to two years is increasingly difficult for some of the newer therapeutics, which have been engineered for activity, but whose formulations are often viscous, can phase separate, and have a high propensity for irreversible aggregation. We explore means to develop improved predictive ability from a better understanding of how protein-protein interactions depend on formulation conditions (pH, ionic strength, buffer type, presence of excipients) and how these impact the initial steps in protein self-association and aggregation. In this work, we study the initial steps in the aggregation pathways using a minimal protein model based on square-well potentials and discontinuous molecular dynamics. The effect of model parameters, including range of interaction, stiffness, chain length, and chain sequence, implies that protein models fold according to various pathways. By reducing the range of interactions, the folding and collapse transitions come together and follow a single-step folding pathway from the denatured to the native state. After parameterizing the model interaction parameters, we developed an understanding of low-temperature conformational properties and fluctuations, and their correlation to the folding transition of proteins in isolation. The model fluctuations increase with temperature. We observe a low-temperature point below which large fluctuations are frozen out. This implies that fluctuations at low temperature can be correlated to the folding transition at the melting temperature. Because proteins “breathe” at low temperatures, defining a native state as a single structure with conserved contacts and a fixed three-dimensional structure is misleading. Rather, we introduce a new definition of a native-state ensemble based on our understanding of the core conservation, which takes into account the native fluctuations at low temperatures. This approach permits the study of a large range of length and time scales needed to link the molecular interactions to the macroscopically observed behaviour. In addition, the models studied are parameterized by fitting to experimentally observed protein-protein interactions characterized in terms of osmotic second virial coefficients.

Keywords: protein folding, native-ensemble, conformational fluctuation, aggregation

Procedia PDF Downloads 333
16284 An Intrusion Detection Systems Based on K-Means, K-Medoids and Support Vector Clustering Using Ensemble

Authors: A. Mohammadpour, Ebrahim Najafi Kajabad, Ghazale Ipakchi

Abstract:

The security of computer networks is rising in importance, and many studies have been conducted in this field. With the penetration of internet networks into different fields, much needs to be done to provide secure industrial and non-industrial networks. Firewalls, appropriate Intrusion Detection Systems (IDS), encryption protocols for sending and receiving information, and the use of authentication certificates are among the measures that should be considered for system security. The aim of the present study is to combine the outcomes of several algorithms so as to reduce IDS errors, in a way that improves system security and avoids additional overload on the system. In addition, the obtained results allow the number and percentage of further sub-attacks to be detected. By running the proposed system, which is based on the combined outcome of multiple algorithms, and comparing it with the single-algorithm methods, we observed a 78.64% attack detection result, an improvement of 3.14% over the single algorithms.

Keywords: intrusion detection systems, clustering, k-means, k-medoids, SV clustering, ensemble

Procedia PDF Downloads 192
16283 Multi-Sensor Target Tracking Using Ensemble Learning

Authors: Bhekisipho Twala, Mantepu Masetshaba, Ramapulana Nkoana

Abstract:

Multiple classifier systems combine several individual classifiers to deliver a final classification decision. However, an increasingly controversial question is whether such systems can outperform the single best classifier and, if so, what form of multiple classifier system yields the most significant benefit. In addition, multi-target tracking detection using multiple sensors is an important research field in mobile techniques and military applications. In this paper, several multiple classifier systems are evaluated in terms of their ability to predict a system’s failure or success for multi-sensor target tracking tasks. The Bristol Eden project dataset is utilised for this task. Experimental and simulation results show that the human activity identification system can fulfill the requirements of target tracking due to improved sensor classification performance, with multiple classifier systems constructed using boosting achieving higher accuracy rates.

Keywords: single classifier, ensemble learning, multi-target tracking, multiple classifiers

Procedia PDF Downloads 230
16282 Real-Time Radar Tracking Based on Nonlinear Kalman Filter

Authors: Milca F. Coelho, K. Bousson, Kawser Ahmed

Abstract:

Accurately tracking an aerospace vehicle in a time-critical situation and in a highly nonlinear environment is one of the strongest interests within the aerospace community. Tracking is achieved by accurately estimating the state of a moving target, which is composed of a set of variables that can provide a complete status of the system at a given time. One of the main ingredients for good estimation performance is the use of efficient estimation algorithms. A well-known framework is the family of Kalman filtering methods, designed for prediction and estimation problems. The success of the Kalman Filter (KF) in engineering applications is mostly due to the Extended Kalman Filter (EKF), which is based on local linearization. Despite its popularity, the EKF presents several limitations. To address these limitations, and as a possible solution to tracking problems, this paper proposes the use of the Ensemble Kalman Filter (EnKF). Although the EnKF is extensively used in weather forecasting and is recognized for producing accurate and computationally effective estimates on systems of very high dimension, it is almost unknown to the tracking community. The EnKF was initially proposed as an attempt to improve the error covariance calculation, which in the classic Kalman Filter is difficult to implement. In the EnKF method, the prediction and analysis error covariances have ensemble representations. These ensembles have sizes that limit the number of degrees of freedom, so that the filter error covariance calculations are far more practical for modest ensemble sizes. In this paper, a realistic simulation of radar tracking was performed, in which the EnKF was applied and compared with the Extended Kalman Filter. The results suggest that the EnKF is a promising tool for tracking applications, offering more advantages in terms of performance.
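
To make the "ensemble representation" of the error covariance concrete, here is a minimal sketch of one EnKF analysis (measurement update) step. It makes strong simplifying assumptions that are not from the paper: the state is a single scalar, the sensor observes the state directly, and the stochastic (perturbed-observation) variant of the filter is used. The function name and all numbers are hypothetical.

```python
import random


def enkf_analysis_step(ensemble, observation, obs_var, rng):
    """One scalar EnKF update with perturbed observations.

    ensemble:    list of forecast state samples (the ensemble members)
    observation: the measured value of the state
    obs_var:     variance of the measurement noise
    Returns the updated ensemble and the Kalman gain that was used.
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    # The forecast error variance is estimated from the ensemble itself,
    # instead of being propagated analytically as in the KF or EKF.
    forecast_var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = forecast_var / (forecast_var + obs_var)
    updated = []
    for x in ensemble:
        # Each member assimilates its own noisy copy of the observation.
        perturbed = observation + rng.gauss(0.0, obs_var ** 0.5)
        updated.append(x + gain * (perturbed - x))
    return updated, gain


rng = random.Random(42)
prior = [rng.gauss(10.0, 2.0) for _ in range(200)]  # hypothetical range estimates
posterior, k = enkf_analysis_step(prior, observation=12.0, obs_var=1.0, rng=rng)
print(round(k, 3), round(sum(posterior) / len(posterior), 3))
```

In a real tracker the state would be a vector (position, velocity), the variance would become a sample covariance matrix, and a nonlinear motion model would move each member forward between updates; no linearization of that model is needed, which is the contrast with the EKF drawn in the abstract.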

Keywords: Kalman filter, nonlinear state estimation, optimal tracking, stochastic environment

Procedia PDF Downloads 105
16281 Application of Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and Multipoint Optimal Minimum Entropy Deconvolution in Railway Bearings Fault Diagnosis

Authors: Yao Cheng, Weihua Zhang

Abstract:

Although the measured vibration signal contains rich information on machine health conditions, white noise interference and discrete harmonics coming from the blade, shaft, and mesh make the fault diagnosis of rolling element bearings difficult. To overcome the interference of useless signals, a new fault diagnosis method combining Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Multipoint Optimal Minimum Entropy Deconvolution (MOMED) is proposed for the fault diagnosis of high-speed train bearings. First, the CEEMDAN technique is applied to adaptively decompose the raw vibration signal into a finite number of intrinsic mode functions (IMFs) and a residue. Compared with Ensemble Empirical Mode Decomposition (EEMD), CEEMDAN can provide an exact reconstruction of the original signal and a better spectral separation of the modes, which improves the accuracy of fault diagnosis. An effective sensitivity index based on Pearson's correlation coefficients between the IMFs and the raw signal is adopted to select sensitive IMFs that contain bearing fault information. The composite signal of the sensitive IMFs is used for further fault identification analysis. Second, to identify the fault information precisely, MOMED is used to enhance the periodic impulses in the composite signal. As a non-iterative method, MOMED has better deconvolution performance than classical deconvolution methods such as Minimum Entropy Deconvolution (MED) and Maximum Correlated Kurtosis Deconvolution (MCKD). Third, envelope spectrum analysis is applied to detect the existence of a bearing fault. Simulated bearing fault signals with white noise and discrete harmonic interference are used to validate the effectiveness of the proposed method. Finally, the advantages of the proposed method are further demonstrated on high-speed train bearing fault datasets measured on a test rig. The analysis results indicate that the proposed method has strong practicability.
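
The IMF selection step can be sketched as follows. The code assumes the IMFs have already been produced by a decomposition such as CEEMDAN (not implemented here), scores each one by the absolute Pearson correlation with the raw signal, and sums those above a cutoff into the composite signal. The 0.3 cutoff, the function names, and the synthetic sinusoidal "IMFs" are assumptions for illustration; the abstract does not give the paper's exact sensitivity index or threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def composite_of_sensitive_imfs(imfs, raw, threshold=0.3):
    """Keep IMFs whose |correlation| with the raw signal meets the threshold.

    Returns the indices of the selected IMFs and their sample-wise sum.
    """
    selected = [
        i for i, imf in enumerate(imfs) if abs(pearson(imf, raw)) >= threshold
    ]
    composite = [sum(imfs[i][k] for i in selected) for k in range(len(raw))]
    return selected, composite


# Synthetic stand-ins for IMFs: two strong components and one weak one.
n = 200
imfs = [
    [math.sin(2 * math.pi * 20 * k / n) for k in range(n)],
    [0.5 * math.sin(2 * math.pi * 5 * k / n) for k in range(n)],
    [0.01 * math.sin(2 * math.pi * 50 * k / n) for k in range(n)],
]
raw = [sum(imf[k] for imf in imfs) for k in range(n)]
selected, composite = composite_of_sensitive_imfs(imfs, raw)
print(selected)
```

The composite signal, rather than the raw one, would then be passed to the deconvolution and envelope-spectrum stages described in the abstract.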

Keywords: bearing, complete ensemble empirical mode decomposition with adaptive noise, fault diagnosis, multipoint optimal minimum entropy deconvolution

Procedia PDF Downloads 339
16280 Breast Cancer Prediction Using Score-Level Fusion of Machine Learning and Deep Learning Models

Authors: Sam Khozama, Ali M. Mayya

Abstract:

Breast cancer is one of the most common types of cancer in women. Early prediction of breast cancer helps physicians detect cancer in its early stages. Big cancer data needs a very powerful tool to analyze and extract predictions. Machine learning and deep learning are two of the most efficient tools for predicting cancer based on textual data. In this study, we developed a fusion model that combines a machine learning model and a deep learning model. To obtain the final prediction, Long Short-Term Memory (LSTM) and ensemble learning with hyperparameter optimization are used, and their outputs are combined by score-level fusion. Experiments are done on the Breast Cancer Surveillance Consortium (BCSC) dataset after balancing and grouping the class categories. Five different training scenarios are used, and the tests show that the designed fusion model improved performance by 3.3% compared to the individual models.

Keywords: machine learning, deep learning, cancer prediction, breast cancer, LSTM, fusion

Procedia PDF Downloads 131
16279 Solar Power Forecasting for the Bidding Zones of the Italian Electricity Market with an Analog Ensemble Approach

Authors: Elena Collino, Dario A. Ronzio, Goffredo Decimi, Maurizio Riva

Abstract:

The rapid increase of renewable energy in Italy is led by wind and solar installations. The 2017 Italian energy strategy foresees further development of these sustainable technologies, especially solar. This has resulted in new opportunities, challenges, and problems to deal with. The growth of renewables makes it possible to meet the European requirements regarding energy and environmental policy, but these types of sources are difficult to manage because they are intermittent and non-programmable. Operationally, these characteristics can lead to instability in the voltage profile and increasing uncertainty in energy reserve scheduling. The increasing renewable production must be considered with more and more attention, especially by the Transmission System Operator (TSO). Every day, once the market outcome has been determined, the TSO issues orders on energy dispatch over extended areas, defined mainly on the basis of power transmission limitations. In Italy, six market zones are defined: Northern Italy, Central-Northern Italy, Central-Southern Italy, Southern Italy, Sardinia, and Sicily. Accurate hourly day-ahead renewable power forecasting over these extended areas brings an improvement in both dispatching and reserve management. In this study, an operational forecasting tool for the hourly solar output of the six Italian market zones is presented, and its performance is analysed. The implementation is carried out by means of a numerical weather prediction model coupled with statistical post-processing in order to derive the power forecast on the basis of the meteorological projection. The weather forecast is obtained from the limited area model RAMS over the Italian territory, initialized with IFS-ECMWF boundary conditions. The post-processing calculates the solar power production with the Analog Ensemble technique (AN). This statistical approach forecasts the production using a probability distribution of the measured production registered in the past when the weather scenario looked very similar to the forecasted one. The similarity is evaluated for the components of the solar radiation: global (GHI), diffuse (DIF) and direct normal (DNI) irradiation, together with the corresponding azimuth and zenith solar angles. These are, in fact, the main factors that affect solar production. Considering that the AN performance is strictly related to the length and quality of the historical data, a training period of more than one year has been used. The training set is made up of historical Numerical Weather Prediction (NWP) forecasts at 12 UTC for the GHI, DIF and DNI variables over the Italian territory, together with the corresponding hourly measured production for each of the six zones. The AN technique makes it possible to estimate the aggregate solar production in the area without information about the technological characteristics of all the solar parks present in each area. Moreover, this information is often only partially available. Every day, the hourly solar power forecast for the six Italian market zones is made publicly available through a website.
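
The core idea of the analog ensemble can be sketched in a few lines: look up the past forecasts most similar to today's forecast and use the production that was actually measured on those occasions as the ensemble. This sketch assumes a plain Euclidean distance over three predictors (GHI, DIF, DNI) and a fixed number of analogs k; the operational tool also uses solar angles and may weight or normalize predictors differently, none of which is specified in the abstract. All values are made up.

```python
import math


def analog_ensemble(history, current_forecast, k=3):
    """Return the measured production of the k most similar past forecasts.

    history: list of (forecast_features, measured_production) pairs
    current_forecast: feature tuple for the hour being predicted
    """
    ranked = sorted(
        history, key=lambda pair: math.dist(pair[0], current_forecast)
    )
    return [measured for _, measured in ranked[:k]]


# Hypothetical archive: ((GHI, DIF, DNI) forecast in W/m^2, measured MW).
archive = [
    ((800.0, 100.0, 700.0), 50.0),
    ((200.0, 150.0, 50.0), 10.0),
    ((780.0, 110.0, 690.0), 48.0),
    ((500.0, 200.0, 300.0), 30.0),
    ((820.0, 90.0, 720.0), 52.0),
]
members = analog_ensemble(archive, (790.0, 105.0, 695.0), k=3)
point_forecast = sum(members) / len(members)
print(sorted(members), point_forecast)
```

Because the ensemble members are past measurements, their spread gives an empirical probability distribution for the forecast, and no model of the individual solar plants is required, which matches the motivation given in the abstract.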

Keywords: analog ensemble, electricity market, PV forecast, solar energy

Procedia PDF Downloads 124
16278 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text classification is the task of assigning a given text to the appropriate category from a given set of categories. It is vital to use a proper set of pre-processing, feature selection, and classification techniques to achieve this purpose. In this paper, we used different ensemble techniques along with varying feature selection parameters to observe the change in the overall accuracy of the result, as well as in some class-level measures, including the precision for each individual category of text. After subjecting our data to pre-processing and feature selection techniques, different individual classifiers were tested first, and then the classifiers were combined to form ensembles to increase their accuracy. Later, we also studied the impact of decreasing the number of classification categories on the overall accuracy. Text classification is widely used in sentiment analysis on social media sites such as Twitter to gauge people’s opinions about a cause, and it is also used to analyze customers’ reviews of certain products or services. Opinion mining is a vital task in data mining, and text categorization is a backbone of opinion mining.

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 204
16277 Breast Cancer Detection Using Machine Learning Algorithms

Authors: Jiwan Kumar, Pooja, Sandeep Negi, Anjum Rouf, Amit Kumar, Naveen Lakra

Abstract:

Health issues are increasing day by day, and breast cancer is one of them; it is crucial to detect it in its early stages. Doctors can use this model to tell their patients whether a cancer is harmless (benign) or harmful (malignant). We used machine learning to produce the model, applying algorithms such as Logistic Regression, Random Forest, Support Vector Classifier, Bayesian Network, and Radial Basis Function. We tried to use the data of crucial parts and show the results in pictures in order to make them easier for doctors to interpret. By doing this, we are making ML better at finding breast cancer, which can lead to saving more lives and to better health care.

Keywords: Bayesian network, radial basis function, ensemble learning, understandable, data making better, random forest, logistic regression, breast cancer

Procedia PDF Downloads 11
16276 Cirrhosis Mortality Prediction as Classification using Frequent Subgraph Mining

Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride

Abstract:

In this work, we use machine learning and novel data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis were collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedures, and laboratory tests is analyzed. A temporal pattern mining technique called Frequent Subgraph Mining (FSM) is used. The Model for End-Stage Liver Disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement in the area under the curve (AUC). The FSM technique by itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. To the best of our knowledge, this is the first work to apply modern machine learning algorithms and data analysis methods to predicting one-year mortality of cirrhotic patients and to build a model that predicts one-year mortality significantly more accurately than the MELD score. We have also tested the potential of FSM and provided a new perspective on the importance of clinical features.

Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning

Procedia PDF Downloads 108
16275 Predictive Modelling of Aircraft Component Replacement Using Imbalanced Learning and Ensemble Method

Authors: Dangut Maren David, Skaf Zakwan

Abstract:

Adequate monitoring of vehicle components in order to obtain high uptime is the goal of predictive maintenance. The major challenge faced by businesses in industry is the significant cost associated with delays in service delivery due to system downtime. Most of these businesses are interested in predicting such problems and proactively preventing them before they occur, which is the core advantage of Prognostic Health Management (PHM) applications. The recent emergence of Industry 4.0, or the industrial internet of things (IIoT), has led to the need for monitoring system activities and enhancing system-to-system or component-to-component interactions; this has resulted in the generation of large amounts of data, known as big data. Analysis of big data is increasingly important; however, due to complexity inherent in the datasets, such as imbalanced classification problems, it becomes extremely difficult to build a model with high precision. Data-driven predictive modeling for condition-based maintenance (CBM) has recently drawn research interest, with growing attention from both academia and industry. The large data generated from industrial processes inherently come with different degrees of complexity, which poses a challenge for analytics. Thus, the imbalanced classification problem exists pervasively in industrial datasets, which can affect the performance of learning algorithms and yield poor classifier accuracy in model development. Misclassification of faults can result in unplanned breakdowns, leading to economic loss. In this paper, an advanced approach for handling the imbalanced classification problem is proposed, and a prognostic model is then developed to predict aircraft component replacement in advance by exploring aircraft historical data. The approach is based on a hybrid ensemble-based method that improves the prediction of the minority class during learning. We also investigate the impact of our approach on the multiclass imbalance problem. We validate the feasibility and effectiveness of our approach in terms of performance using real-world aircraft operation and maintenance datasets, which span over 7 years. Our approach shows better performance compared to other similar approaches. We also validate the strength of our approach for handling multiclass imbalanced datasets; our results also show good performance compared to other base classifiers.

Keywords: prognostics, data-driven, imbalance classification, deep learning

Procedia PDF Downloads 149
16274 Ensemble Sampler For Infinite-Dimensional Inverse Problems

Authors: Jeremie Coullon, Robert J. Webber

Abstract:

We introduce a Markov chain Monte Carlo (MCMC) sampler for infinite-dimensional inverse problems. Our sampler is based on the affine invariant ensemble sampler, which uses interacting walkers to adapt to the covariance structure of the target distribution. We extend this ensemble sampler for the first time to infinite-dimensional function spaces, yielding a highly efficient gradient-free MCMC algorithm. Because our ensemble sampler does not require gradients or posterior covariance estimates, it is simple to implement and broadly applicable. In many Bayesian inverse problems, Markov chain Monte Carlo (MCMC) methods are needed to approximate distributions on infinite-dimensional function spaces, for example, in groundwater flow, medical imaging, and traffic flow. Yet designing efficient MCMC methods for function spaces has proved challenging. Recent gradient-based MCMC methods, preconditioned MCMC methods, and SMC methods have improved the computational efficiency of functional random walk. However, these samplers require gradients or posterior covariance estimates that may be challenging to obtain. Calculating gradients is difficult or impossible in many high-dimensional inverse problems involving a numerical integrator with a black-box code base. Additionally, accurately estimating posterior covariances can require a lengthy pilot run or adaptation period. These concerns raise the question: is there a functional sampler that outperforms functional random walk without requiring gradients or posterior covariance estimates? To address this question, we consider a gradient-free sampler that avoids explicit covariance estimation yet adapts naturally to the covariance structure of the sampled distribution. This sampler works by considering an ensemble of walkers and interpolating and extrapolating between walkers to make a proposal. This is called the affine invariant ensemble sampler (AIES), which is easy to tune, easy to parallelize, and efficient at sampling spaces of moderate dimensionality (less than 20). The main contribution of this work is to propose a functional ensemble sampler (FES) that combines functional random walk and AIES. To apply this sampler, we first calculate the Karhunen–Loeve (KL) expansion for the Bayesian prior distribution, assumed to be Gaussian and trace-class. Then, we use AIES to sample the posterior distribution on the low-wavenumber KL components and use the functional random walk to sample the posterior distribution on the high-wavenumber KL components. Alternating between AIES and functional random walk updates, we obtain our functional ensemble sampler that is efficient and easy to use without requiring detailed knowledge of the target distribution. In past work, several authors have proposed splitting the Bayesian posterior into low-wavenumber and high-wavenumber components and then applying enhanced sampling to the low-wavenumber components. Yet compared to these other samplers, FES is unique in its simplicity and broad applicability. FES does not require any derivatives, and the need for derivative-free samplers has previously been emphasized. FES also eliminates the requirement for posterior covariance estimates. Lastly, FES is more efficient than other gradient-free samplers in our tests. In two numerical examples, we apply FES to challenging inverse problems that involve estimating a functional parameter and one or more scalar parameters. We compare the performance of functional random walk, FES, and an alternative derivative-free sampler that explicitly estimates the posterior covariance matrix. We conclude that FES is the fastest available gradient-free sampler for these challenging and multimodal test problems.
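
The "interpolating and extrapolating between walkers" proposal can be illustrated with the standard finite-dimensional AIES stretch move. The sketch below is only that building block, written for a low-dimensional target; it is not the paper's functional ensemble sampler, which additionally treats high-wavenumber components with a functional random walk. The stretch parameter a = 2, the function name, and the Gaussian test target are assumptions for illustration.

```python
import math
import random


def stretch_move_sweep(walkers, log_prob, rng, a=2.0):
    """Update every walker once with the stretch move.

    walkers:  list of positions (each a list of floats), modified in place
    log_prob: function returning the log density of a position
    Returns the number of accepted proposals.
    """
    n_walkers = len(walkers)
    dim = len(walkers[0])
    accepted = 0
    for k in range(n_walkers):
        # Pick a partner walker j != k uniformly at random.
        j = rng.randrange(n_walkers - 1)
        if j >= k:
            j += 1
        # Draw z in [1/a, a) with density proportional to 1/sqrt(z).
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        # z < 1 interpolates between the two walkers, z > 1 extrapolates.
        proposal = [xj + z * (xk - xj) for xk, xj in zip(walkers[k], walkers[j])]
        log_ratio = (
            (dim - 1) * math.log(z) + log_prob(proposal) - log_prob(walkers[k])
        )
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            walkers[k] = proposal
            accepted += 1
    return accepted


def log_standard_normal(x):
    return -0.5 * sum(v * v for v in x)


rng = random.Random(1)
walkers = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
n_accepted = sum(stretch_move_sweep(walkers, log_standard_normal, rng) for _ in range(200))
print(n_accepted / (200 * 20))
```

Note that the proposal uses only evaluations of the log density at walker positions: no gradient and no covariance estimate appears anywhere, which is the property the abstract emphasizes.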

Keywords: Bayesian inverse problems, Markov chain Monte Carlo, infinite-dimensional inverse problems, dimensionality reduction

Procedia PDF Downloads 127
16273 Seismic Perimeter Surveillance System (Virtual Fence) for Threat Detection and Characterization Using Multiple ML Based Trained Models in Weighted Ensemble Voting

Authors: Vivek Mahadev, Manoj Kumar, Neelu Mathur, Brahm Dutt Pandey

Abstract:

Perimeter guarding and protection of critical installations require prompt intrusion detection and assessment to take effective countermeasures. Currently, visual and electronic surveillance are the primary methods used for perimeter guarding. These methods can be costly and complicated, requiring careful planning according to the location and terrain. Moreover, these methods often struggle to detect stealthy and camouflaged insurgents. The object of the present work is to devise a surveillance technique using seismic sensors that overcomes the limitations of existing systems. The aim is to improve intrusion detection, assessment, and characterization by utilizing seismic sensors. Most of the similar systems have only two types of intrusion detection capability viz., human or vehicle. In our work we could even categorize further to identify types of intrusion activity such as walking, running, group walking, fence jumping, tunnel digging and vehicular movements. A virtual fence of 60 meters at GCNEP, Bahadurgarh, Haryana, India, was created by installing four underground geophones at a distance of 15 meters each. The signals received from these geophones are then processed to find unique seismic signatures called features. Various feature optimization and selection methodologies, such as LightGBM, Boruta, Random Forest, Logistics, Recursive Feature Elimination, Chi-2 and Pearson Ratio were used to identify the best features for training the machine learning models. The trained models were developed using algorithms such as supervised support vector machine (SVM) classifier, kNN, Decision Tree, Logistic Regression, Naïve Bayes, and Artificial Neural Networks. These models were then used to predict the category of events, employing weighted ensemble voting to analyze and combine their results. The models were trained with 1940 training events and results were evaluated with 831 test events. 
It was observed that using the weighted ensemble voting increased the efficiency of predictions. In this study we successfully developed and deployed the virtual fence using geophones. Since these sensors are passive, do not radiate any energy and are installed underground, it is impossible for intruders to locate and nullify them. Their flexibility, quick and easy installation, low costs, hidden deployment and unattended surveillance make such systems especially suitable for critical installations and remote facilities with difficult terrain. This work demonstrates the potential of utilizing seismic sensors for creating better perimeter guarding and protection systems using multiple machine learning models in weighted ensemble voting. In this study the virtual fence achieved an intruder detection efficiency of over 97%.
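
Weighted ensemble voting, as used to combine the trained models, can be sketched as follows: each model casts a vote for its predicted event category, votes are scaled by a per-model weight, and the category with the largest total wins. How the authors chose the weights is not stated; using something like validation accuracy is an assumption here, and the model names, weights, and labels are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-model class predictions by summing model weights per class.

    predictions: dict mapping model name -> predicted label
    weights:     dict mapping model name -> non-negative weight
    Returns the label with the highest total weight.
    """
    scores = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    return max(scores, key=scores.get)


# Hypothetical weights, e.g. each model's accuracy on held-out events.
model_weights = {"svm": 0.9, "knn": 0.6, "tree": 0.5, "nb": 0.4, "ann": 0.8}
event_predictions = {
    "svm": "walking",
    "knn": "running",
    "tree": "running",
    "nb": "walking",
    "ann": "walking",
}
print(weighted_vote(event_predictions, model_weights))
```

Compared with a plain majority vote, weighting lets one reliable model override several weak ones, at the cost of having to estimate the weights on data that the final evaluation does not reuse.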

Keywords: geophone, seismic perimeter surveillance, machine learning, weighted ensemble method

Procedia PDF Downloads 38
16272 A Hybrid Data Mining Algorithm Based System for Intelligent Defence Mission Readiness and Maintenance Scheduling

Authors: Shivam Dwivedi, Sumit Prakash Gupta, Durga Toshniwal

Abstract:

Keeping defence forces at the highest state of combat readiness under budgetary constraints is a challenging task. A huge amount of time and money is squandered on unnecessary and expensive traditional maintenance activities. To overcome this limitation, the Defence Intelligent Mission Readiness and Maintenance Scheduling System has been proposed, which improves the maintenance system by diagnosing equipment condition and predicting maintenance requirements. Based on new data mining algorithms, this system intelligently optimises mission readiness for imminent operations and maintenance scheduling in repair echelons. With modified data mining algorithms such as the Weighted Feature Ranking Genetic Algorithm and the SVM-Random Forest Linear ensemble, it improves reliability, availability and safety while reducing maintenance cost and Equipment Out of Action (EOA) time. The results indicate that the introduced algorithms have an edge over conventional data mining algorithms. The system, utilizing the intelligent condition-based maintenance approach, improves the operational and maintenance decision strategy of the defence force.

Keywords: condition based maintenance, data mining, defence maintenance, ensemble, genetic algorithms, maintenance scheduling, mission capability

Procedia PDF Downloads 266
16271 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius

Authors: Mina Adel Shokry Fahim, Jūratė Sužiedelytė Visockienė

Abstract:

Growing concern over air pollution (AP) has given the issue more prominence than ever before. Awareness has increased, and those who understand the problem now have a duty to share that knowledge with others. This realisation often comes after an understanding of how poor air quality, as reflected in air quality indices (AQI), damages human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it examines concentrations of particulate matter 10 micrometers or less in diameter (PM10). Using Gaussian Process Regression (GPR), Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.

Keywords: air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter

Procedia PDF Downloads 24
16270 Multi Object Tracking for Predictive Collision Avoidance

Authors: Bruk Gebregziabher

Abstract:

The safe and efficient operation of Autonomous Mobile Robots (AMRs) in complex environments, such as manufacturing, logistics, and agriculture, necessitates accurate multi-object tracking and predictive collision avoidance. This paper presents algorithms and techniques for addressing these challenges using lidar sensor data, with an emphasis on the ensemble Kalman filter. The developed predictive collision avoidance algorithm employs the data provided by lidar sensors to track multiple objects and predict their velocities and future positions, enabling the AMR to navigate safely and effectively. A modification to the dynamic windowing approach is introduced to enhance the performance of the collision avoidance system. The overall system architecture encompasses object detection, multi-object tracking, and predictive collision avoidance control. The experimental results, obtained from both simulation and real-world data, demonstrate the effectiveness of the proposed methods in various scenarios, which lays the foundation for future research on global planners, other controllers, and the integration of additional sensors. This thesis contributes to the ongoing development of safe and efficient autonomous systems in complex and dynamic environments.
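The abstract names the ensemble Kalman filter without giving its equations. As an illustration only, the sketch below shows the analysis (update) step of a stochastic, perturbed-observation ensemble Kalman filter for a scalar state that is observed directly. The scalar setting, the function names, and the numbers are assumptions chosen for brevity; the tracker in the paper presumably works with multi-dimensional states.

```python
import random


def sample_variance(values):
    """Unbiased sample variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def enkf_update(ensemble, observation, obs_var, rng):
    """Perturbed-observation EnKF analysis step for a directly observed scalar state."""
    prior_var = sample_variance(ensemble)
    gain = prior_var / (prior_var + obs_var)  # Kalman gain with observation operator H = 1
    obs_std = obs_var ** 0.5
    # Each member is nudged toward its own noisy copy of the observation.
    return [x + gain * (observation + rng.gauss(0.0, obs_std) - x) for x in ensemble]


rng = random.Random(0)
prior = [rng.gauss(0.0, 2.0) for _ in range(200)]       # hypothetical prior ensemble
posterior = enkf_update(prior, observation=10.0, obs_var=0.25, rng=rng)
prior_mean = sum(prior) / len(prior)
posterior_mean = sum(posterior) / len(posterior)
```

Because the observation is much more certain than the prior in this example, the posterior ensemble moves most of the way toward the observation and its spread shrinks.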

Keywords: autonomous mobile robots, multi-object tracking, predictive collision avoidance, ensemble Kalman filter, lidar sensors

Procedia PDF Downloads 55
16269 Gene Prediction in DNA Sequences Using an Ensemble Algorithm Based on Goertzel Algorithm and Anti-Notch Filter

Authors: Hamidreza Saberkari, Mousa Shamsi, Hossein Ahmadi, Saeed Vaali, MohammadHossein Sedaaghi

Abstract:

In recent years, using signal processing tools for accurate identification of protein coding regions has become a challenge in bioinformatics. Most genomic signal processing methods are based on the period-3 characteristics of the nucleotides in DNA strands; consequently, spectral analysis is applied to numerical sequences of DNA to locate the periodic components. In this paper, a novel ensemble algorithm for gene selection in DNA sequences is presented, based on the combination of the Goertzel algorithm and an anti-notch filter (ANF). The proposed algorithm has several advantages over conventional methods. Firstly, it identifies protein coding regions more accurately because the Goertzel algorithm is tuned to the desired frequency. Secondly, faster detection time is achieved. The proposed algorithm is applied to several genes, including genes available in the BG570 and HMR195 databases, and the results are compared with other methods based on nucleotide-level evaluation criteria. Implementation results show the excellent performance of the proposed algorithm in identifying protein coding regions, specifically in the identification of small-scale gene areas.
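To make the period-3 idea concrete, the following sketch evaluates the spectral power of a DNA string at angular frequency 2π/3 with the standard Goertzel recursion. It assumes a binary indicator encoding (one 0/1 sequence per nucleotide), which is a common choice but is not confirmed by the abstract, and it omits the anti-notch filter stage entirely. The function names are hypothetical.

```python
import math


def goertzel_power(samples, omega):
    """Squared magnitude of the spectrum of `samples` at angular frequency `omega`."""
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def period3_power(dna):
    """Sum of Goertzel powers at 2*pi/3 over the four nucleotide indicator sequences."""
    omega = 2.0 * math.pi / 3.0
    total = 0.0
    for base in "ACGT":
        indicator = [1.0 if ch == base else 0.0 for ch in dna]
        total += goertzel_power(indicator, omega)
    return total


periodic = period3_power("ATG" * 20)   # strongly period-3 toy sequence
uniform = period3_power("A" * 60)      # no period-3 component
```

A real detector would slide a window along the sequence and threshold the resulting power profile; the toy strings here only show that a codon-like repeat produces a large value while a uniform string does not.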

Keywords: protein coding regions, period-3, anti-notch filter, Goertzel algorithm

Procedia PDF Downloads 364
16268 Predicting Radioactive Waste Glass Viscosity, Density and Dissolution with Machine Learning

Authors: Joseph Lillington, Tom Gout, Mike Harrison, Ian Farnan

Abstract:

The vitrification of high-level nuclear waste within borosilicate glass and its incorporation within a multi-barrier repository deep underground is widely accepted as the preferred disposal method. However, for this to happen, any safety case will require validation that the initially localized radionuclides will not be considerably released into the near/far-field. Therefore, accurate mechanistic models are necessary to predict glass dissolution, and these should be robust to a variety of incorporated waste species and leaching test conditions, particularly given substantial variations across international waste-streams. Here, machine learning is used to predict glass material properties (viscosity, density) and glass leaching model parameters from large-scale industrial data. A variety of machine learning algorithms have been compared to assess performance. Density was predicted solely from composition, whereas viscosity additionally considered temperature. To predict suitable glass leaching model parameters, a large simulated dataset was created by coupling MATLAB and the chemical reactive-transport code HYTEC, considering the state-of-the-art GRAAL model (glass reactivity in allowance of the alteration layer). The trained models were then applied to the large-scale industrial, experimental data to identify potentially appropriate model parameters. Results indicate that ensemble methods can accurately predict viscosity as a function of temperature and composition across all three industrial datasets. Glass density prediction shows reliable learning performance, with predictions primarily being within the experimental uncertainty of the test data. Furthermore, machine learning can predict the behavior of glass dissolution model parameters, demonstrating potential value in GRAAL model development and in assessing suitable model parameters for large-scale industrial glass dissolution data.

Keywords: machine learning, predictive modelling, pattern recognition, radioactive waste glass

Procedia PDF Downloads 89
16267 The Power of the Proper Orthogonal Decomposition Method

Authors: Charles Lee

Abstract:

The Proper Orthogonal Decomposition (POD) technique has been used as a model reduction tool for many applications in engineering and science. In principle, one begins with an ensemble of data, called snapshots, collected from an experiment or laboratory results. The beauty of the POD technique is that, when applied, the entire data set can be represented by the smallest number of orthogonal basis elements. It is this capability that allows us to reduce the complexity and dimensions of many physical applications. Mathematical formulations and numerical schemes for the POD method will be discussed along with applications in NASA’s Deep Space Large Antenna Arrays, Satellite Image Reconstruction, Cancer Detection with DNA Microarray Data, Maximizing Stock Return, and Medical Imaging.
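As a rough illustration of how a snapshot ensemble is compressed, the sketch below finds the leading POD mode by power iteration on the snapshot correlation matrix and reports the fraction of total energy that mode captures. It is a dependency-free toy under simplifying assumptions (no mean subtraction, only the first mode, a non-degenerate starting vector); the function name and the data are hypothetical, and practical codes typically use a singular value decomposition instead.

```python
def leading_pod_mode(snapshots, iterations=200):
    """Return (mode, energy_fraction) for the dominant POD mode of the snapshots."""
    n = len(snapshots)
    m = len(snapshots[0])
    # Correlation matrix C[i][j] = average over snapshots of u_i * u_j.
    corr = [[sum(s[i] * s[j] for s in snapshots) / n for j in range(m)] for i in range(m)]
    v = [1.0] * m  # starting vector; assumed not orthogonal to the leading mode
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(m)) for i in range(m))
    trace = sum(corr[i][i] for i in range(m))
    return v, eigenvalue / trace


# Hypothetical snapshots: every one is a multiple of the same pattern [1, 2, 2].
snapshots = [[a * 1.0, a * 2.0, a * 2.0] for a in (1.0, -2.0, 0.5, 3.0)]
mode, energy = leading_pod_mode(snapshots)
```

Since all four snapshots share one spatial pattern, a single basis element reproduces the whole data set, which is the reduction the abstract describes.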

Keywords: reduced-order methods, principal component analysis, cancer detection, image reconstruction, stock portfolios

Procedia PDF Downloads 49
16266 Ensemble Methods in Machine Learning: An Algorithmic Approach to Derive Distinctive Behaviors of Criminal Activity Applied to the Poaching Domain

Authors: Zachary Blanks, Solomon Sonya

Abstract:

Poaching presents a serious threat to endangered animal species, environmental conservation, and human life. Additionally, some poaching activity has even been linked to supplying funds to support terrorist networks elsewhere around the world. Consequently, agencies dedicated to protecting wildlife habitats have a nearly intractable task of adequately patrolling an entire area (spanning several thousand kilometers) given limited resources, funds, and personnel at their disposal. Thus, agencies need predictive tools that are both high-performing and easily implementable by the user to help in learning how the significant features (e.g. animal population densities, topography, behavior patterns of the criminals within the area, etc.) interact with each other in hopes of abating poaching. This research develops a classification model using machine learning algorithms to aid in forecasting future attacks that is both easy to train and performs well when compared to other models. In this research, we demonstrate how data imputation methods (specifically predictive mean matching, gradient boosting, and random forest multiple imputation) can be applied to analyze data and create significant predictions across a varied data set. Specifically, we apply these methods to improve the accuracy of adopted prediction models (Logistic Regression, Support Vector Machine, etc.). Finally, we assess the performance of the model and the accuracy of our data imputation methods by learning on a real-world data set constituting four years of imputed data and testing on one year of non-imputed data. This paper provides three main contributions. First, we extend work done by the Teamcore and CREATE (Center for Risk and Economic Analysis of Terrorism Events) research group at the University of Southern California (USC) working in conjunction with the Department of Homeland Security to apply game theory and machine learning algorithms to develop more efficient ways of reducing poaching.
This research introduces ensemble methods (Random Forests and Stochastic Gradient Boosting) and applies them to real-world poaching data gathered from the Ugandan rain forest park rangers. Next, we consider the effect of data imputation on both the performance of various algorithms and the general accuracy of the method itself when applied to a dependent variable where a large number of observations are missing. Third, we provide an alternate approach to predict the probability of observing poaching both by season and by month. The results from this research are very promising. We conclude that by using Stochastic Gradient Boosting to predict observations for non-commercial poaching by season, we are able to produce statistically equivalent results while being orders of magnitude faster in computation time and complexity. Additionally, when predicting potential poaching incidents by individual month rather than by entire season, boosting techniques produce a mean area under the curve increase of approximately 3% relative to previous predictions made by entire season.

Keywords: ensemble methods, imputation, machine learning, random forests, statistical analysis, stochastic gradient boosting, wildlife protection

Procedia PDF Downloads 263
16265 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition

Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar

Abstract:

At the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN), logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features such as Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and Mel spectrogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.
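Two of the listed features are simple enough to compute by hand. The sketch below shows one plausible definition of the zero crossing rate and the RMS value for a single audio frame. The normalization (crossings divided by the number of adjacent sample pairs) is an assumption; audio libraries may normalize differently, and the abstract does not say which implementation was used.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square amplitude of the frame."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5


# Hypothetical frames: a rapidly alternating signal and a constant offset.
alternating = [1.0, -1.0, 1.0, -1.0]
constant = [0.5, 0.5, 0.5, 0.5]
zcr_alt, rms_alt = zero_crossing_rate(alternating), rms(alternating)
zcr_const, rms_const = zero_crossing_rate(constant), rms(constant)
```

Noisy or high-frequency frames tend to have a high zero crossing rate, while RMS tracks loudness, which is why both are commonly used as inputs to emotion classifiers.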

Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers

Procedia PDF Downloads 12
16264 Potential Climate Change Impacts on the Hydrological System of the Harvey River Catchment

Authors: Hashim Isam Jameel Al-Safi, P. Ranjan Sarukkalige

Abstract:

Climate change is likely to impact the Australian continent by changing rainfall trends, increasing temperature, and affecting water availability and quality. This study investigates the possible impacts of future climate change on the hydrological system of the Harvey River catchment in Western Australia by using a conceptual modelling approach (HBV model). Daily observations of rainfall and temperature and the long-term monthly mean potential evapotranspiration from six weather stations were available for the period 1961-2015. The observed streamflow data at Clifton Park gauging station for 33 years (1983-2015), together with the observed climate variables, were used to run, calibrate and validate the HBV model prior to the simulation process. The calibrated model was then forced with the downscaled future climate signals from a multi-model ensemble of fifteen GCMs of the CMIP3 model under three emission scenarios (A2, A1B and B1) to simulate the future runoff at the catchment outlet. Two periods were selected to represent future climate conditions: the middle (2046-2065) and the end (2080-2099) of the 21st century. A control run, with the reference climate period (1981-2000), was used to represent the current climate status. The modelling outcomes show an evident reduction in the mean annual streamflow by mid-century, particularly for the A1B scenario, relative to the control run. Toward the end of the century, all scenarios show relatively large reductions in the mean annual streamflow, especially the A1B scenario, compared to the control run. The decline in the mean annual streamflow ranged from 4% to 15% by mid-century and from 9% to 42% by the end of the century.

Keywords: climate change impact, Harvey catchment, HBV model, hydrological modelling, GCMs, LARS-WG

Procedia PDF Downloads 228
16263 A Sense of Belonging: Music Learning and School Connectedness

Authors: Johanna Gamboa-Kroesen

Abstract:

School connectedness, or the sense of belonging at school, is a critical factor in adolescent health, academic achievement, and socioemotional well-being. In educational research, the construct of the psychological sense of school membership is often referred to as school engagement, school bonding, or school attachment. While current research recognizes school connectedness as integral to a child’s mental health and academic success, many schools have yet to develop adequate interventions to promote a child’s overall sense of belonging at school. However, prior research in music education indicates that, among other benefits, music classrooms may provide an environment where students feel they belong. While studies indicate that music learning environments, specifically performing ensemble learning environments, instill a sense of school connectedness and, more broadly, contribute to a student’s socio-emotional development, there has been inadequate research on how the actions of music teachers contribute to this phenomenon. The purpose of this study was to examine the relationship between school connectedness and music learning environments with middle school music students enrolled in a school-based music ensemble. In addition, the study aimed to provide a descriptive analysis of the instructional practices that music teachers use to promote an inclusive environment in their classrooms and an overall sense of belonging in their students. Using 191 student surveys of school membership, student reflective writings, 5 teacher interviews, and 10 classroom observations, this study examined the relationship between 7th and 8th-grade student-reported levels of connectedness within their school-based music ensemble and teacher instructional practice. The study found that students reported high levels of positive school membership within their music classes.
Students who participate in school-based orchestra ensembles reported a positive change in emotional state during music instruction. In addition, evidence in this study found that music teachers use instructional practices to build connectedness through de-emphasizing competition and strengthening a student’s sense of relational value within their music learning experience. The findings offer implications for future music teacher instruction to create environments of inclusion, strengthen student-teacher relationships, and promote strategies that enhance student connection to school.

Keywords: music education, belonging, instructional practice, school connectedness

Procedia PDF Downloads 35
16262 A New Nonlinear State-Space Model and Its Application

Authors: Abdullah Eqal Al Mazrooei

Abstract:

In this work, a new nonlinear model is introduced. The model is in state-space form. The nonlinearity of this model lies in the state equation, where the state vector is multiplied by itself. This technique allows our model to generalize many well-known models, such as the Lotka-Volterra model and the Lorenz model, which have many real-life applications. We apply our new model to estimate wind speed by using a new nonlinear estimator that is suitable for our model.
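The abstract gives no equations. Purely as an illustration of what "the state vector is multiplied by itself" could mean, and guided by the keyword "Kronecker product", one discrete-time form with a quadratic state term is written below. The matrices $A$, $B$, $H$ and the noise terms $w_k$, $v_k$ are assumed notation, not the author's.

```latex
\begin{aligned}
x_{k+1} &= A\,x_k + B\,(x_k \otimes x_k) + w_k, \\
y_k     &= H\,x_k + v_k .
\end{aligned}
```

For a state $x = (x_1, x_2, x_3)^{\top}$, the vector $x \otimes x$ contains every pairwise product $x_i x_j$. The Lorenz system's nonlinear terms ($x_1 x_3$ and $x_1 x_2$) and the Lotka-Volterra interaction term ($x_1 x_2$) are such products, which is consistent with the claim that a quadratic state-space model of this kind covers both.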

Keywords: nonlinear systems, state-space model, Kronecker product, nonlinear estimator

Procedia PDF Downloads 655
16261 Coding Considerations for Standalone Molecular Dynamics Simulations of Atomistic Structures

Authors: R. O. Ocaya, J. J. Terblans

Abstract:

The laws of Newtonian mechanics allow ab-initio molecular dynamics to model and simulate particle trajectories in material science by defining a differentiable potential function. This paper discusses some considerations for the coding of ab-initio programs for simulation on a standalone computer and illustrates the approach with C language code in the context of embedded metallic atoms in the face-centred cubic structure. The algorithms use velocity-time integration to determine particle parameter evolution for up to several thousand particles in a thermodynamic ensemble. Such functions are reusable and can be placed in a redistributable header library file. While there are both commercial and free packages available, their heuristic nature prevents dissection. In addition, developing one's own code has the obvious advantage of teaching techniques applicable to new problems.
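The abstract does not name the integrator behind "velocity-time integration". The sketch below assumes the widely used velocity Verlet scheme and applies it to a single particle in a one-dimensional harmonic potential, which stands in for the embedded-atom potential of the paper. It is written in Python for brevity even though the paper's code is in C, and every name and parameter in it is hypothetical.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v through `steps` velocity Verlet updates."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update from current acceleration
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update from the averaged acceleration
        a = a_new
    return x, v


k = 1.0  # hypothetical spring constant; force is the negative gradient of 0.5*k*x^2
x_final, v_final = velocity_verlet(1.0, 0.0, lambda x: -k * x, 1.0, 0.01, 1000)
energy = 0.5 * v_final ** 2 + 0.5 * k * x_final ** 2
exact_x = math.cos(10.0)  # analytic position at t = 10 for unit mass and spring constant
```

The scheme needs only one force evaluation per step and keeps the total energy close to its initial value over long runs, which is why it is a common default in molecular dynamics codes.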

Keywords: C language, molecular dynamics, simulation, embedded atom method

Procedia PDF Downloads 272
16260 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is in transition from clinician-oriented to technology-oriented practice. Many people around the world die of cancer because the disease was not diagnosed at an early stage. Computational methods in the form of Machine Learning (ML) are now used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out a comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on the standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results show that ANN achieved the highest accuracy (99.4%) when tested with the breast cancer dataset. On the other hand, when these ML classifiers were tested with the cervical cancer dataset, the Ensemble (Bagged Tree) technique gave better accuracy (93.1%) than the other classifiers.
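For reference, the four evaluation metrics named above can be computed from the counts of true and false positives and negatives. The sketch below uses made-up binary labels (1 for the positive class) rather than any data from the paper, and the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical ground truth and predictions for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

In a diagnostic setting recall is often the metric to watch, since a missed positive case (a false negative) is usually more costly than a false alarm.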

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 248
16259 Educational Data Mining: The Case of the Department of Mathematics and Computing in the Period 2009-2018

Authors: Mário Ernesto Sitoe, Orlando Zacarias

Abstract:

University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency to evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias and variance and improve the performance of the models, the ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.
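The study used Weka for cross-validation, and its exact configuration is not given. As a language-neutral illustration of what a seven-fold split of 388 records looks like, the sketch below partitions sample indices into contiguous folds. It omits shuffling and stratification, both of which are common in practice, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs for k contiguous folds."""
    # The first (n_samples % k) folds get one extra sample so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(388, 7)
test_sizes = [len(test) for _, test in folds]
```

Each record appears in exactly one test fold, so every student contributes once to evaluation and six times to training.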

Keywords: evasion and retention, cross-validation, bagging, stacking

Procedia PDF Downloads 54
16258 Assimilating Multi-Mission Satellites Data into a Hydrological Model

Authors: Mehdi Khaki, Ehsan Forootan, Joseph Awange, Michael Kuhn

Abstract:

Terrestrial water storage, as a source of freshwater, plays an important role in human lives. Hydrological models offer important tools for simulating and predicting water storages at global and regional scales. However, their comparisons with 'reality' are imperfect, mainly due to a high level of uncertainty in input data and limitations in accounting for all complex water cycle processes, uncertainties of (unknown) empirical model parameters, as well as the absence of high resolution (both spatially and temporally) data. Data assimilation can mitigate this drawback by incorporating new sets of observations into models. In this effort, we use multi-mission satellite-derived remotely sensed observations to improve the performance of the World-Wide Water Resources Assessment system (W3RA) hydrological model for estimating terrestrial water storages. For this purpose, we assimilate total water storage (TWS) data from the Gravity Recovery And Climate Experiment (GRACE) and surface soil moisture data from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) into W3RA. This is done to (i) improve model estimations of water stored as groundwater and soil moisture, and (ii) assess the impacts of each satellite data set (GRACE and AMSR-E) and their combination on the final terrestrial water storage estimations. These data are assimilated into W3RA using the Ensemble Square-Root Filter (EnSRF) filtering technique over the Mississippi Basin (the United States) and the Murray-Darling Basin (Australia) between 2002 and 2013. In order to evaluate the results, independent ground-based groundwater and soil moisture measurements within each basin are used.

Keywords: data assimilation, GRACE, AMSR-E, hydrological model, EnSRF

Procedia PDF Downloads 253