Search results for: central machine learning
10755 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation
Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen
Abstract:
Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning
Procedia PDF Downloads 7410754 Online Learning Versus Face to Face Learning: A Sentiment Analysis on General Education Mathematics in the Modern World of University of San Carlos School of Arts and Sciences Students Using Natural Language Processing
Authors: Derek Brandon G. Yu, Clyde Vincent O. Pilapil, Christine F. Peña
Abstract:
College students of Cebu province have been indoors since March 2020, and a challenge encountered is the sudden shift from face to face to online learning and with the lack of empirical data on online learning on Higher Education Institutions (HEIs) in the Philippines. Sentiments on face to face and online learning will be collected from University of San Carlos (USC), School of Arts and Sciences (SAS) students regarding Mathematics in the Modern World (MMW), a General Education (GE) course. Natural Language Processing with machine learning algorithms will be used to classify the sentiments of the students. Results of the research study are the themes identified through topic modelling and the overall sentiments of the students in USC SASKeywords: natural language processing, online learning, sentiment analysis, topic modelling
Procedia PDF Downloads 24610753 A Non-Destructive Estimation Method for Internal Time in Perilla Leaf Using Hyperspectral Data
Authors: Shogo Nagano, Yusuke Tanigaki, Hirokazu Fukuda
Abstract:
Vegetables harvested early in the morning or late in the afternoon are valued in plant production, and so the time of harvest is important. The biological functions known as circadian clocks have a significant effect on this harvest timing. The purpose of this study was to non-destructively estimate the circadian clock and so construct a method for determining a suitable harvest time. We took eight samples of green busil (Perilla frutescens var. crispa) every 4 hours, six times for 1 day and analyzed all samples at the same time. A hyperspectral camera was used to collect spectrum intensities at 141 different wavelengths (350–1050 nm). Calculation of correlations between spectrum intensity of each wavelength and harvest time suggested the suitability of the hyperspectral camera for non-destructive estimation. However, even the highest correlated wavelength had a weak correlation, so we used machine learning to raise the accuracy of estimation and constructed a machine learning model to estimate the internal time of the circadian clock. Artificial neural networks (ANN) were used for machine learning because this is an effective analysis method for large amounts of data. Using the estimation model resulted in an error between estimated and real times of 3 min. The estimations were made in less than 2 hours. Thus, we successfully demonstrated this method of non-destructively estimating internal time.Keywords: artificial neural network (ANN), circadian clock, green busil, hyperspectral camera, non-destructive evaluation
Procedia PDF Downloads 29910752 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques
Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel
Abstract:
Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis
Procedia PDF Downloads 71310751 Leveraging SHAP Values for Effective Feature Selection in Peptide Identification
Authors: Sharon Li, Zhonghang Xia
Abstract:
Post-database search is an essential phase in peptide identification using tandem mass spectrometry (MS/MS) to refine peptide-spectrum matches (PSMs) produced by database search engines. These engines frequently face difficulty differentiating between correct and incorrect peptide assignments. Despite advances in statistical and machine learning methods aimed at improving the accuracy of peptide identification, challenges remain in selecting critical features for these models. In this study, two machine learning models—a random forest tree and a support vector machine—were applied to three datasets to enhance PSMs. SHAP values were utilized to determine the significance of each feature within the models. The experimental results indicate that the random forest model consistently outperformed the SVM across all datasets. Further analysis of SHAP values revealed that the importance of features varies depending on the dataset, indicating that a feature's role in model predictions can differ significantly. This variability in feature selection can lead to substantial differences in model performance, with false discovery rate (FDR) differences exceeding 50% between different feature combinations. Through SHAP value analysis, the most effective feature combinations were identified, significantly enhancing model performance.Keywords: peptide identification, SHAP value, feature selection, random forest tree, support vector machine
Procedia PDF Downloads 2310750 Vibration-Based Data-Driven Model for Road Health Monitoring
Authors: Guru Prakash, Revanth Dugalam
Abstract:
A road’s condition often deteriorates due to harsh loading such as overload due to trucks, and severe environmental conditions such as heavy rain, snow load, and cyclic loading. In absence of proper maintenance planning, this results in potholes, wide cracks, bumps, and increased roughness of roads. In this paper, a data-driven model will be developed to detect these damages using vibration and image signals. The key idea of the proposed methodology is that the road anomaly manifests in these signals, which can be detected by training a machine learning algorithm. The use of various machine learning techniques such as the support vector machine and Radom Forest method will be investigated. The proposed model will first be trained and tested with artificially simulated data, and the model architecture will be finalized by comparing the accuracies of various models. Once a model is fixed, the field study will be performed, and data will be collected. The field data will be used to validate the proposed model and to predict the future road’s health condition. The proposed will help to automate the road condition monitoring process, repair cost estimation, and maintenance planning process.Keywords: SVM, data-driven, road health monitoring, pot-hole
Procedia PDF Downloads 8610749 Assessment of DNA Sequence Encoding Techniques for Machine Learning Algorithms Using a Universal Bacterial Marker
Authors: Diego Santibañez Oyarce, Fernanda Bravo Cornejo, Camilo Cerda Sarabia, Belén Díaz Díaz, Esteban Gómez Terán, Hugo Osses Prado, Raúl Caulier-Cisterna, Jorge Vergara-Quezada, Ana Moya-Beltrán
Abstract:
The advent of high-throughput sequencing technologies has revolutionized genomics, generating vast amounts of genetic data that challenge traditional bioinformatics methods. Machine learning addresses these challenges by leveraging computational power to identify patterns and extract information from large datasets. However, biological sequence data, being symbolic and non-numeric, must be converted into numerical formats for machine learning algorithms to process effectively. So far, some encoding methods, such as one-hot encoding or k-mers, have been explored. This work proposes additional approaches for encoding DNA sequences in order to compare them with existing techniques and determine if they can provide improvements or if current methods offer superior results. Data from the 16S rRNA gene, a universal marker, was used to analyze eight bacterial groups that are significant in the pulmonary environment and have clinical implications. The bacterial genes included in this analysis are Prevotella, Abiotrophia, Acidovorax, Streptococcus, Neisseria, Veillonella, Mycobacterium, and Megasphaera. These data were downloaded from the NCBI database in Genbank file format, followed by a syntactic analysis to selectively extract relevant information from each file. For data encoding, a sequence normalization process was carried out as the first step. From approximately 22,000 initial data points, a subset was generated for testing purposes. Specifically, 55 sequences from each bacterial group met the length criteria, resulting in an initial sample of approximately 440 sequences. The sequences were encoded using different methods, including one-hot encoding, k-mers, Fourier transform, and Wavelet transform. Various machine learning algorithms, such as support vector machines, random forests, and neural networks, were trained to evaluate these encoding methods. The performance of these models was assessed using multiple metrics, including the confusion matrix, ROC curve, and F1 Score, providing a comprehensive evaluation of their classification capabilities. The results show that accuracies between encoding methods vary by up to approximately 15%, with the Fourier transform obtaining the best results for the evaluated machine learning algorithms. These findings, supported by the detailed analysis using the confusion matrix, ROC curve, and F1 Score, provide valuable insights into the effectiveness of different encoding methods and machine learning algorithms for genomic data analysis, potentially improving the accuracy and efficiency of bacterial classification and related genomic studies.Keywords: DNA encoding, machine learning, Fourier transform, Fourier transformation
Procedia PDF Downloads 2310748 Machine Learning for Exoplanetary Habitability Assessment
Authors: King Kumire, Amos Kubeka
Abstract:
The synergy of machine learning and astronomical technology advancement is giving rise to the new space age, which is pronounced by better habitability assessments. To initiate this discussion, it should be recorded for definition purposes that the symbiotic relationship between astronomy and improved computing has been code-named the Cis-Astro gateway concept. The cosmological fate of this phrase has been unashamedly plagiarized from the cis-lunar gateway template and its associated LaGrange points which act as an orbital bridge to the moon from our planet Earth. However, for this study, the scientific audience is invited to bridge toward the discovery of new habitable planets. It is imperative to state that cosmic probes of this magnitude can be utilized as the starting nodes of the astrobiological search for galactic life. This research can also assist by acting as the navigation system for future space telescope launches through the delimitation of target exoplanets. The findings and the associated platforms can be harnessed as building blocks for the modeling of climate change on planet earth. The notion that if the human genus exhausts the resources of the planet earth or there is a bug of some sort that makes the earth inhabitable for humans explains the need to find an alternative planet to inhabit. The scientific community, through interdisciplinary discussions of the International Astronautical Federation so far has the common position that engineers can reduce space mission costs by constructing a stable cis-lunar orbit infrastructure for refilling and carrying out other associated in-orbit servicing activities. Similarly, the Cis-Astro gateway can be envisaged as a budget optimization technique that models extra-solar bodies and can facilitate the scoping of future mission rendezvous. It should be registered as well that this broad and voluminous catalog of exoplanets shall be narrowed along the way using machine learning filters. The gist of this topic revolves around the indirect economic rationale of establishing a habitability scoping platform.Keywords: machine-learning, habitability, exoplanets, supercomputing
Procedia PDF Downloads 9010747 Machine Learning for Exoplanetary Habitability Assessment
Authors: King Kumire, Amos Kubeka
Abstract:
The synergy of machine learning and astronomical technology advancement is giving rise to the new space age, which is pronounced by better habitability assessments. To initiate this discussion, it should be recorded for definition purposes that the symbiotic relationship between astronomy and improved computing has been code-named the Cis-Astro gateway concept. The cosmological fate of this phrase has been unashamedly plagiarized from the cis-lunar gateway template and its associated LaGrange points which act as an orbital bridge to the moon from our planet Earth. However, for this study, the scientific audience is invited to bridge toward the discovery of new habitable planets. It is imperative to state that cosmic probes of this magnitude can be utilized as the starting nodes of the astrobiological search for galactic life. This research can also assist by acting as the navigation system for future space telescope launches through the delimitation of target exoplanets. The findings and the associated platforms can be harnessed as building blocks for the modeling of climate change on planet earth. The notion that if the human genus exhausts the resources of the planet earth or there is a bug of some sort that makes the earth inhabitable for humans explains the need to find an alternative planet to inhabit. The scientific community, through interdisciplinary discussions of the International Astronautical Federation so far, has the common position that engineers can reduce space mission costs by constructing a stable cis-lunar orbit infrastructure for refilling and carrying out other associated in-orbit servicing activities. Similarly, the Cis-Astro gateway can be envisaged as a budget optimization technique that models extra-solar bodies and can facilitate the scoping of future mission rendezvous. It should be registered as well that this broad and voluminous catalog of exoplanets shall be narrowed along the way using machine learning filters. The gist of this topic revolves around the indirect economic rationale of establishing a habitability scoping platform.Keywords: exoplanets, habitability, machine-learning, supercomputing
Procedia PDF Downloads 11810746 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform
Authors: Reza Mohammadzadeh
Abstract:
The major challenges in geotechnical engineering in underground spaces arise from uncertainties and different probabilities. The collection, collation, and collaboration of existing data to incorporate them in analysis and design for given prospect evaluation would be a reliable, practical problem solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science which applies different techniques (e.g., Regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) on data to automatically learn and improve from them without being explicitly programmed and make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining based on a cloud system architecture has been designed. A new approach of risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) has been proposed in this model. Subsequently, the model workflow methodology stages have been described. In order to train data and LoK models deployment, an ML platform has been implemented. IBM Watson Studio, as a leading data science tool and data-driven cloud integration ML platform, is employed in this study. As a Use case, a data set of geotechnical hazards and risk assessment in underground coal mining were prepared to demonstrate the performance of the model, and accordingly, the results have been outlined.Keywords: data model, geotechnical risks, machine learning, underground coal mining
Procedia PDF Downloads 27410745 Investigating the Dynamics of Knowledge Acquisition in Learning Using Differential Equations
Authors: Gilbert Makanda, Roelf Sypkens
Abstract:
A mathematical model for knowledge acquisition in teaching and learning is proposed. In this study we adopt the mathematical model that is normally used for disease modelling into teaching and learning. We derive mathematical conditions which facilitate knowledge acquisition. This study compares the effects of dropping out of the course at early stages with later stages of learning. The study also investigates effect of individual interaction and learning from other sources to facilitate learning. The study fits actual data to a general mathematical model using Matlab ODE45 and lsqnonlin to obtain a unique mathematical model that can be used to predict knowledge acquisition. The data used in this study was obtained from the tutorial test results for mathematics 2 students from the Central University of Technology, Free State, South Africa in the department of Mathematical and Physical Sciences. The study confirms already known results that increasing dropout rates and forgetting taught concepts reduce the population of knowledgeable students. Increasing teaching contacts and access to other learning materials facilitate knowledge acquisition. The effect of increasing dropout rates is more enhanced in the later stages of learning than earlier stages. The study opens up a new direction in further investigations in teaching and learning using differential equations.Keywords: differential equations, knowledge acquisition, least squares nonlinear, dynamical systems
Procedia PDF Downloads 36410744 Comparison of Visual Field Tests in Glaucoma Patients with a Central Visual Field Defect
Authors: Hye-Young Shin, Hae-Young Lopilly Park, Chan Kee Park
Abstract:
We compared the 24-2 and 10-2 visual fields (VFs) and investigate the degree of discrepancy between the two tests in glaucomatous eyes with central VF defects. In all, 99 eyes of 99 glaucoma patients who underwent both the 24-2 VF and 10-2 VF tests within 6 months were enrolled retrospectively. Glaucomatous eyes involving a central VF defect were divided into three groups based on the average total deviation (TD) of 12 central points in the 24-2 VF test (N = 33, in each group): group 1 (tercile with the highest TD), group 2 (intermediate TD), and group 3 (lowest TD). The TD difference was calculated by subtracting the average TD of the 10-2 VF test from the average TD of 12 central points in the 24-2 VF test. The absolute central TD difference in each quadrant was defined as the absolute value of the TD value obtained by subtracting the average TD of four central points in the 10-2 VF test from the innermost TD in the 24-2 VF test in each quadrant. The TD differences differed significantly between group 3 and groups 1 and 2 (P < 0.001). In the superonasal quadrant, the absolute central TD difference was significantly greater in group 2 than in group 1 (P < 0.05). In the superotemporal quadrant, the absolute central TD difference was significantly greater in group 3 than in groups 1 and 2 (P < 0.001). Our results indicate that the results of VF tests for different VFs can be inconsistent, depending on the degree of central defects and the VF quadrant.Keywords: central visual field defect, glaucoma, 10-2 visual field, 24-2 visual field
Procedia PDF Downloads 17610743 Artificial Intelligence-Based Thermal Management of Battery System for Electric Vehicles
Authors: Raghunandan Gurumurthy, Aricson Pereira, Sandeep Patil
Abstract:
The escalating adoption of electric vehicles (EVs) across the globe has underscored the critical importance of advancing battery system technologies. This has catalyzed a shift towards the design and development of battery systems that not only exhibit higher energy efficiency but also boast enhanced thermal performance and sophisticated multi-material enclosures. A significant leap in this domain has been the incorporation of simulation-based design optimization for battery packs and Battery Management Systems (BMS), a move further enriched by integrating artificial intelligence/machine learning (AI/ML) approaches. These strategies are pivotal in refining the design, manufacturing, and operational processes for electric vehicles and energy storage systems. By leveraging AI/ML, stakeholders can now predict battery performance metrics—such as State of Health, State of Charge, and State of Power—with unprecedented accuracy. Furthermore, as Li-ion batteries (LIBs) become more prevalent in urban settings, the imperative for bolstering thermal and fire resilience has intensified. This has propelled Battery Thermal Management Systems (BTMs) to the forefront of energy storage research, highlighting the role of machine learning and AI not just as tools for enhanced safety management through accurate temperature forecasts and diagnostics but also as indispensable allies in the early detection and warning of potential battery fires.Keywords: electric vehicles, battery thermal management, industrial engineering, machine learning, artificial intelligence, manufacturing
Procedia PDF Downloads 9710742 Infrared Spectroscopy in Tandem with Machine Learning for Simultaneous Rapid Identification of Bacteria Isolated Directly from Patients' Urine Samples and Determination of Their Susceptibility to Antibiotics
Authors: Mahmoud Huleihel, George Abu-Aqil, Manal Suleiman, Klaris Riesenberg, Itshak Lapidot, Ahmad Salman
Abstract:
Urinary tract infections (UTIs) are considered to be the most common bacterial infections worldwide, which are caused mainly by Escherichia (E.) coli (about 80%). Klebsiella pneumoniae (about 10%) and Pseudomonas aeruginosa (about 6%). Although antibiotics are considered as the most effective treatment for bacterial infectious diseases, unfortunately, most of the bacteria already have developed resistance to the majority of the commonly available antibiotics. Therefore, it is crucial to identify the infecting bacteria and to determine its susceptibility to antibiotics for prescribing effective treatment. Classical methods are time consuming, require ~48 hours for determining bacterial susceptibility. Thus, it is highly urgent to develop a new method that can significantly reduce the time required for determining both infecting bacterium at the species level and diagnose its susceptibility to antibiotics. Fourier-Transform Infrared (FTIR) spectroscopy is well known as a sensitive and rapid method, which can detect minor molecular changes in bacterial genome associated with the development of resistance to antibiotics. The main goal of this study is to examine the potential of FTIR spectroscopy, in tandem with machine learning algorithms, to identify the infected bacteria at the species level and to determine E. coli susceptibility to different antibiotics directly from patients' urine in about 30minutes. For this goal, 1600 different E. coli isolates were isolated for different patients' urine sample, measured by FTIR, and analyzed using different machine learning algorithm like Random Forest, XGBoost, and CNN. We achieved 98% success in isolate level identification and 89% accuracy in susceptibility determination.Keywords: urinary tract infections (UTIs), E. coli, Klebsiella pneumonia, Pseudomonas aeruginosa, bacterial, susceptibility to antibiotics, infrared microscopy, machine learning
Procedia PDF Downloads 17010741 A Generalized Framework for Adaptive Machine Learning Deployments in Algorithmic Trading
Authors: Robert Caulk
Abstract:
A generalized framework for adaptive machine learning deployments in algorithmic trading is introduced, tested, and released as open-source code. The presented software aims to test the hypothesis that recent data contains enough information to form a probabilistically favorable short-term price prediction. Further, the framework contains various adaptive machine learning techniques that are geared toward generating profit during strong trends and minimizing losses during trend changes. Results demonstrate that this adaptive machine learning approach is capable of capturing trends and generating profit. The presentation also discusses the importance of defining the parameter space associated with the dynamic training data-set and using the parameter space to identify and remove outliers from prediction data points. Meanwhile, the generalized architecture enables common users to exploit the powerful machinery while focusing on high-level feature engineering and model testing. The presentation also highlights common strengths and weaknesses associated with the presented technique and presents a broad range of well-tested starting points for feature set construction, target setting, and statistical methods for enforcing risk management and maintaining probabilistically favorable entry and exit points. The presentation also describes the end-to-end data processing tools associated with FreqAI, including automatic data fetching, data aggregation, feature engineering, safe and robust data pre-processing, outlier detection, custom machine learning and statistical tools, data post-processing, and adaptive training backtest emulation, and deployment of adaptive training in live environments. Finally, the generalized user interface is also discussed in the presentation. Feature engineering is simplified so that users can seed their feature sets with common indicator libraries (e.g. TA-lib, pandas-ta). The user also feeds data expansion parameters to fill out a large feature set for the model, which can contain as many as 10,000+ features. The presentation describes the various object-oriented programming techniques employed to make FreqAI agnostic to third-party libraries and external data sources. In other words, the back-end is constructed in such a way that users can leverage a broad range of common regression libraries (Catboost, LightGBM, Sklearn, etc) as well as common Neural Network libraries (TensorFlow, PyTorch) without worrying about the logistical complexities associated with data handling and API interactions. The presentation finishes by drawing conclusions about the most important parameters associated with a live deployment of the adaptive learning framework and provides the road map for future development in FreqAI.Keywords: machine learning, market trend detection, open-source, adaptive learning, parameter space exploration
Procedia PDF Downloads 8910740 A Review on Intelligent Systems for Geoscience
Authors: R Palson Kennedy, P.Kiran Sai
Abstract:
This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and geosciences. This article presents a review from the data life cycle perspective to meet that need. Numerous facets of geosciences present unique difficulties for the study of intelligent systems. Geosciences data is notoriously difficult to analyze since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second section contains key themes and sharing experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science
Procedia PDF Downloads 13510739 Investigating the performance of machine learning models on PM2.5 forecasts: A case study in the city of Thessaloniki
Authors: Alexandros Pournaras, Anastasia Papadopoulou, Serafim Kontos, Anastasios Karakostas
Abstract:
The air quality of modern cities is an important concern, as poor air quality contributes to human health and environmental issues. Reliable air quality forecasting has, thus, gained scientific and governmental attention as an essential tool that enables authorities to take proactive measures for public safety. In this study, the potential of Machine Learning (ML) models to forecast PM2.5 at local scale is investigated in the city of Thessaloniki, the second largest city in Greece, which has been struggling with the persistent issue of air pollution. ML models, with proven ability to address timeseries forecasting, are employed to predict the PM2.5 concentrations and the respective Air Quality Index 5-days ahead by learning from daily historical air quality and meteorological data from 2014 to 2016 and gathered from two stations with different land use characteristics in the urban fabric of Thessaloniki. The performance of the ML models on PM2.5 concentrations is evaluated with common statistical methods, such as R squared (r²) and Root Mean Squared Error (RMSE), utilizing a portion of the stations’ measurements as test set. A multi-categorical evaluation is utilized for the assessment of their performance on respective AQIs. Several conclusions were made from the experiments conducted. Experimenting on MLs’ configuration revealed a moderate effect of various parameters and training schemas on the model’s predictions. Their performance of all these models were found to produce satisfactory results on PM2.5 concentrations. In addition, their application on untrained stations showed that these models can perform well, indicating a generalized behavior. Moreover, their performance on AQI was even better, showing that the MLs can be used as predictors for AQI, which is the direct information provided to the general public.Keywords: Air Quality, AQ Forecasting, AQI, Machine Learning, PM2.5
Procedia PDF Downloads 7710738 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection
Authors: Yaojun Wang, Yaoqing Wang
Abstract:
Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.Keywords: case-based reasoning, decision tree, stock selection, machine learning
Procedia PDF Downloads 42010737 A Comparative Time-Series Analysis and Deep Learning Projection of Innate Radon Gas Risk in Canadian and Swedish Residential Buildings
Authors: Selim M. Khan, Dustin D. Pearson, Tryggve Rönnqvist, Markus E. Nielsen, Joshua M. Taron, Aaron A. Goodarzi
Abstract:
Accumulation of radioactive radon gas in indoor air poses a serious risk to human health by increasing the lifetime risk of lung cancer and is classified by IARC as a category one carcinogen. Radon exposure risks are a function of geologic, geographic, design, and human behavioural variables and can change over time. Using time series and deep machine learning modelling, we analyzed long-term radon test outcomes as a function of building metrics from 25,489 Canadian and 38,596 Swedish residential properties constructed between 1945 to 2020. While Canadian and Swedish properties built between 1970 and 1980 are comparable (96–103 Bq/m³), innate radon risks subsequently diverge, rising in Canada and falling in Sweden such that 21st Century Canadian houses show 467% greater average radon (131 Bq/m³) relative to Swedish equivalents (28 Bq/m³). These trends are consistent across housing types and regions within each country. The introduction of energy efficiency measures within Canadian and Swedish building codes coincided with opposing radon level trajectories in each nation. Deep machine learning modelling predicts that, without intervention, average Canadian residential radon levels will increase to 176 Bq/m³ by 2050, emphasizing the importance and urgency of future building code intervention to achieve systemic radon reduction in Canada.Keywords: radon health risk, time-series, deep machine learning, lung cancer, Canada, Sweden
Procedia PDF Downloads 8510736 Comparison of Machine Learning and Deep Learning Algorithms for Automatic Classification of 80 Different Pollen Species
Authors: Endrick Barnacin, Jean-Luc Henry, Jimmy Nagau, Jack Molinie
Abstract:
Palynology is a field of interest in many disciplines due to its multiple applications: chronological dating, climatology, allergy treatment, and honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time consuming task that requires the intervention of experts in the field, which are becoming increasingly rare due to economic and social conditions. That is why the need for automation of this task is urgent. A lot of studies have investigated the subject using different standard image processing descriptors and sometimes hand-crafted ones.In this work, we make a comparative study between classical feature extraction methods (Shape, GLCM, LBP, and others) and Deep Learning (CNN, Autoencoders, Transfer Learning) to perform a recognition task over 80 regional pollen species. It has been found that the use of Transfer Learning seems to be more precise than the other approachesKeywords: pollens identification, features extraction, pollens classification, automated palynology
Procedia PDF Downloads 13610735 Predicting Response to Cognitive Behavioral Therapy for Psychosis Using Machine Learning and Functional Magnetic Resonance Imaging
Authors: Eva Tolmeijer, Emmanuelle Peters, Veena Kumari, Liam Mason
Abstract:
Cognitive behavioral therapy for psychosis (CBTp) is effective in many but not all patients, making it important to better understand the factors that determine treatment outcomes. To date, no studies have examined whether neuroimaging can make clinically useful predictions about who will respond to CBTp. To this end, we used machine learning methods that make predictions about symptom improvement at the individual patient level. Prior to receiving CBTp, 22 patients with a diagnosis of schizophrenia completed a social-affective processing task during functional MRI. Multivariate pattern analysis assessed whether treatment response could be predicted by brain activation responses to facial affect that was either socially threatening or prosocial. The resulting models did significantly predict symptom improvement, with distinct multivariate signatures predicting psychotic (r=0.54, p=0.01) and affective (r=0.32, p=0.05) symptoms. Psychotic symptom improvement was accurately predicted from relatively focal threat-related activation across hippocampal, occipital, and temporal regions; affective symptom improvement was predicted by a more dispersed profile of responses to prosocial affect. These findings enrich our understanding of the neurobiological underpinning of treatment response. This study provides a foundation that will hopefully lead to greater precision and tailoring of the interventions offered to patients.Keywords: cognitive behavioral therapy, machine learning, psychosis, schizophrenia
Procedia PDF Downloads 27410734 Deep-Learning Based Approach to Facial Emotion Recognition through Convolutional Neural Network
Authors: Nouha Khediri, Mohammed Ben Ammar, Monji Kherallah
Abstract:
Recently, facial emotion recognition (FER) has become increasingly essential to understand the state of the human mind. Accurately classifying emotion from the face is a challenging task. In this paper, we present a facial emotion recognition approach named CV-FER, benefiting from deep learning, especially CNN and VGG16. First, the data is pre-processed with data cleaning and data rotation. Then, we augment the data and proceed to our FER model, which contains five convolutions layers and five pooling layers. Finally, a softmax classifier is used in the output layer to recognize emotions. Based on the above contents, this paper reviews the works of facial emotion recognition based on deep learning. Experiments show that our model outperforms the other methods using the same FER2013 database and yields a recognition rate of 92%. We also put forward some suggestions for future work.Keywords: CNN, deep-learning, facial emotion recognition, machine learning
Procedia PDF Downloads 9510733 Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
Authors: Bruno Trstenjak, Dzenana Donko
Abstract:
Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.Keywords: case based reasoning, classification, expert's knowledge, hybrid model
Procedia PDF Downloads 36710732 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms
Authors: Neha Ahirwar
Abstract:
In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree
Procedia PDF Downloads 6710731 Study on the Central Differencing Scheme with the Staggered Version (STG) for Solving the Hyperbolic Partial Differential Equations
Authors: Narumol Chintaganun
Abstract:
In this paper we present the second-order central differencing scheme with the staggered version (STG) for solving the advection equation and Burger's equation. This scheme based on staggered evolution of the re-constructed cell averages. This scheme results in the second-order central differencing scheme, an extension along the lines of the first-order central scheme of Lax-Friedrichs (LxF) scheme. All numerical simulations presented in this paper are obtained by finite difference method (FDM) and STG. Numerical results are shown that the STG gives very good results and higher accuracy.Keywords: central differencing scheme, STG, advection equation, burgers equation
Procedia PDF Downloads 55710730 A Support Vector Machine Learning Prediction Model of Evapotranspiration Using Real-Time Sensor Node Data
Authors: Waqas Ahmed Khan Afridi, Subhas Chandra Mukhopadhyay, Bandita Mainali
Abstract:
The research paper presents a unique approach to evapotranspiration (ET) prediction using a Support Vector Machine (SVM) learning algorithm. The study leverages real-time sensor node data to develop an accurate and adaptable prediction model, addressing the inherent challenges of traditional ET estimation methods. The integration of the SVM algorithm with real-time sensor node data offers great potential to improve spatial and temporal resolution in ET predictions. In the model development, key input features are measured and computed using mathematical equations such as Penman-Monteith (FAO56) and soil water balance (SWB), which include soil-environmental parameters such as; solar radiation (Rs), air temperature (T), atmospheric pressure (P), relative humidity (RH), wind speed (u2), rain (R), deep percolation (DP), soil temperature (ST), and change in soil moisture (∆SM). The one-year field data are split into combinations of three proportions i.e. train, test, and validation sets. While kernel functions with tuning hyperparameters have been used to train and improve the accuracy of the prediction model with multiple iterations. This paper also outlines the existing methods and the machine learning techniques to determine Evapotranspiration, data collection and preprocessing, model construction, and evaluation metrics, highlighting the significance of SVM in advancing the field of ET prediction. The results demonstrate the robustness and high predictability of the developed model on the basis of performance evaluation metrics (R2, RMSE, MAE). The effectiveness of the proposed model in capturing complex relationships within soil and environmental parameters provide insights into its potential applications for water resource management and hydrological ecosystem.Keywords: evapotranspiration, FAO56, KNIME, machine learning, RStudio, SVM, sensors
Procedia PDF Downloads 6910729 A Review of Food Security Policy Research in Central Asia
Authors: Mergen Dyussenov
Abstract:
Food security has become a prominent issue on the global policy agenda. Yet, one particular region that remains understudied is a cohort of Central Asian countries. To shed light onto the issue, the paper looks into a review of existing literature related to food security policies in Central Asia. In so doing, it seeks to systematize the context analyzed, key findings, and recommendations. Furthermore, it analyzes the role of key actors in promoting the food security policies across Central Asian nations. Finally, the paper attempts to set the agenda for further research.Keywords: food security, central Asia, the role of actors, policy analysis
Procedia PDF Downloads 32610728 DNA Methylation Score Development for In utero Exposure to Paternal Smoking Using a Supervised Machine Learning Approach
Authors: Cristy Stagnar, Nina Hubig, Diana Ivankovic
Abstract:
The epigenome is a compelling candidate for mediating long-term responses to environmental effects modifying disease risk. The main goal of this research is to develop a machine learning-based DNA methylation score, which will be valuable in delineating the unique contribution of paternal epigenetic modifications to the germline impacting childhood health outcomes. It will also be a useful tool in validating self-reports of nonsmoking and in adjusting epigenome-wide DNA methylation association studies for this early-life exposure. Using secondary data from two population-based methylation profiling studies, our DNA methylation score is based on CpG DNA methylation measurements from cord blood gathered from children whose fathers smoked pre- and peri-conceptually. Each child’s mother and father fell into one of three class labels in the accompanying questionnaires -never smoker, former smoker, or current smoker. By applying different machine learning algorithms to the accessible resource for integrated epigenomic studies (ARIES) sub-study of the Avon longitudinal study of parents and children (ALSPAC) data set, which we used for training and testing of our model, the best-performing algorithm for classifying the father smoker and mother never smoker was selected based on Cohen’s κ. Error in the model was identified and optimized. The final DNA methylation score was further tested and validated in an independent data set. This resulted in a linear combination of methylation values of selected probes via a logistic link function that accurately classified each group and contributed the most towards classification. The result is a unique, robust DNA methylation score which combines information on DNA methylation and early life exposure of offspring to paternal smoking during pregnancy and which may be used to examine the paternal contribution to offspring health outcomes.Keywords: epigenome, health outcomes, paternal preconception environmental exposures, supervised machine learning
Procedia PDF Downloads 18510727 Development and Application of the Proctoring System with Face Recognition for User Registration on the Educational Information Portal
Authors: Meruyert Serik, Nassipzhan Duisegaliyeva, Danara Tleumagambetova, Madina Ermaganbetova
Abstract:
This research paper explores the process of creating a proctoring system by evaluating the implementation of practical face recognition algorithms. Students of educational programs reviewed the research work "6B01511-Computer Science", "7M01511-Computer Science", "7M01525- STEM Education," and "8D01511-Computer Science" of Eurasian National University named after L.N. Gumilyov. As an outcome, a proctoring system will be created, enabling the conduction of tests and ensuring academic integrity checks within the system. Due to the correct operation of the system, test works are carried out. The result of the creation of the proctoring system will be the basis for the automation of the informational, educational portal developed by machine learning.Keywords: artificial intelligence, education portal, face recognition, machine learning, proctoring
Procedia PDF Downloads 12610726 OSEME: A Smart Learning Environment for Music Education
Authors: Konstantinos Sofianos, Michael Stefanidakis
Abstract:
Nowadays, advances in information and communication technologies offer a range of opportunities for new approaches, methods, and tools in the field of education and training. Teacher-centered learning has changed to student-centered learning. E-learning has now matured and enables the design and construction of intelligent learning systems. A smart learning system fully adapts to a student's needs and provides them with an education based on their preferences, learning styles, and learning backgrounds. It is a wise friend and available at any time, in any place, and with any digital device. In this paper, we propose an intelligent learning system, which includes an ontology with all elements of the learning process (learning objects, learning activities) and a massive open online course (MOOC) system. This intelligent learning system can be used in music education.Keywords: intelligent learning systems, e-learning, music education, ontology, semantic web
Procedia PDF Downloads 312