Search results for: statistical machine learning
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 12038

11618 Breast Cancer Detection Using Machine Learning Algorithms

Authors: Jiwan Kumar, Pooja, Sandeep Negi, Anjum Rouf, Amit Kumar, Naveen Lakra

Abstract:

Health problems are increasing day by day, and breast cancer is one of them; detecting it in its early stages is crucial. Doctors can use this model to tell their patients whether a tumor is benign (not harmful) or malignant (harmful). We used machine learning to build the model, applying algorithms such as Logistic Regression, Random Forest, Support Vector Classifier, Bayesian Network, and Radial Basis Function. We tried to use the data of the most crucial features and to present the results as pictures so that they are easier for doctors to interpret. By doing this, we are making ML better at finding breast cancer, which can lead to saving more lives and to better health care.

Keywords: Bayesian network, radial basis function, ensemble learning, understandable, data making better, random forest, logistic regression, breast cancer

Procedia PDF Downloads 52
11617 Analysis of Real Time Seismic Signal Dataset Using Machine Learning

Authors: Sujata Kulkarni, Udhav Bhosle, Vijaykumar T.

Abstract:

Due to the closeness between seismic signals and non-seismic signals, it is difficult to detect earthquakes reliably using conventional methods. In order to distinguish between seismic events and non-seismic events depending on their amplitude, our study processes the data that come from seismic sensors. The authors suggest a robust noise suppression technique that makes use of a bandpass filter, an IIR Wiener filter, recursive short-term average/long-term average (STA/LTA), and Carl STA/LTA for event identification. The trigger ratio used in the proposed study to differentiate between seismic and non-seismic activity is determined. The proposed work focuses on significant feature extraction for machine learning-based seismic event detection. This serves as motivation for compiling a dataset of all features for the identification and forecasting of seismic signals. We place a focus on feature vector dimension reduction techniques due to the temporal complexity. The proposed notable features were experimentally tested using a machine learning model, and the results on unseen data are optimal. Finally, a presentation using a hybrid dataset (captured by different sensors) demonstrates how this model may also be employed in a real-time setting while lowering false alarm rates. The planned study is based on the examination of seismic signals obtained from both individual sensors and sensor networks (SN). Wideband seismic signals from the BSVK and CUKG station sensors, located near Basavakalyan, Karnataka, and the Central University of Karnataka, respectively, make up the experimental dataset.
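The trigger-ratio idea mentioned in the abstract can be illustrated with a minimal sketch of the classic (non-recursive) STA/LTA ratio. This is not the authors' pipeline: the window lengths, the threshold of 2.0, and the function names `sta_lta_ratio` and `detect_events` are assumptions chosen for illustration, and the recursive and Carl variants named in the abstract are not shown.

```python
def sta_lta_ratio(signal, sta_len, lta_len):
    """Classic STA/LTA ratio computed on absolute amplitude.

    sta_len and lta_len are window lengths in samples (illustrative
    assumptions); sta_len should not exceed lta_len.
    """
    energy = [abs(x) for x in signal]
    ratios = [0.0] * len(energy)
    # The ratio is only defined once a full long-term window is available.
    for i in range(lta_len - 1, len(energy)):
        sta = sum(energy[i - sta_len + 1:i + 1]) / sta_len
        lta = sum(energy[i - lta_len + 1:i + 1]) / lta_len
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def detect_events(signal, sta_len=3, lta_len=10, trigger=2.0):
    """Return the sample indices whose STA/LTA ratio reaches the trigger."""
    ratios = sta_lta_ratio(signal, sta_len, lta_len)
    return [i for i, r in enumerate(ratios) if r >= trigger]


# Synthetic trace: quiet background with a short high-amplitude burst.
trace = [0.1] * 20 + [5.0] * 3 + [0.1] * 10
events = detect_events(trace)
```

A short window reacts quickly to a burst while the long window tracks the background level, so the ratio rises sharply at the onset of an event and stays near 1 during steady noise.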

Keywords: Carl STA/LTA, features extraction, real time, dataset, machine learning, seismic detection

Procedia PDF Downloads 124
11616 On the Influence of Sleep Habits for Predicting Preterm Births: A Machine Learning Approach

Authors: C. Fernandez-Plaza, I. Abad, E. Diaz, I. Diaz

Abstract:

Births occurring before the 37th week of gestation are considered preterm births. A threat of preterm birth is defined as the beginning of regular uterine contractions, dilation and cervical effacement between 23 and 36 gestation weeks. To the best of the authors' knowledge, the factors that determine the beginning of the birth are not completely defined yet. In particular, the influence of sleep habits on preterm births is weakly studied. The aim of this study is to develop a model to predict the factors affecting premature delivery in pregnancy, based on the above potential risk factors, including those derived from sleep habits and light exposure at night (introduced as 12 variables obtained by a telephone survey using two questionnaires previously used by other authors). Thus, three groups of variables were included in the study (maternal, fetal and sleep habits). The study was approved by the Research Ethics Committee of the Principado of Asturias (Spain). An observational, retrospective and descriptive study was performed with 481 births between January 1, 2015 and May 10, 2016 in the University Central Hospital of Asturias (Spain). A statistical analysis using SPSS was carried out to compare qualitative and quantitative variables between preterm and term delivery. The Chi-square test for qualitative variables and the t-test for quantitative variables were applied. Statistically significant differences (p < 0.05) between preterm vs. term births were found for primiparity, multi-parity, kind of conception, place of residence or premature rupture of membranes and interruption during nights. In addition to the statistical analysis, machine learning methods to look for a prediction model were tested. In particular, tree based models were applied as the trade-off between performance and interpretability is especially suitable for this study. C5.0, recursive partitioning, random forest and tree bag models were analysed using the caret R-package. 10-fold cross-validation and parameter tuning to optimize the methods were applied. In addition, different noise reduction methods were applied to the initial data using the NoiseFiltersR package. The best performance was obtained by the C5.0 method with Accuracy 0.91, Sensitivity 0.93, Specificity 0.89 and Precision 0.91. Some well known preterm birth factors were identified: Cervix Dilation, maternal BMI, Premature rupture of membranes or nuchal translucency analysis in the first trimester. The model also identifies other new factors related to sleep habits such as light through window, bedtime on working days, usage of electronic devices before sleeping from Monday to Friday or change of sleeping habits reflected in the number of hours, in the depth of sleep or in the lighting of the room. IF dilation <= 2.95 AND usage of electronic devices before sleeping from Monday to Friday = YES AND change of sleeping habits = YES, THEN preterm is one of the predicting rules obtained by C5.0. In this work a model for predicting preterm births is developed. It is based on machine learning together with noise reduction techniques. The method maximizing the performance is the one selected. This model shows the influence of variables related to sleep habits in preterm prediction.
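As a rough illustration, the single C5.0 rule quoted in the abstract and the four reported metrics can be written out as below. The rule is transcribed from the abstract; the function names and the confusion-matrix counts are hypothetical and are not the study's data.

```python
def example_c50_rule(dilation, devices_before_sleep_weekdays, changed_sleep_habits):
    """One rule quoted in the abstract: returns True when it predicts preterm.

    Argument names are illustrative assumptions; the 2.95 threshold is
    taken from the abstract.
    """
    return (dilation <= 2.95
            and devices_before_sleep_weekdays
            and changed_sleep_habits)


def classification_metrics(tp, fp, tn, fn):
    """Standard definitions of the four metrics reported in the abstract."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts, for illustration only (not the study's results).
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

Sensitivity and specificity are reported separately because accuracy alone can look good on an imbalanced sample in which term births outnumber preterm births.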

Keywords: machine learning, noise reduction, preterm birth, sleep habit

Procedia PDF Downloads 147
11615 Data Model to Predict Customize Skin Care Product Using Biosensor

Authors: Ashi Gautam, Isha Shukla, Akhil Seghal

Abstract:

Biosensors are analytical devices that use a biological sensing element to detect and measure a specific chemical substance or biomolecule in a sample. These devices are widely used in various fields, including medical diagnostics, environmental monitoring, and food analysis, due to their high specificity, sensitivity, and selectivity. In this research paper, a machine learning model is proposed for predicting the suitability of skin care products based on biosensor readings. The proposed model takes in features extracted from biosensor readings, such as biomarker concentration, skin hydration level, inflammation presence, sensitivity, and free radicals, and outputs the most appropriate skin care product for an individual. This model is trained on a dataset of biosensor readings and corresponding skin care product information. The model's performance is evaluated using several metrics, including accuracy, precision, recall, and F1 score. The aim of this research is to develop a personalised skin care product recommendation system using biosensor data. By leveraging the power of machine learning, the proposed model can accurately predict the most suitable skin care product for an individual based on their biosensor readings. This is particularly useful in the skin care industry, where personalised recommendations can lead to better outcomes for consumers. The developed model is based on supervised learning, which means that it is trained on a labeled dataset of biosensor readings and corresponding skin care product information. The model uses these labeled data to learn patterns and relationships between the biosensor readings and skin care products. Once trained, the model can predict the most suitable skin care product for an individual based on their biosensor readings. The results of this study show that the proposed machine learning model can accurately predict the most appropriate skin care product for an individual based on their biosensor readings. 
The evaluation metrics used in this study demonstrate the effectiveness of the model in predicting skin care products. This model has significant potential for practical use in the skin care industry for personalised skin care product recommendations. The proposed machine learning model for predicting the suitability of skin care products based on biosensor readings is a promising development in the skin care industry. The model's ability to accurately predict the most appropriate skin care product for an individual based on their biosensor readings can lead to better outcomes for consumers. Further research can be done to improve the model's accuracy and effectiveness.

Keywords: biosensors, data model, machine learning, skin care

Procedia PDF Downloads 97
11614 Predicting the Product Life Cycle of Songs on Radio - How Record Labels Can Manage Product Portfolio and Prioritise Artists by Using Machine Learning Techniques

Authors: Claus N. Holm, Oliver F. Grooss, Robert A. Alphinas

Abstract:

This research strives to predict the remaining product life cycle of a song on radio after it has been played for one or two months. The best results were achieved by using a k-d tree to find the songs most similar to the test songs and a Random Forest model to forecast radio plays. Accuracies of 82.78% and 83.44% are achieved for the two time periods, respectively. This explorative research leads to over 4500 test metrics to find the best combination of models and pre-processing techniques. Other algorithms tested are KNN, MLP and CNN. The features consist only of daily radio plays and use no musical features.

Keywords: hit song science, product life cycle, machine learning, radio

Procedia PDF Downloads 155
11613 Multilayer Perceptron Neural Network for Rainfall-Water Level Modeling

Authors: Thohidul Islam, Md. Hamidul Haque, Robin Kumar Biswas

Abstract:

Floods are one of the deadliest natural disasters which are very complex to model; however, machine learning is opening the door for more reliable and accurate flood prediction. In this research, a multilayer perceptron neural network (MLP) is developed to model the rainfall-water level relation, in a subtropical monsoon climatic region of the Bangladesh-India border. Our experiments show promising empirical results to forecast the water level for 1 day lead time. Our best performing MLP model achieves 98.7% coefficient of determination with lower model complexity which surpasses previously reported results on similar forecasting problems.
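For readers unfamiliar with the reported metric, the coefficient of determination can be computed as in the sketch below. This is the textbook definition, not the authors' code, and the water-level values are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and forecast water levels (metres), 1-day lead time.
observed_levels = [1.0, 2.0, 3.0, 4.0]
forecast_levels = [1.1, 1.9, 3.2, 3.9]
score = r_squared(observed_levels, forecast_levels)
```

A value of 1 means the forecasts reproduce the observations exactly, while a value of 0 means they do no better than always predicting the observed mean.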

Keywords: flood forecasting, machine learning, multilayer perceptron network, regression

Procedia PDF Downloads 172
11612 Automated Feature Extraction and Object-Based Detection from High-Resolution Aerial Photos Based on Machine Learning and Artificial Intelligence

Authors: Mohammed Al Sulaimani, Hamad Al Manhi

Abstract:

With the development of Remote Sensing technology, the resolution of optical Remote Sensing images has greatly improved, and images have become widely available. Numerous detectors have been developed for detecting different types of objects. In the past few years, Remote Sensing has benefited a lot from deep learning, particularly Deep Convolutional Neural Networks (CNNs). Deep learning holds great promise for fulfilling the challenging needs of Remote Sensing and solving various problems within different fields and applications. Unmanned Aerial Systems (UAS) have become widely used for acquiring Aerial Photos and are preferred by most organizations to support their activities because of the photos' high resolution and accuracy, which make the identification and detection of very small features much easier than with Satellite Images. This has opened a new era of Deep Learning in different applications, not only in feature extraction and prediction but also in analysis. This work addresses the capacity of Machine Learning and Deep Learning for detecting and extracting Oil Leaks from Flowlines (Onshore) using High-Resolution Aerial Photos acquired by a UAS fitted with an RGB Sensor, to support early detection of these leaks and to protect the company from the losses caused by leaks and, most importantly, from environmental damage. Two different approaches using different DL methods are demonstrated. The first approach focuses on detecting the Oil Leaks from the RAW Aerial Photos (not processed) using a Deep Learning model called Single Shot Detector (SSD). The model draws bounding boxes around the leaks, and the results were extremely good. The second approach focuses on detecting the Oil Leaks from the Ortho-mosaiced Images (Georeferenced Images) by developing three Deep Learning Models (MaskRCNN, U-Net, and PSP-Net Classifier). 
Then, post-processing is performed to combine the results of these three Deep Learning Models to achieve a better detection result and improved accuracy. Although there is a relatively small amount of datasets available for training purposes, the Trained DL Models have shown good results in extracting the extent of the Oil Leaks and obtaining excellent and accurate detection.

Keywords: GIS, remote sensing, oil leak detection, machine learning, aerial photos, unmanned aerial systems

Procedia PDF Downloads 33
11611 An Intelligent Baby Care System Based on IoT and Deep Learning Techniques

Authors: Chinlun Lai, Lunjyh Jiang

Abstract:

Due to the heavy burden and pressure of caring for infants, an integrated automatic baby watching system based on IoT smart sensing and deep learning machine vision techniques is proposed in this paper. By monitoring infant body conditions such as heartbeat, breathing, body temperature, and sleeping posture, as well as surrounding conditions such as dangerous/sharp objects, light, noise, humidity and temperature, the proposed system can analyze and predict obvious or potential dangerous conditions according to the observed data and then take suitable actions in real time to protect the infant from harm, thus reducing the burden on the caregiver and improving the safety and efficiency of the caring work. The experimental results show that the proposed system works successfully for infant care and thus can be implemented practically in various areas of daily life.

Keywords: baby care system, Internet of Things, deep learning, machine vision

Procedia PDF Downloads 224
11610 A Case Study on Machine Learning-Based Project Performance Forecasting for an Urban Road Reconstruction Project

Authors: Soheila Sadeghi

Abstract:

In construction projects, predicting project performance metrics accurately is essential for effective management and successful delivery. However, conventional methods often depend on fixed baseline plans, disregarding the evolving nature of project progress and external influences. To address this issue, we introduce a distinct approach based on machine learning to forecast key performance indicators, such as cost variance and earned value, for each Work Breakdown Structure (WBS) category within an urban road reconstruction project. Our proposed model leverages time series forecasting techniques, namely Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) networks, to predict future performance by analyzing historical data and project progress. Additionally, the model incorporates external factors, including weather patterns and resource availability, as features to improve forecast accuracy. By harnessing the predictive capabilities of machine learning, our performance forecasting model enables project managers to proactively identify potential deviations from the baseline plan and take timely corrective measures. To validate the effectiveness of the proposed approach, we conduct a case study on an urban road reconstruction project, comparing the model's predictions with actual project performance data. The outcomes of this research contribute to the advancement of project management practices in the construction industry by providing a data-driven solution for enhancing project performance monitoring and control.
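The indicators named in the abstract follow the standard earned value management definitions, sketched below. The WBS category and the monetary figures are hypothetical, and the forecasting models themselves (ARIMA, LSTM) are not shown.

```python
def earned_value_metrics(planned_value, earned_value, actual_cost):
    """Standard earned value management indicators for one WBS category."""
    return {
        "cost_variance": earned_value - actual_cost,        # CV = EV - AC
        "schedule_variance": earned_value - planned_value,  # SV = EV - PV
        "cpi": earned_value / actual_cost,                  # cost performance index
        "spi": earned_value / planned_value,                # schedule performance index
    }


# Hypothetical figures for a single WBS category (for example, paving).
paving = earned_value_metrics(planned_value=100.0,
                              earned_value=90.0,
                              actual_cost=120.0)
```

A negative cost variance means the work completed so far cost more than budgeted, and a negative schedule variance means less work has been completed than planned; these are the quantities a forecasting model would predict for future reporting periods.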

Keywords: project performance forecasting, machine learning, time series forecasting, cost variance, schedule variance, earned value management

Procedia PDF Downloads 39
11609 Exploring the Influence of Wind on Wildfire Behavior in China: A Data-Driven Study Using Machine Learning and Remote Sensing

Authors: Rida Kanwal, Wang Yuhui, Song Weiguo

Abstract:

Wildfires are one of the most prominent threats to ecosystems, human health, and economic activities, with wind acting as a critical driving factor. This study combines machine learning (ML) and remote sensing (RS) to assess the effects of wind on wildfires in Chongqing Province from August 16-23, 2022. Landsat 8 satellite images were used to estimate the differenced normalized burn ratio (dNBR), which compares prefire and postfire vegetation conditions. Wind data were analyzed through geographic information system (GIS) mapping. Correlation analysis between wind speed and fire radiative power (FRP) revealed a significant relationship. An autoregressive integrated moving average (ARIMA) model was developed for wind forecasting, and linear regression was applied to determine the effect of wind speed on FRP. The results identified high wind speed as a key factor contributing to the surge in FRP. Wind-rose plots showed winds blowing to the northwest (NW), aligning with the wildfire spread. This model was further validated with data from other provinces across China. This study integrated ML, RS, and GIS to analyze wildfire behavior, providing effective strategies for prediction and management.
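The burn-severity index mentioned in the abstract can be sketched per pixel as below, using the usual definition of the normalized burn ratio from near-infrared and shortwave-infrared reflectance (Landsat 8 bands 5 and 7). The reflectance values are invented for illustration, and the study's actual preprocessing is not described in the abstract.

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio for one pixel: (NIR - SWIR2) / (NIR + SWIR2)."""
    denom = nir + swir2
    return (nir - swir2) / denom if denom != 0 else 0.0


def dnbr(pre_nir, pre_swir2, post_nir, post_swir2):
    """Differenced NBR: prefire NBR minus postfire NBR."""
    return nbr(pre_nir, pre_swir2) - nbr(post_nir, post_swir2)


# Hypothetical surface reflectance for one pixel before and after a fire.
severity = dnbr(pre_nir=0.5, pre_swir2=0.1, post_nir=0.2, post_swir2=0.3)
```

Healthy vegetation reflects strongly in the near infrared and weakly in the shortwave infrared, so a burned pixel shows a drop in NBR and therefore a positive dNBR.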

Keywords: wildfires, machine learning, remote sensing, wind speed, GIS, wildfire behavior

Procedia PDF Downloads 20
11608 One-Step Time Series Predictions with Recurrent Neural Networks

Authors: Vaidehi Iyer, Konstantin Borozdin

Abstract:

Time series prediction problems have many important practical applications, but are notoriously difficult for statistical modeling. Recently, machine learning methods have attracted significant interest as a practical tool applied to a variety of problems, even though developments in this field tend to be semi-empirical. This paper explores the application of Long Short Term Memory based Recurrent Neural Networks to the one-step prediction of time series for both trend and stochastic components. Two types of data are analyzed: daily stock prices, which are often considered to be a typical example of a random walk, and weather patterns dominated by seasonal variations. Results from both analyses are compared, and a reinforcement learning framework is used to select the more efficient of Recurrent Neural Networks and more traditional autoregression methods. It is shown that both methods are able to follow long-term trends and seasonal variations closely, but have difficulties with reproducing day-to-day variability. Future research directions and potential real world applications are briefly discussed.

Keywords: long short term memory, prediction methods, recurrent neural networks, reinforcement learning

Procedia PDF Downloads 229
11607 Forward Conditional Restricted Boltzmann Machines for the Generation of Music

Authors: Johan Loeckx, Joeri Bultheel

Abstract:

Recently, the application of deep learning to music has gained popularity. Its true potential, however, has been largely unexplored. In this paper, a new idea for representing the dynamic behavior of music is proposed. A "forward" conditional RBM takes into account not only preceding but also future samples during training. Though this may sound controversial at first sight, it will be shown that it makes sense from a musical and neuro-cognitive perspective. The model is applied to reconstruct music based upon the first notes and to improvise in the musical style of a composer. Contrary to expectations, reconstruction accuracy was not significantly improved with respect to a regular CRBM of the same order. More research is needed to test the performance on unseen data.

Keywords: deep learning, restricted boltzmann machine, music generation, conditional restricted boltzmann machine (CRBM)

Procedia PDF Downloads 522
11606 Predicting Match Outcomes in Team Sport via Machine Learning: Evidence from National Basketball Association

Authors: Jacky Liu

Abstract:

This paper develops a team sports outcome prediction system with potential for wide-ranging applications across various disciplines. Despite significant advancements in predictive analytics, existing studies in sports outcome predictions possess considerable limitations, including insufficient feature engineering and underutilization of advanced machine learning techniques, among others. To address these issues, we extend the Sports Cross Industry Standard Process for Data Mining (SRP-CRISP-DM) framework and propose a unique, comprehensive predictive system, using National Basketball Association (NBA) data as an example to test this extended framework. Our approach follows a holistic methodology in feature engineering, employing both Time Series and Non-Time Series Data, as well as conducting Explanatory Data Analysis and Feature Selection. Furthermore, we contribute to the discourse on target variable choice in team sports outcome prediction, asserting that point spread prediction yields higher profits than game-winner prediction. Using machine learning algorithms, particularly XGBoost, results in a significant improvement in predictive accuracy of team sports outcomes. Applied to point spread betting strategies, it offers an astounding annual return of approximately 900% on an initial investment of $100. Our findings not only contribute to academic literature, but also have critical practical implications for sports betting. Our study advances the understanding of team sports outcome prediction, a burgeoning area in complex system predictions, and paves the way for potential profitability and more informed decision making in sports betting markets.

Keywords: machine learning, team sports, game outcome prediction, sports betting, profits simulation

Procedia PDF Downloads 102
11605 Towards Developing a Self-Explanatory Scheduling System Based on a Hybrid Approach

Authors: Jian Zheng, Yoshiyasu Takahashi, Yuichi Kobayashi, Tatsuhiro Sato

Abstract:

In this study, we present a conceptual framework for developing a scheduling system that can generate self-explanatory and easy-to-understand schedules. To this end, a user interface is conceived to help planners record factors that are considered crucial in scheduling, as well as internal and external sources relating to such factors. A hybrid approach combining machine learning and constraint programming is developed to generate schedules and the corresponding factors, and to display them on the user interface. Effects of the proposed system on scheduling are discussed, and it is expected that scheduling efficiency and system understandability will be improved, compared with previous scheduling systems.

Keywords: constraint programming, factors considered in scheduling, machine learning, scheduling system

Procedia PDF Downloads 324
11604 Development of an Automatic Computational Machine Learning Pipeline to Process Confocal Fluorescence Images for Virtual Cell Generation

Authors: Miguel Contreras, David Long, Will Bachman

Abstract:

Background: Microscopy plays a central role in cell and developmental biology. In particular, fluorescence microscopy can be used to visualize specific cellular components and subsequently quantify their morphology through development of virtual-cell models for study of effects of mechanical forces on cells. However, there are challenges with these imaging experiments, which can make it difficult to quantify cell morphology: inconsistent results, time-consuming and potentially costly protocols, and limitation on number of labels due to spectral overlap. To address these challenges, the objective of this project is to develop an automatic computational machine learning pipeline to predict cellular components morphology for virtual-cell generation based on fluorescence cell membrane confocal z-stacks. Methods: Registered confocal z-stacks of nuclei and cell membrane of endothelial cells, consisting of 20 images each, were obtained from fluorescence confocal microscopy and normalized through software pipeline for each image to have a mean pixel intensity value of 0.5. An open source machine learning algorithm, originally developed to predict fluorescence labels on unlabeled transmitted light microscopy cell images, was trained using this set of normalized z-stacks on a single CPU machine. Through transfer learning, the algorithm used knowledge acquired from its previous training sessions to learn the new task. Once trained, the algorithm was used to predict morphology of nuclei using normalized cell membrane fluorescence images as input. Predictions were compared to the ground truth fluorescence nuclei images. Results: After one week of training, using one cell membrane z-stack (20 images) and corresponding nuclei label, results showed qualitatively good predictions on training set. The algorithm was able to accurately predict nuclei locations as well as shape when fed only fluorescence membrane images. 
Similar training sessions with improved membrane image quality, including clear lining and shape of the membrane, clearly showing the boundaries of each cell, proportionally improved nuclei predictions, reducing errors relative to ground truth. Discussion: These results show the potential of pre-trained machine learning algorithms to predict cell morphology using relatively small amounts of data and training time, eliminating the need of using multiple labels in immunofluorescence experiments. With further training, the algorithm is expected to predict different labels (e.g., focal-adhesion sites, cytoskeleton), which can be added to the automatic machine learning pipeline for direct input into Principal Component Analysis (PCA) for generation of virtual-cell mechanical models.
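The intensity normalization step described in the Methods can be sketched as follows. The abstract only states that each image ends up with a mean pixel intensity of 0.5; rescaling by a constant factor is an assumption made here (an additive shift would also satisfy that description), and `normalize_mean_intensity` is a hypothetical name.

```python
def normalize_mean_intensity(image, target_mean=0.5):
    """Rescale a 2-D image (list of rows) so its mean pixel value is target_mean.

    Multiplicative scaling is an assumption; the abstract does not say how
    the mean of 0.5 was obtained.
    """
    pixels = [p for row in image for p in row]
    current_mean = sum(pixels) / len(pixels)
    if current_mean == 0:
        raise ValueError("cannot rescale an all-zero image")
    scale = target_mean / current_mean
    return [[p * scale for p in row] for row in image]


# Tiny hypothetical image slice with mean intensity 0.2.
slice_in = [[0.1, 0.3], [0.2, 0.2]]
slice_out = normalize_mean_intensity(slice_in)
```

Applying the same target mean to every image in a z-stack removes brightness differences between slices, so the model sees inputs on a comparable scale.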

Keywords: cell morphology prediction, computational machine learning, fluorescence microscopy, virtual-cell models

Procedia PDF Downloads 205
11603 Comprehensive Review of Adversarial Machine Learning in PDF Malware

Authors: Preston Nabors, Nasseh Tabrizi

Abstract:

Portable Document Format (PDF) files have gained significant popularity for sharing and distributing documents due to their universal compatibility. However, the widespread use of PDF files has made them attractive targets for cybercriminals, who exploit vulnerabilities to deliver malware and compromise the security of end-user systems. This paper reviews notable contributions in PDF malware detection, including static, dynamic, signature-based, and hybrid analysis. It presents a comprehensive examination of PDF malware detection techniques, focusing on the emerging threat of adversarial sampling and the need for robust defense mechanisms. The paper highlights the vulnerability of machine learning classifiers to evasion attacks. It explores adversarial sampling techniques in PDF malware detection to produce mimicry and reverse mimicry evasion attacks, which aim to bypass detection systems. Improvements for future research are identified, including accessible methods, applying adversarial sampling techniques to malicious payloads, evaluating other models, evaluating the importance of features to malware, implementing adversarial defense techniques, and conducting comprehensive examination across various scenarios. By addressing these opportunities, researchers can enhance PDF malware detection and develop more resilient defense mechanisms against adversarial attacks.

Keywords: adversarial attacks, adversarial defense, adversarial machine learning, intrusion detection, PDF malware, malware detection, malware detection evasion

Procedia PDF Downloads 39
11602 On the Use of Machine Learning for Tamper Detection

Authors: Basel Halak, Christian Hall, Syed Abdul Father, Nelson Chow Wai Kit, Ruwaydah Widaad Raymode

Abstract:

The attack surface of computing devices is becoming very sophisticated, driven by the sheer increase in interconnected devices, reaching 50B in 2025, which makes it easier for adversaries to gain direct access and perform well-known physical attacks. The impact of increased security vulnerability of electronic systems is exacerbated for devices that are part of the critical infrastructure or those used in military applications, where the likelihood of being targeted is very high. This continuously evolving landscape of security threats calls for a new generation of defense methods that are equally effective and adaptive. This paper proposes an intelligent defense mechanism to protect against physical tampering. It consists of a tamper detection system enhanced with machine learning capabilities, which allows it to recognize normal operating conditions, classify known physical attacks and identify new types of malicious behaviors. A prototype of the proposed system has been implemented, and its functionality has been successfully verified for two types of normal operating conditions and a further four forms of physical attacks. In addition, a systematic threat modeling analysis and security validation were carried out, which indicated that the proposed solution provides better protection against threats including information leakage, loss of data, and disruption of operation.

Keywords: anti-tamper, hardware, machine learning, physical security, embedded devices, IoT

Procedia PDF Downloads 153
11601 Evaluation of Machine Learning Algorithms and Ensemble Methods for Prediction of Students’ Graduation

Authors: Soha A. Bahanshal, Vaibhav Verdhan, Bayong Kim

Abstract:

Graduation rates at six-year colleges are becoming a more essential indicator for incoming students and for university rankings. Predicting student graduation is extremely beneficial to schools and has a huge potential for targeted intervention. It is important for educational institutions since it enables the development of strategic plans that will assist or improve students' performance in achieving their degrees on time (GOT). Machine learning techniques offer a first step in extracting useful information from these data and gaining insights into the prediction of students' progress and performance. Data analysis and visualization techniques are applied to understand and interpret the data. The data used for the analysis cover science-major students who graduated within 6 years in the academic year 2017-2018. This analysis can be used to predict the graduation of students in the next academic year. Different predictive models such as logistic regression, decision trees, support vector machines, Random Forest, Naïve Bayes, and KNeighborsClassifier are applied to predict whether a student will graduate. These classifiers were evaluated using 5-fold cross-validation, and their performance was compared based on accuracy. The results indicated that the Ensemble Classifier achieves the best accuracy, about 91.12%. This GOT prediction model would hopefully be useful to university administration and academics in developing measures for assisting and boosting students' academic performance and ensuring they graduate on time.
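The 5-fold evaluation mentioned in the abstract can be illustrated with a minimal, dependency-free sketch of how the folds are formed. The helper name `k_fold_indices` and the sample count are assumptions; the study's own tooling is not described beyond the classifier names, and in practice the data are usually shuffled or stratified before splitting.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.

    No shuffling or stratification is done here; this only shows how each
    sample ends up in the test set exactly once.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical dataset of 12 student records split into 5 folds.
folds = list(k_fold_indices(12, k=5))
```

Each classifier would be trained on the training indices of a fold and scored on its test indices, and the five accuracy values would then be averaged for comparison.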

Keywords: prediction, decision trees, machine learning, support vector machine, ensemble model, student graduation, graduate on time (GOT)

Procedia PDF Downloads 72
11600 Hate Speech Detection Using Machine Learning: A Survey

Authors: Edemealem Desalegn Kingawa, Kafte Tasew Timkete, Mekashaw Girmaw Abebe, Terefe Feyisa, Abiyot Bitew Mihretie, Senait Teklemarkos Haile

Abstract:

Currently, hate speech is a growing challenge for society, individuals, policymakers, and researchers, as social media platforms make it easy to anonymously create and grow online friends and followers and provide an online forum for debate about specific issues of community life, culture, politics, and others. Despite this, research on identifying and detecting hate speech has not yet achieved satisfactory performance, which is why future research on this issue is constantly called for. This paper provides a systematic review of the literature in this field, with a focus on approaches like word embedding techniques, machine learning, deep learning technologies, hate speech terminology, and other state-of-the-art technologies and their challenges. The review covers the last six years of literature from ResearchGate and Google Scholar. Furthermore, limitations, challenges in algorithm selection and use, challenges in data collection and cleaning, and future research directions are discussed in detail.

Keywords: Amharic hate speech, deep learning approach, hate speech detection review, Afaan Oromo hate speech detection

Procedia PDF Downloads 177
11599 Thick Data Analytics for Learning Cataract Severity: A Triplet Loss Siamese Neural Network Model

Authors: Jinan Fiaidhi, Sabah Mohammed

Abstract:

Diagnosing cataract severity is an important factor in deciding whether to undertake surgery. The diagnosis is usually made by an ophthalmologist, either directly or by examining a variety of fundus photographs. This paper investigates a Siamese neural network that can be trained with a small number of anchor samples to score cataract severity. The model is based on a triplet loss function that uses the ophthalmologist's experience in rating positive and negative anchors against a specific cataract scaling system. This approach, which draws on the heuristics of the ophthalmologist, is generally called the thick data approach, a kind of machine learning approach that learns from a few shots. Clinical Relevance: The lens of the eye is mostly made up of water and proteins. A cataract occurs when these proteins in the lens start to clump together and block light, causing impaired vision. This research aims to employ thick data machine learning techniques to rate the severity of cataracts using a Siamese neural network.
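For readers unfamiliar with the triplet loss named in the abstract, the standard form is L = max(0, d(a, p) - d(a, n) + margin), where a, p, and n are the embeddings of an anchor, a positive, and a negative sample. The sketch below is a generic illustration, not the authors' model: the Euclidean distance, the margin of 1.0, and the two-dimensional toy embeddings are assumptions (some implementations use squared distances instead).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (assumed values, chosen only to make the arithmetic easy).
anchor = (0.0, 0.0)
positive = (0.0, 1.0)        # distance 1 from the anchor
far_negative = (3.0, 4.0)    # distance 5 from the anchor
near_negative = (1.0, 0.0)   # distance 1 from the anchor

easy = triplet_loss(anchor, positive, far_negative)   # 1 - 5 + 1 < 0, so 0.0
hard = triplet_loss(anchor, positive, near_negative)  # 1 - 1 + 1 = 1.0
```

Only the second triplet contributes a training signal, which is why the choice of positive and negative anchors matters so much when few samples are available.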

Keywords: thick data analytics, Siamese neural network, triplet-loss model, few shot learning

Procedia PDF Downloads 111
11598 Role of Machine Learning in Internet of Things Enabled Smart Cities

Authors: Amit Prakash Singh, Shyamli Singh, Chavi Srivastav

Abstract:

This paper presents the idea of the Internet of Things (IoT) for the infrastructure of smart cities. The Internet of Things has been visualized as a communication prototype that incorporates a myriad of digital services. The various components of smart cities are to be implemented using microprocessors, microcontrollers, sensors for network communication, and protocols. IoT-enabled systems have been devised to support the smart city vision, whose aim is to exploit currently available advanced communication technologies to support value-added services for the functioning of the city. Due to the volume, variety, and velocity of the data, its analysis requires Big Data concepts. This paper presents various techniques used to analyze big data using machine learning.

Keywords: IoT, smart city, embedded systems, sustainable environment

Procedia PDF Downloads 575
11597 Precise CNC Machine for Multi-Tasking

Authors: Haroon Jan Khan, Xian-Feng Xu, Syed Nasir Shah, Anooshay Niazi

Abstract:

CNC machines are not only used on a large scale but have also become a prominent necessity among households and smaller businesses. Printed circuit boards (PCBs) manufactured by the chemical process are not only risky and unsafe but also expensive and time-consuming. A precise 3-axis CNC machine has been developed, which not only fabricates PCBs but can also be used for multiple tasks just by changing the materials and tools, making it versatile. The advanced CNC machine takes data from CAM software. The TB-6560 controller is used in the CNC machine to adjust variation in the X, Y, and Z axes. The advanced machine is efficient in automatic drilling, engraving, and cutting.

Keywords: CNC, G-code, CAD, CAM, Proteus, FLATCAM, Easel

Procedia PDF Downloads 160
11596 Machine Learning Approach in Predicting Cracking Performance of Fiber Reinforced Asphalt Concrete Materials

Authors: Behzad Behnia, Noah LaRussa-Trott

Abstract:

In recent years, fibers have been successfully used as an additive to reinforce asphalt concrete materials and to enhance the sustainability and resiliency of transportation infrastructure. Roads covered with fiber-reinforced asphalt concrete (FRAC) require less frequent maintenance and tend to have a longer lifespan. The present work investigates the application of sasobit-coated aramid fibers in asphalt pavements and employs machine learning to develop prediction models to evaluate the cracking performance of FRAC materials. For the experimental part of the study, the effects of several important parameters such as fiber content, fiber length, and testing temperature on fracture characteristics of FRAC mixtures were thoroughly investigated. Two mechanical performance tests, i.e., the disk-shaped compact tension [DC(T)] and indirect tensile [ID(T)] strength tests, as well as the non-destructive acoustic emission test, were utilized to experimentally measure the cracking behavior of the FRAC material at the macro and micro levels, respectively. The experimental results were used to train the supervised machine learning approach in order to establish prediction models for fracture performance of the FRAC mixtures in the field. Experimental results demonstrated that adding fibers improved the overall fracture performance of asphalt concrete materials by increasing their fracture energy and tensile strength and lowering their 'embrittlement temperature'. FRAC mixtures containing long-size fibers exhibited better cracking performance than regular-size fiber mixtures. The developed prediction models of this study could be easily employed by pavement engineers in the assessment of the FRAC pavements.

Keywords: fiber reinforced asphalt concrete, machine learning, cracking performance tests, prediction model

Procedia PDF Downloads 141
11595 Machine Learning Prediction of Diabetes Prevalence in the U.S. Using Demographic, Physical, and Lifestyle Indicators: A Study Based on NHANES 2009-2018

Authors: Oluwafunmibi Omotayo Fasanya, Augustine Kena Adjei

Abstract:

This study develops a machine learning model to predict diabetes (DM) prevalence in the U.S. population using demographic characteristics, physical indicators, and lifestyle habits, and analyzes how these factors contribute to the likelihood of diabetes. We analyzed data from 23,546 participants aged 20 and older, who were non-pregnant, from the 2009-2018 National Health and Nutrition Examination Survey (NHANES). The dataset included key demographic (age, sex, ethnicity), physical (BMI, leg length, total cholesterol [TCHOL], fasting plasma glucose), and lifestyle indicators (smoking habits). A weighted sample was used to account for NHANES survey design features such as stratification and clustering. A classification machine learning model was trained to predict diabetes status. The target variable was binary (diabetes or non-diabetes) based on fasting plasma glucose measurements. The following models were evaluated: Logistic Regression (baseline), Random Forest Classifier, Gradient Boosting Machine (GBM), and Support Vector Machine (SVM). Model performance was assessed using accuracy, F1-score, AUC-ROC, and precision-recall metrics. Feature importance was analyzed using SHAP values to interpret the contributions of variables such as age, BMI, ethnicity, and smoking status. The Gradient Boosting Machine (GBM) model outperformed other classifiers with an AUC-ROC score of 0.85. Feature importance analysis revealed the following key predictors: Age: The most significant predictor, with diabetes prevalence increasing with age, peaking around the 60s for males and 70s for females. BMI: Higher BMI was strongly associated with a higher risk of diabetes. Ethnicity: Black participants had the highest predicted prevalence of diabetes (14.6%), followed by Mexican-Americans (13.5%) and Whites (10.6%). TCHOL: Diabetics had lower total cholesterol levels, particularly among White participants (mean decline of 23.6 mg/dL). 
Smoking: Smoking showed a slight increase in diabetes risk among Whites (0.2%) but had a limited effect in other ethnic groups. Using machine learning models, we identified key demographic, physical, and lifestyle predictors of diabetes in the U.S. population. The results confirm that diabetes prevalence varies significantly across age, BMI, and ethnic groups, with lifestyle factors such as smoking contributing differently by ethnicity. These findings provide a basis for more targeted public health interventions and resource allocation for diabetes management.

Keywords: diabetes, NHANES, random forest, gradient boosting machine, support vector machine

Procedia PDF Downloads 8
11594 Multi-Factor Optimization Method through Machine Learning in Building Envelope Design: Focusing on Perforated Metal Façade

Authors: Jinwooung Kim, Jae-Hwan Jung, Seong-Jun Kim, Sung-Ah Kim

Abstract:

Because the building envelope has a significant impact on the operation and maintenance stage of a building, designing the façade with performance in mind can improve building performance and lower maintenance costs. In general, however, optimizing two or more performance factors runs up against the limits of time and computational tools. The optimization phase typically repeats a series of processes that generate alternatives and analyze them until the desired performance is achieved. In particular, as geometric complexity or precision increases, the computational resources and time needed to find the required performance become prohibitive, so an optimization methodology is needed to deal with this. Instead of directly analyzing all the alternatives in the optimization process, applying heuristic methods learned through experimentation and experience can reduce resource waste. This study proposes and verifies a method to optimize the double envelope of a building composed of perforated panels by applying machine learning to the design geometry and quantitative performance. The proposed method aims to achieve the required performance with fewer resources by supplementing the existing method, which cannot handle the complex shape of the perforated panel.

Keywords: building envelope, machine learning, perforated metal, multi-factor optimization, façade

Procedia PDF Downloads 224
11593 Soybean Seed Composition Prediction From Standing Crops Using Planet Scope Satellite Imagery and Machine Learning

Authors: Supria Sarkar, Vasit Sagan, Sourav Bhadra, Meghnath Pokharel, Felix B. Fritschi

Abstract:

Soybeans and their derivatives are very important agricultural commodities around the world because of their wide applicability in human food, animal feed, biofuel, and industries. However, the significance of soybean production depends on the quality of the soybean seeds rather than the yield alone. Seed composition is widely dependent on plant physiological properties, aerobic and anaerobic environmental conditions, nutrient content, and plant phenological characteristics, which can be captured by high temporal resolution remote sensing datasets. PlanetScope (PS) satellite images have high potential for providing sequential information on crop growth due to their frequent revisits throughout the world. In this study, we estimate soybean seed composition while the plants are in the field by utilizing PS satellite images and different machine learning algorithms. Several experimental fields were established with varying genotypes, and different seed compositions were measured from the samples as ground truth data. The PS images were processed to extract 462 hand-crafted vegetative and textural features. Four machine learning algorithms, i.e., partial least squares regression (PLSR), random forest (RFR), gradient boosting machine (GBM), and support vector machine (SVM), and two recurrent neural network architectures, i.e., long short-term memory (LSTM) and gated recurrent unit (GRU), were used in this study to predict oil, protein, sucrose, ash, starch, and fiber of soybean seed samples. The GRU and LSTM architectures had two separate branches, one for vegetative features and the other for texture features, which were later concatenated to predict seed composition. The results show that sucrose, ash, protein, and oil yielded comparable prediction results. The machine learning algorithms that best predicted the six seed composition traits differed. 
GRU worked well for oil (R-Squared: 0.53) and protein (R-Squared: 0.36), whereas SVR and PLSR showed the best results for sucrose (R-Squared: 0.74) and ash (R-Squared: 0.60), respectively. Although the RFR and GBM provided comparable performance, the models tended to overfit severely. Among the features, vegetative features were found to be more important than texture features. It is suggested to utilize many vegetation indices for machine learning training and select the best ones by using feature selection methods. Overall, the study reveals the feasibility and efficiency of PS images and machine learning for plot-level seed composition estimation. However, special care should be taken when designing the plot size in the experiments to avoid mixed pixel issues.
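The R-Squared values quoted above follow the usual coefficient of determination, R² = 1 - SS_res / SS_tot. A minimal sketch of that calculation is given below; the function name `r_squared` and the toy numbers are assumptions for illustration and are not taken from the study's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy measurements (assumed values, not real seed composition data).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 3.5]

score = r_squared(measured, predicted)           # SS_res = 1, SS_tot = 5
perfect = r_squared(measured, measured)          # no residual error
mean_only = r_squared(measured, [2.5] * 4)       # always predicting the mean
```

A model that always predicts the mean scores 0, so a value such as 0.36 indicates the model explains only about a third of the variance in the measured trait.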

Keywords: agriculture, computer vision, data science, geospatial technology

Procedia PDF Downloads 137
11592 Modern Information Security Management and Digital Technologies: A Comprehensive Approach to Data Protection

Authors: Mahshid Arabi

Abstract:

With the rapid expansion of digital technologies and the internet, information security has become a critical priority for organizations and individuals. The widespread use of digital tools such as smartphones and internet networks facilitates the storage of vast amounts of data, but simultaneously, vulnerabilities and security threats have significantly increased. The aim of this study is to examine and analyze modern methods of information security management and to develop a comprehensive model to counteract threats and information misuse. This study employs a mixed-methods approach, including both qualitative and quantitative analyses. Initially, a systematic review of previous articles and research in the field of information security was conducted. Then, using the Delphi method, interviews with 30 information security experts were conducted to gather their insights on security challenges and solutions. Based on the results of these interviews, a comprehensive model for information security management was developed. The proposed model includes advanced encryption techniques, machine learning-based intrusion detection systems, and network security protocols. AES and RSA encryption algorithms were used for data protection, and machine learning models such as Random Forest and Neural Networks were utilized for intrusion detection. Statistical analyses were performed using SPSS software. To evaluate the effectiveness of the proposed model, T-Test and ANOVA statistical tests were employed, and results were measured using accuracy, sensitivity, and specificity indicators of the models. Additionally, multiple regression analysis was conducted to examine the impact of various variables on information security. The findings of this study indicate that the comprehensive proposed model reduced cyber-attacks by an average of 85%. 
Statistical analysis showed that the combined use of encryption techniques and intrusion detection systems significantly improves information security. Based on the obtained results, it is recommended that organizations continuously update their information security systems and use a combination of multiple security methods to protect their data. Additionally, educating employees and raising public awareness about information security can serve as an effective tool in reducing security risks. This research demonstrates that effective and up-to-date information security management requires a comprehensive and coordinated approach, including the development and implementation of advanced techniques and continuous training of human resources.
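The accuracy, sensitivity, and specificity indicators mentioned in the abstract are standard confusion-matrix ratios. The sketch below shows how they are typically computed for a binary intrusion detector; the function name `binary_metrics`, the label convention (1 = intrusion, 0 = normal traffic), and the toy labels are assumptions, not the study's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (assumed): 1 = intrusion, 0 = normal traffic.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
detected = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(actual, detected)
```

Sensitivity and specificity are reported alongside accuracy because intrusion data are often imbalanced, and accuracy alone can hide a detector that misses most attacks.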

Keywords: data protection, digital technologies, information security, modern management

Procedia PDF Downloads 29
11591 Content-Based Color Image Retrieval Based on the 2-D Histogram and Statistical Moments

Authors: El Asnaoui Khalid, Aksasse Brahim, Ouanan Mohammed

Abstract:

In this paper, we are interested in the problem of finding similar images in a large database. For this purpose we propose a new algorithm based on a combination of the 2-D histogram intersection in the HSV space and statistical moments. The proposed histogram is based on a 3x3 window and not only on the intensity of the pixel. This approach can overcome the drawback of the conventional 1-D histogram which is ignoring the spatial distribution of pixels in the image, while the statistical moments are used to escape the effects of the discretisation of the color space which is intrinsic to the use of histograms. We compare the performance of our new algorithm to various methods of the state of the art and we show that it has several advantages. It is fast, consumes little memory and requires no learning. To validate our results, we apply this algorithm to search for similar images in different image databases.
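Histogram intersection, the similarity measure named in the abstract, sums the bin-wise minimum of two normalized histograms, so identical histograms score 1 and disjoint ones score 0. The sketch below shows the plain 1-D form only; the paper's 2-D histogram over a 3x3 window and its statistical moments are not reproduced, and the function names and toy bin values are assumptions.

```python
def normalize(hist):
    """Scale a histogram of counts so that its bins sum to 1."""
    total = sum(hist)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized, equal-length histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy 3-bin histograms of pixel counts (assumed values).
query = normalize([5, 3, 2])       # [0.5, 0.3, 0.2]
candidate = normalize([2, 3, 5])   # [0.2, 0.3, 0.5]

similarity = histogram_intersection(query, candidate)   # 0.2 + 0.3 + 0.2
self_similarity = histogram_intersection(query, query)
```

Ranking database images by this score against a query image gives a retrieval order without any training step, which matches the abstract's point that the method requires no learning.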

Keywords: 2-D histogram, statistical moments, indexing, similarity distance, histograms intersection

Procedia PDF Downloads 457
11590 Achieving High Renewable Energy Penetration in Western Australia Using Data Digitisation and Machine Learning

Authors: A. D. Tayal

Abstract:

The energy industry is undergoing significant disruption. This research outlines that, whilst challenging, this disruption is also an emerging opportunity for electricity utilities. One such opportunity is leveraging the developments in data analytics and machine learning. As the uptake of renewable energy technologies and complementary control systems increases, electricity grids will likely transform towards dense microgrids with high penetration of renewable generation sources, rich in network and customer data, and linked through intelligent, wireless communications. Data digitisation and analytics have already impacted numerous industries, and their influence on the energy sector is growing, as computational capabilities increase to manage big data, and as machines develop algorithms to solve the energy challenges of the future. The objective of this paper is to address how far the uptake of renewable technologies can go given the constraints of existing grid infrastructure and to provide a qualitative assessment of how higher levels of renewable energy penetration can be facilitated by incorporating even broader technological advances in the fields of data analytics and machine learning. Western Australia is used as a contextualised case study, given its abundant and diverse renewable resources (solar, wind, biomass, and wave) and isolated networks, making a high penetration of renewables a feasible target for policy makers over coming decades.

Keywords: data, innovation, renewable, solar

Procedia PDF Downloads 364
11589 Development of a Turbulent Boundary Layer Wall-pressure Fluctuations Power Spectrum Model Using a Stepwise Regression Algorithm

Authors: Zachary Huffman, Joana Rocha

Abstract:

Wall-pressure fluctuations induced by the turbulent boundary layer (TBL) developed over aircraft are a significant source of aircraft cabin noise. Since the power spectral density (PSD) of these pressure fluctuations is directly correlated with the amount of sound radiated into the cabin, the development of accurate empirical models that predict the PSD has been an important ongoing research topic. The sound emitted can be represented by the pressure fluctuations term in the Reynolds-averaged Navier-Stokes (RANS) equations. Therefore, early TBL empirical models (including those from Lowson, Robertson, Chase, and Howe) were primarily derived by simplifying and solving the RANS equations for pressure fluctuation and adding appropriate scales. Most subsequent models (including the Goody, Efimtsov, Laganelli, Smol’yakov, and Rackl and Weston models) were derived by making modifications to these early models or from physical principles. Overall, these models have had varying levels of accuracy, but, in general, they are most accurate under the specific Reynolds and Mach numbers they were developed for, while being less accurate under other flow conditions. Despite this, recent research into the possibility of using alternative methods for deriving the models has been rather limited. More recent studies have demonstrated that an artificial neural network model was more accurate than traditional models and could be applied more generally, but the accuracy of other machine learning techniques has not been explored. In the current study, an original model is derived using a stepwise regression algorithm in the statistical programming language R and TBL wall-pressure fluctuations PSD data gathered at the Carleton University wind tunnel. The theoretical advantage of a stepwise regression approach is that it will automatically filter out redundant or uncorrelated input variables (through the process of feature selection), and it is computationally faster than machine learning. 
The main disadvantage is the potential risk of overfitting. The accuracy of the developed model is assessed by comparing it to independently sourced datasets.

Keywords: aircraft noise, machine learning, power spectral density models, regression models, turbulent boundary layer wall-pressure fluctuations

Procedia PDF Downloads 135