Search results for: machine learning tools and techniques
16839 Enabling Non-invasive Diagnosis of Thyroid Nodules with High Specificity and Sensitivity
Authors: Sai Maniveer Adapa, Sai Guptha Perla, Adithya Reddy P.
Abstract:
Thyroid nodules can often be diagnosed with ultrasound imaging, although differentiating between benign and malignant nodules can be challenging for medical professionals. This work suggests a novel approach to increase the precision of thyroid nodule identification by combining machine learning and deep learning. The new approach first extracts information from the ultrasound pictures using a deep learning method known as a convolutional autoencoder. A support vector machine, a type of machine learning model, is then trained using these features. With an accuracy of 92.52%, the support vector machine can differentiate between benign and malignant nodules. This innovative technique may decrease the need for pointless biopsies and increase the accuracy of thyroid nodule detection.Keywords: thyroid tumor diagnosis, ultrasound images, deep learning, machine learning, convolutional auto-encoder, support vector machine
Procedia PDF Downloads 5816838 Combining Shallow and Deep Unsupervised Machine Learning Techniques to Detect Bad Actors in Complex Datasets
Authors: Jun Ming Moey, Zhiyaun Chen, David Nicholson
Abstract:
Bad actors are often hard to detect in data that imprints their behaviour patterns because they are comparatively rare events embedded in non-bad actor data. An unsupervised machine learning framework is applied here to detect bad actors in financial crime datasets that record millions of transactions undertaken by hundreds of actors (<0.01% bad). Specifically, the framework combines ‘shallow’ (PCA, Isolation Forest) and ‘deep’ (Autoencoder) methods to detect outlier patterns. Detection performance analysis for both the individual methods and their combination is reported.Keywords: detection, machine learning, deep learning, unsupervised, outlier analysis, data science, fraud, financial crime
Procedia PDF Downloads 9416837 Convolutional Neural Networks versus Radiomic Analysis for Classification of Breast Mammogram
Authors: Mehwish Asghar
Abstract:
Breast Cancer (BC) is a common type of cancer among women. Its screening is usually performed using different imaging modalities such as magnetic resonance imaging, mammogram, X-ray, CT, etc. Among these modalities’ mammogram is considered a powerful tool for diagnosis and screening of breast cancer. Sophisticated machine learning approaches have shown promising results in complementing human diagnosis. Generally, machine learning methods can be divided into two major classes: one is Radiomics analysis (RA), where image features are extracted manually; and the other one is the concept of convolutional neural networks (CNN), in which the computer learns to recognize image features on its own. This research aims to improve the incidence of early detection, thus reducing the mortality rate caused by breast cancer through the latest advancements in computer science, in general, and machine learning, in particular. It has also been aimed to ease the burden of doctors by improving and automating the process of breast cancer detection. This research is related to a relative analysis of different techniques for the implementation of different models for detecting and classifying breast cancer. The main goal of this research is to provide a detailed view of results and performances between different techniques. The purpose of this paper is to explore the potential of a convolutional neural network (CNN) w.r.t feature extractor and as a classifier. Also, in this research, it has been aimed to add the module of Radiomics for comparison of its results with deep learning techniques.Keywords: breast cancer (BC), machine learning (ML), convolutional neural network (CNN), radionics, magnetic resonance imaging, artificial intelligence
Procedia PDF Downloads 22516836 A Comprehensive Study of Camouflaged Object Detection Using Deep Learning
Authors: Khalak Bin Khair, Saqib Jahir, Mohammed Ibrahim, Fahad Bin, Debajyoti Karmaker
Abstract:
Object detection is a computer technology that deals with searching through digital images and videos for occurrences of semantic elements of a particular class. It is associated with image processing and computer vision. On top of object detection, we detect camouflage objects within an image using Deep Learning techniques. Deep learning may be a subset of machine learning that's essentially a three-layer neural network Over 6500 images that possess camouflage properties are gathered from various internet sources and divided into 4 categories to compare the result. Those images are labeled and then trained and tested using vgg16 architecture on the jupyter notebook using the TensorFlow platform. The architecture is further customized using Transfer Learning. Methods for transferring information from one or more of these source tasks to increase learning in a related target task are created through transfer learning. The purpose of this transfer of learning methodologies is to aid in the evolution of machine learning to the point where it is as efficient as human learning.Keywords: deep learning, transfer learning, TensorFlow, camouflage, object detection, architecture, accuracy, model, VGG16
Procedia PDF Downloads 14916835 Deleterious SNP’s Detection Using Machine Learning
Authors: Hamza Zidoum
Abstract:
This paper investigates the impact of human genetic variation on the function of human proteins using machine-learning algorithms. Single-Nucleotide Polymorphism represents the most common form of human genome variation. We focus on the single amino-acid polymorphism located in the coding region as they can affect the protein function leading to pathologic phenotypic change. We use several supervised Machine Learning methods to identify structural properties correlated with increased risk of the missense mutation being damaging. SVM associated with Principal Component Analysis give the best performance.Keywords: single-nucleotide polymorphism, machine learning, feature selection, SVM
Procedia PDF Downloads 37716834 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering
Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal
Abstract:
The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Myocardial Infarction Prediction related to heart disease is a challenging factor faced by doctors & hospitals. In this prediction, accuracy of the heart disease plays a vital role. From this concern, the authors have analyzed on a myocardial dataset to predict myocardial infarction using some popular Machine Learning algorithms K-Means and Hierarchical Clustering. This research includes a collection of data and the classification of data using Machine Learning Algorithms. The authors collected 345 instances along with 26 attributes from different hospitals in Bangladesh. This data have been collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the accuracy level to predict Myocardial Infarction by using Machine Learning techniques.Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease
Procedia PDF Downloads 20316833 Enhancing Code Security with AI-Powered Vulnerability Detection
Authors: Zzibu Mark Brian
Abstract:
As software systems become increasingly complex, ensuring code security is a growing concern. Traditional vulnerability detection methods often rely on manual code reviews or static analysis tools, which can be time-consuming and prone to errors. This paper presents a distinct approach to enhancing code security by leveraging artificial intelligence (AI) and machine learning (ML) techniques. Our proposed system utilizes a combination of natural language processing (NLP) and deep learning algorithms to identify and classify vulnerabilities in real-world codebases. By analyzing vast amounts of open-source code data, our AI-powered tool learns to recognize patterns and anomalies indicative of security weaknesses. We evaluated our system on a dataset of over 10,000 open-source projects, achieving an accuracy rate of 92% in detecting known vulnerabilities. Furthermore, our tool identified previously unknown vulnerabilities in popular libraries and frameworks, demonstrating its potential for improving software security.Keywords: AI, machine language, cord security, machine leaning
Procedia PDF Downloads 3616832 Comprehensive Machine Learning-Based Glucose Sensing from Near-Infrared Spectra
Authors: Bitewulign Mekonnen
Abstract:
Context: This scientific paper focuses on the use of near-infrared (NIR) spectroscopy to determine glucose concentration in aqueous solutions accurately and rapidly. The study compares six different machine learning methods for predicting glucose concentration and also explores the development of a deep learning model for classifying NIR spectra. The objective is to optimize the detection model and improve the accuracy of glucose prediction. This research is important because it provides a comprehensive analysis of various machine-learning techniques for estimating aqueous glucose concentrations. Research Aim: The aim of this study is to compare and evaluate different machine-learning methods for predicting glucose concentration from NIR spectra. Additionally, the study aims to develop and assess a deep-learning model for classifying NIR spectra. Methodology: The research methodology involves the use of machine learning and deep learning techniques. Six machine learning regression models, including support vector machine regression, partial least squares regression, extra tree regression, random forest regression, extreme gradient boosting, and principal component analysis-neural network, are employed to predict glucose concentration. The NIR spectra data is randomly divided into train and test sets, and the process is repeated ten times to increase generalization ability. In addition, a convolutional neural network is developed for classifying NIR spectra. Findings: The study reveals that the SVMR, ETR, and PCA-NN models exhibit excellent performance in predicting glucose concentration, with correlation coefficients (R) > 0.99 and determination coefficients (R²)> 0.985. The deep learning model achieves high macro-averaging scores for precision, recall, and F1-measure. These findings demonstrate the effectiveness of machine learning and deep learning methods in optimizing the detection model and improving glucose prediction accuracy. Theoretical Importance: This research contributes to the field by providing a comprehensive analysis of various machine-learning techniques for estimating glucose concentrations from NIR spectra. It also explores the use of deep learning for the classification of indistinguishable NIR spectra. The findings highlight the potential of machine learning and deep learning in enhancing the prediction accuracy of glucose-relevant features. Data Collection and Analysis Procedures: The NIR spectra and corresponding references for glucose concentration are measured in increments of 20 mg/dl. The data is randomly divided into train and test sets, and the models are evaluated using regression analysis and classification metrics. The performance of each model is assessed based on correlation coefficients, determination coefficients, precision, recall, and F1-measure. Question Addressed: The study addresses the question of whether machine learning and deep learning methods can optimize the detection model and improve the accuracy of glucose prediction from NIR spectra. Conclusion: The research demonstrates that machine learning and deep learning methods can effectively predict glucose concentration from NIR spectra. The SVMR, ETR, and PCA-NN models exhibit superior performance, while the deep learning model achieves high classification scores. These findings suggest that machine learning and deep learning techniques can be used to improve the prediction accuracy of glucose-relevant features. Further research is needed to explore their clinical utility in analyzing complex matrices, such as blood glucose levels.Keywords: machine learning, signal processing, near-infrared spectroscopy, support vector machine, neural network
Procedia PDF Downloads 9416831 Constructing a Physics Guided Machine Learning Neural Network to Predict Tonal Noise Emitted by a Propeller
Authors: Arthur D. Wiedemann, Christopher Fuller, Kyle A. Pascioni
Abstract:
With the introduction of electric motors, small unmanned aerial vehicle designers have to consider trade-offs between acoustic noise and thrust generated. Currently, there are few low-computational tools available for predicting acoustic noise emitted by a propeller into the far-field. Artificial neural networks offer a highly non-linear and adaptive model for predicting isolated and interactive tonal noise. But neural networks require large data sets, exceeding practical considerations in modeling experimental results. A methodology known as physics guided machine learning has been applied in this study to reduce the required data set to train the network. After building and evaluating several neural networks, the best model is investigated to determine how the network successfully predicts the acoustic waveform. Lastly, a post-network transfer function is developed to remove discontinuity from the predicted waveform. Overall, methodologies from physics guided machine learning show a notable improvement in prediction performance, but additional loss functions are necessary for constructing predictive networks on small datasets.Keywords: aeroacoustics, machine learning, propeller, rotor, neural network, physics guided machine learning
Procedia PDF Downloads 22816830 AutoML: Comprehensive Review and Application to Engineering Datasets
Authors: Parsa Mahdavi, M. Amin Hariri-Ardebili
Abstract:
The development of accurate machine learning and deep learning models traditionally demands hands-on expertise and a solid background to fine-tune hyperparameters. With the continuous expansion of datasets in various scientific and engineering domains, researchers increasingly turn to machine learning methods to unveil hidden insights that may elude classic regression techniques. This surge in adoption raises concerns about the adequacy of the resultant meta-models and, consequently, the interpretation of the findings. In response to these challenges, automated machine learning (AutoML) emerges as a promising solution, aiming to construct machine learning models with minimal intervention or guidance from human experts. AutoML encompasses crucial stages such as data preparation, feature engineering, hyperparameter optimization, and neural architecture search. This paper provides a comprehensive overview of the principles underpinning AutoML, surveying several widely-used AutoML platforms. Additionally, the paper offers a glimpse into the application of AutoML on various engineering datasets. By comparing these results with those obtained through classical machine learning methods, the paper quantifies the uncertainties inherent in the application of a single ML model versus the holistic approach provided by AutoML. These examples showcase the efficacy of AutoML in extracting meaningful patterns and insights, emphasizing its potential to revolutionize the way we approach and analyze complex datasets.Keywords: automated machine learning, uncertainty, engineering dataset, regression
Procedia PDF Downloads 6116829 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique
Authors: C. Manjula, Lilly Florence
Abstract:
Software technology is developing rapidly which leads to the growth of various industries. Now-a-days, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on the pattern learning from the previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Based on this, several researches have been carried out but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First of all, Genetic Algorithm (GA) is presented where an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through Decision Tree (DT) classification model. Finally, an experimental study is presented where results from the proposed GA-DT based hybrid approach is compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.Keywords: decision tree, genetic algorithm, machine learning, software defect prediction
Procedia PDF Downloads 32916828 Machine Learning Algorithms for Rocket Propulsion
Authors: Rômulo Eustáquio Martins de Souza, Paulo Alexandre Rodrigues de Vasconcelos Figueiredo
Abstract:
In recent years, there has been a surge in interest in applying artificial intelligence techniques, particularly machine learning algorithms. Machine learning is a data-analysis technique that automates the creation of analytical models, making it especially useful for designing complex situations. As a result, this technology aids in reducing human intervention while producing accurate results. This methodology is also extensively used in aerospace engineering since this is a field that encompasses several high-complexity operations, such as rocket propulsion. Rocket propulsion is a high-risk operation in which engine failure could result in the loss of life. As a result, it is critical to use computational methods capable of precisely representing the spacecraft's analytical model to guarantee its security and operation. Thus, this paper describes the use of machine learning algorithms for rocket propulsion to aid the realization that this technique is an efficient way to deal with challenging and restrictive aerospace engineering activities. The paper focuses on three machine-learning-aided rocket propulsion applications: set-point control of an expander-bleed rocket engine, supersonic retro-propulsion of a small-scale rocket, and leak detection and isolation on rocket engine data. This paper describes the data-driven methods used for each implementation in depth and presents the obtained results.Keywords: data analysis, modeling, machine learning, aerospace, rocket propulsion
Procedia PDF Downloads 11516827 A Generalized Framework for Adaptive Machine Learning Deployments in Algorithmic Trading
Authors: Robert Caulk
Abstract:
A generalized framework for adaptive machine learning deployments in algorithmic trading is introduced, tested, and released as open-source code. The presented software aims to test the hypothesis that recent data contains enough information to form a probabilistically favorable short-term price prediction. Further, the framework contains various adaptive machine learning techniques that are geared toward generating profit during strong trends and minimizing losses during trend changes. Results demonstrate that this adaptive machine learning approach is capable of capturing trends and generating profit. The presentation also discusses the importance of defining the parameter space associated with the dynamic training data-set and using the parameter space to identify and remove outliers from prediction data points. Meanwhile, the generalized architecture enables common users to exploit the powerful machinery while focusing on high-level feature engineering and model testing. The presentation also highlights common strengths and weaknesses associated with the presented technique and presents a broad range of well-tested starting points for feature set construction, target setting, and statistical methods for enforcing risk management and maintaining probabilistically favorable entry and exit points. The presentation also describes the end-to-end data processing tools associated with FreqAI, including automatic data fetching, data aggregation, feature engineering, safe and robust data pre-processing, outlier detection, custom machine learning and statistical tools, data post-processing, and adaptive training backtest emulation, and deployment of adaptive training in live environments. Finally, the generalized user interface is also discussed in the presentation. Feature engineering is simplified so that users can seed their feature sets with common indicator libraries (e.g. TA-lib, pandas-ta). The user also feeds data expansion parameters to fill out a large feature set for the model, which can contain as many as 10,000+ features. The presentation describes the various object-oriented programming techniques employed to make FreqAI agnostic to third-party libraries and external data sources. In other words, the back-end is constructed in such a way that users can leverage a broad range of common regression libraries (Catboost, LightGBM, Sklearn, etc) as well as common Neural Network libraries (TensorFlow, PyTorch) without worrying about the logistical complexities associated with data handling and API interactions. The presentation finishes by drawing conclusions about the most important parameters associated with a live deployment of the adaptive learning framework and provides the road map for future development in FreqAI.Keywords: machine learning, market trend detection, open-source, adaptive learning, parameter space exploration
Procedia PDF Downloads 8916826 Automatic Lead Qualification with Opinion Mining in Customer Relationship Management Projects
Authors: Victor Radich, Tania Basso, Regina Moraes
Abstract:
Lead qualification is one of the main procedures in Customer Relationship Management (CRM) projects. Its main goal is to identify potential consumers who have the ideal characteristics to establish a profitable and long-term relationship with a certain organization. Social networks can be an important source of data for identifying and qualifying leads since interest in specific products or services can be identified from the users’ expressed feelings of (dis)satisfaction. In this context, this work proposes the use of machine learning techniques and sentiment analysis as an extra step in the lead qualification process in order to improve it. In addition to machine learning models, sentiment analysis or opinion mining can be used to understand the evaluation that the user makes of a particular service, product, or brand. The results obtained so far have shown that it is possible to extract data from social networks and combine the techniques for a more complete classification.Keywords: lead qualification, sentiment analysis, opinion mining, machine learning, CRM, lead scoring
Procedia PDF Downloads 8516825 Comprehensive Review of Adversarial Machine Learning in PDF Malware
Authors: Preston Nabors, Nasseh Tabrizi
Abstract:
Portable Document Format (PDF) files have gained significant popularity for sharing and distributing documents due to their universal compatibility. However, the widespread use of PDF files has made them attractive targets for cybercriminals, who exploit vulnerabilities to deliver malware and compromise the security of end-user systems. This paper reviews notable contributions in PDF malware detection, including static, dynamic, signature-based, and hybrid analysis. It presents a comprehensive examination of PDF malware detection techniques, focusing on the emerging threat of adversarial sampling and the need for robust defense mechanisms. The paper highlights the vulnerability of machine learning classifiers to evasion attacks. It explores adversarial sampling techniques in PDF malware detection to produce mimicry and reverse mimicry evasion attacks, which aim to bypass detection systems. Improvements for future research are identified, including accessible methods, applying adversarial sampling techniques to malicious payloads, evaluating other models, evaluating the importance of features to malware, implementing adversarial defense techniques, and conducting comprehensive examination across various scenarios. By addressing these opportunities, researchers can enhance PDF malware detection and develop more resilient defense mechanisms against adversarial attacks.Keywords: adversarial attacks, adversarial defense, adversarial machine learning, intrusion detection, PDF malware, malware detection, malware detection evasion
Procedia PDF Downloads 3916824 A Machine Learning Approach for the Leakage Classification in the Hydraulic Final Test
Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter
Abstract:
The widespread use of machine learning applications in production is significantly accelerated by improved computing power and increasing data availability. Predictive quality enables the assurance of product quality by using machine learning models as a basis for decisions on test results. The use of real Bosch production data based on geometric gauge blocks from machining, mating data from assembly and hydraulic measurement data from final testing of directional valves is a promising approach to classifying the quality characteristics of workpieces.Keywords: machine learning, classification, predictive quality, hydraulics, supervised learning
Procedia PDF Downloads 21316823 Machine Learning Automatic Detection on Twitter Cyberbullying
Authors: Raghad A. Altowairgi
Abstract:
With the wide spread of social media platforms, young people tend to use them extensively as the first means of communication due to their ease and modernity. But these platforms often create a fertile ground for bullies to practice their aggressive behavior against their victims. Platform usage cannot be reduced, but intelligent mechanisms can be implemented to reduce the abuse. This is where machine learning comes in. Understanding and classifying text can be helpful in order to minimize the act of cyberbullying. Artificial intelligence techniques have expanded to formulate an applied tool to address the phenomenon of cyberbullying. In this research, machine learning models are built to classify text into two classes; cyberbullying and non-cyberbullying. After preprocessing the data in 4 stages; removing characters that do not provide meaningful information to the models, tokenization, removing stop words, and lowering text. BoW and TF-IDF are used as the main features for the five classifiers, which are; logistic regression, Naïve Bayes, Random Forest, XGboost, and Catboost classifiers. Each of them scores 92%, 90%, 92%, 91%, 86% respectively.Keywords: cyberbullying, machine learning, Bag-of-Words, term frequency-inverse document frequency, natural language processing, Catboost
Procedia PDF Downloads 13016822 Empowering a New Frontier in Heart Disease Detection: Unleashing Quantum Machine Learning
Authors: Sadia Nasrin Tisha, Mushfika Sharmin Rahman, Javier Orduz
Abstract:
Machine learning is applied in a variety of fields throughout the world. The healthcare sector has benefited enormously from it. One of the most effective approaches for predicting human heart diseases is to use machine learning applications to classify data and predict the outcome as a classification. However, with the rapid advancement of quantum technology, quantum computing has emerged as a potential game-changer for many applications. Quantum algorithms have the potential to execute substantially faster than their classical equivalents, which can lead to significant improvements in computational performance and efficiency. In this study, we applied quantum machine learning concepts to predict coronary heart diseases from text data. We experimented thrice with three different features; and three feature sets. The data set consisted of 100 data points. We pursue to do a comparative analysis of the two approaches, highlighting the potential benefits of quantum machine learning for predicting heart diseases.Keywords: quantum machine learning, SVM, QSVM, matrix product state
Procedia PDF Downloads 9416821 Optimization of Machine Learning Regression Results: An Application on Health Expenditures
Authors: Songul Cinaroglu
Abstract:
Machine learning regression methods are recommended as an alternative to classical regression methods in the existence of variables which are difficult to model. Data for health expenditure is typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods by hyperparameter tuning to predict health expenditure per capita. A multiple regression model was conducted and performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression recorded when different hyperparameters are assigned. Lambda (λ) value for Lasso Regression, number of trees for Random Forest Regression, epsilon (ε) value for Support Vector Regression was determined as hyperparameters. Study results performed by using 'k' fold cross validation changed from 5 to 50, indicate the difference between machine learning regression results in terms of R², RMSE and MAE values that are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² ˃ 0.7500, RMSE ≤ 0.6000 ve MAE ≤ 0.4000) outperforms other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure
Procedia PDF Downloads 22616820 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum
Authors: Abdulrahman Sumayli, Saad M. AlShahrani
Abstract:
For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improve the efficiency of the process. We investigated the H2 solubility computation in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were considered as the inputs to the models, while the hydrogen solubility was the sole response. Specifically, we employed three different models: Support Vector Regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyper-parameters of these models are optimized using the whale optimization algorithm (WOA). We evaluated the models using a dataset of solubility measurements in various feedstocks, and we compared their performance based on several metrics. Our results show that the WOA-SVR model tuned with WOA achieves the best performance overall, with an RMSE of 1.38 × 10− 2 and an R-squared of 0.991. These findings suggest that machine learning techniques can provide accurate predictions of hydrogen solubility in different feedstocks, which could be useful in the development of hydrogen-related technologies. Besides, the solubility of hydrogen in the four heavy oil fractions is estimated in different ranges of temperatures and pressures of 150 ◦C–350 ◦C and 1.2 MPa–10.8 MPa, respectivelyKeywords: temperature, pressure variations, machine learning, oil treatment
Procedia PDF Downloads 6916819 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation
Procedia PDF Downloads 23516818 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets
Authors: Najmeh Abedzadeh, Matthew Jacobs
Abstract:
An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.Keywords: IDS, imbalanced datasets, sampling algorithms, big data
Procedia PDF Downloads 32816817 Cardiokey: A Binary and Multi-Class Machine Learning Approach to Identify Individuals Using Electrocardiographic Signals on Wearable Devices
Authors: S. Chami, J. Chauvin, T. Demarest, Stan Ng, M. Straus, W. Jahner
Abstract:
Biometrics tools such as fingerprint and iris are widely used in industry to protect critical assets. However, their vulnerability and lack of robustness raise several worries about the protection of highly critical assets. Biometrics based on Electrocardiographic (ECG) signals is a robust identification tool. However, most of the state-of-the-art techniques have worked on clinical signals, which are of high quality and less noisy, extracted from wearable devices like a smartwatch. In this paper, we are presenting a complete machine learning pipeline that identifies people using ECG extracted from an off-person device. An off-person device is a wearable device that is not used in a medical context such as a smartwatch. In addition, one of the main challenges of ECG biometrics is the variability of the ECG of different persons and different situations. To solve this issue, we proposed two different approaches: per person classifier, and one-for-all classifier. The first approach suggests making binary classifier to distinguish one person from others. The second approach suggests a multi-classifier that distinguishes the selected set of individuals from non-selected individuals (others). The preliminary results, the binary classifier obtained a performance 90% in terms of accuracy within a balanced data. The second approach has reported a log loss of 0.05 as a multi-class score.Keywords: biometrics, electrocardiographic, machine learning, signals processing
Procedia PDF Downloads 14216816 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques
Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel
Abstract:
Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis
Procedia PDF Downloads 71316815 Heart Attack Prediction Using Several Machine Learning Methods
Authors: Suzan Anwar, Utkarsh Goyal
Abstract:
Heart rate (HR) is a predictor of cardiovascular, cerebrovascular, and all-cause mortality in the general population, as well as in patients with cardio and cerebrovascular diseases. Machine learning (ML) significantly improves the accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment while avoiding unnecessary treatment of others. This research examines relationship between the individual's various heart health inputs like age, sex, cp, trestbps, thalach, oldpeaketc, and the likelihood of developing heart disease. Machine learning techniques like logistic regression and decision tree, and Python are used. The results of testing and evaluating the model using the Heart Failure Prediction Dataset show the chance of a person having a heart disease with variable accuracy. Logistic regression has yielded an accuracy of 80.48% without data handling. With data handling (normalization, standardscaler), the logistic regression resulted in improved accuracy of 87.80%, decision tree 100%, random forest 100%, and SVM 100%.Keywords: heart rate, machine learning, SVM, decision tree, logistic regression, random forest
Procedia PDF Downloads 13816814 An Application for Risk of Crime Prediction Using Machine Learning
Authors: Luis Fonseca, Filipe Cabral Pinto, Susana Sargento
Abstract:
The increase of the world population, especially in large urban centers, has resulted in new challenges particularly with the control and optimization of public safety. Thus, in the present work, a solution is proposed for the prediction of criminal occurrences in a city based on historical data of incidents and demographic information. The entire research and implementation will be presented start with the data collection from its original source, the treatment and transformations applied to them, choice and the evaluation and implementation of the Machine Learning model up to the application layer. Classification models will be implemented to predict criminal risk for a given time interval and location. Machine Learning algorithms such as Random Forest, Neural Networks, K-Nearest Neighbors and Logistic Regression will be used to predict occurrences, and their performance will be compared according to the data processing and transformation used. The results show that the use of Machine Learning techniques helps to anticipate criminal occurrences, which contributed to the reinforcement of public security. Finally, the models were implemented on a platform that will provide an API to enable other entities to make requests for predictions in real-time. An application will also be presented where it is possible to show criminal predictions visually.Keywords: crime prediction, machine learning, public safety, smart city
Procedia PDF Downloads 11116813 Using Greywolf Optimized Machine Learning Algorithms to Improve Accuracy for Predicting Hospital Readmission for Diabetes
Authors: Vincent Liu
Abstract:
Machine learning algorithms (ML) can achieve high accuracy in predicting outcomes compared to classical models. Metaheuristic, nature-inspired algorithms can enhance traditional ML algorithms by optimizing them such as by performing feature selection. We compare ten ML algorithms to predict 30-day hospital readmission rates for diabetes patients in the US using a dataset from UCI Machine Learning Repository with feature selection performed by Greywolf nature-inspired algorithm. The baseline accuracy for the initial random forest model was 65%. After performing feature engineering, SMOTE for class balancing, and Greywolf optimization, the machine learning algorithms showed better metrics, including F1 scores, accuracy, and confusion matrix with improvements ranging in 10%-30%, and a best model of XGBoost with an accuracy of 95%. Applying machine learning this way can improve patient outcomes as unnecessary rehospitalizations can be prevented by focusing on patients that are at a higher risk of readmission.Keywords: diabetes, machine learning, 30-day readmission, metaheuristic
Procedia PDF Downloads 6116812 Using Technology to Enhance the Student Assessment Experience
Authors: Asim Qayyum, David Smith
Abstract:
The use of information tools is a common activity for students of any educational stage when they encounter online learning activities. Finding the relevant information for particular learning tasks is the topic of this paper as it investigates the use of information tools for a group of student participants. The paper describes and discusses the results with particular implications for use in higher education, and the findings suggest that improvement in assessment design and subsequent student learning may be achieved by structuring the purposefulness of information tools usage and online reading behaviors of university students.Keywords: information tools, assessment, online learning, student assessment experience
Procedia PDF Downloads 56016811 DeClEx-Processing Pipeline for Tumor Classification
Authors: Gaurav Shinde, Sai Charan Gongiguntla, Prajwal Shirur, Ahmed Hambaba
Abstract:
Health issues are significantly increasing, putting a substantial strain on healthcare services. This has accelerated the integration of machine learning in healthcare, particularly following the COVID-19 pandemic. The utilization of machine learning in healthcare has grown significantly. We introduce DeClEx, a pipeline that ensures that data mirrors real-world settings by incorporating Gaussian noise and blur and employing autoencoders to learn intermediate feature representations. Subsequently, our convolutional neural network, paired with spatial attention, provides comparable accuracy to state-of-the-art pre-trained models while achieving a threefold improvement in training speed. Furthermore, we provide interpretable results using explainable AI techniques. We integrate denoising and deblurring, classification, and explainability in a single pipeline called DeClEx.Keywords: machine learning, healthcare, classification, explainability
Procedia PDF Downloads 5516810 A Study on Performance Prediction in Early Design Stage of Apartment Housing Using Machine Learning
Authors: Seongjun Kim, Sanghoon Shim, Jinwooung Kim, Jaehwan Jung, Sung-Ah Kim
Abstract:
As the development of information and communication technology, the convergence of machine learning of the ICT area and design is attempted. In this way, it is possible to grasp the correlation between various design elements, which was difficult to grasp, and to reflect this in the design result. In architecture, there is an attempt to predict the performance, which is difficult to grasp in the past, by finding the correlation among multiple factors mainly through machine learning. In architectural design area, some attempts to predict the performance affected by various factors have been tried. With machine learning, it is possible to quickly predict performance. The aim of this study is to propose a model that predicts performance according to the block arrangement of apartment housing through machine learning and the design alternative which satisfies the performance such as the daylight hours in the most similar form to the alternative proposed by the designer. Through this study, a designer can proceed with the design considering various design alternatives and accurate performances quickly from the early design stage.Keywords: apartment housing, machine learning, multi-objective optimization, performance prediction
Procedia PDF Downloads 481