Search results for: machine learning models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13490

Search results for: machine learning models

13340 Detecting Music Enjoyment Level Using Electroencephalogram Signals and Machine Learning Techniques

Authors: Raymond Feng, Shadi Ghiasi

Abstract:

An electroencephalogram (EEG) is a non-invasive technique that records electrical activity in the brain using scalp electrodes. Researchers have studied the use of EEG to detect emotions and moods by collecting signals from participants and analyzing how those signals correlate with their activities. In this study, researchers investigated the relationship between EEG signals and music enjoyment. Participants listened to music while data was collected. During the signal-processing phase, power spectral densities (PSDs) were computed from the signals, and dominant brainwave frequencies were extracted from the PSDs to form a comprehensive feature matrix. A machine learning approach was then taken to find correlations between the processed data and the music enjoyment level indicated by the participants. To improve on previous research, multiple machine learning models were employed, including K-Nearest Neighbors Classifier, Support Vector Classifier, and Decision Tree Classifier. Hyperparameters were used to fine-tune each model to further increase its performance. The experiments showed that a strong correlation exists, with the Decision Tree Classifier with hyperparameters yielding 85% accuracy. This study proves that EEG is a reliable means to detect music enjoyment and has future applications, including personalized music recommendation, mood adjustment, and mental health therapy.

Keywords: EEG, electroencephalogram, machine learning, mood, music enjoyment, physiological signals

Procedia PDF Downloads 21
13339 FlexPoints: Efficient Algorithm for Detection of Electrocardiogram Characteristic Points

Authors: Daniel Bulanda, Janusz A. Starzyk, Adrian Horzyk

Abstract:

The electrocardiogram (ECG) is one of the most commonly used medical tests, essential for correct diagnosis and treatment of the patient. While ECG devices generate a huge amount of data, only a small part of them carries valuable medical information. To deal with this problem, many compression algorithms and filters have been developed over the past years. However, the rapid development of new machine learning techniques poses new challenges. To address this class of problems, we created the FlexPoints algorithm that searches for characteristic points on the ECG signal and ignores all other points that do not carry relevant medical information. The conducted experiments proved that the presented algorithm can significantly reduce the number of data points which represents ECG signal without losing valuable medical information. These sparse but essential characteristic points (flex points) can be a perfect input for some modern machine learning models, which works much better using flex points as an input instead of raw data or data compressed by many popular algorithms.

Keywords: characteristic points, electrocardiogram, ECG, machine learning, signal compression

Procedia PDF Downloads 133
13338 Using Machine Learning to Classify Human Fetal Health and Analyze Feature Importance

Authors: Yash Bingi, Yiqiao Yin

Abstract:

Reduction of child mortality is an ongoing struggle and a commonly used factor in determining progress in the medical field. The under-5 mortality number is around 5 million around the world, with many of the deaths being preventable. In light of this issue, Cardiotocograms (CTGs) have emerged as a leading tool to determine fetal health. By using ultrasound pulses and reading the responses, CTGs help healthcare professionals assess the overall health of the fetus to determine the risk of child mortality. However, interpreting the results of the CTGs is time-consuming and inefficient, especially in underdeveloped areas where an expert obstetrician is hard to come by. Using a support vector machine (SVM) and oversampling, this paper proposed a model that classifies fetal health with an accuracy of 99.59%. To further explain the CTG measurements, an algorithm based on Randomized Input Sampling for Explanation ((RISE) of Black-box Models was created, called Feature Alteration for explanation of Black Box Models (FAB), and compared the findings to Shapley Additive Explanations (SHAP) and Local Interpretable Model Agnostic Explanations (LIME). This allows doctors and medical professionals to classify fetal health with high accuracy and determine which features were most influential in the process.

Keywords: machine learning, fetal health, gradient boosting, support vector machine, Shapley values, local interpretable model agnostic explanations

Procedia PDF Downloads 114
13337 An Approximation Technique to Automate Tron

Authors: P. Jayashree, S. Rajkumar

Abstract:

With the trend of virtual and augmented reality environments booming to provide a life like experience, gaming is a major tool in supporting such learning environments. In this work, a variant of Voronoi heuristics, employing supervised learning for the TRON game is proposed. The paper discusses the features that would be really useful when a machine learning bot is to be used as an opponent against a human player. Various game scenarios, nature of the bot and the experimental results are provided for the proposed variant to prove that the approach is better than those that are currently followed.

Keywords: artificial Intelligence, automation, machine learning, TRON game, Voronoi heuristics

Procedia PDF Downloads 434
13336 General Architecture for Automation of Machine Learning Practices

Authors: U. Borasi, Amit Kr. Jain, Rakesh, Piyush Jain

Abstract:

Data collection, data preparation, model training, model evaluation, and deployment are all processes in a typical machine learning workflow. Training data needs to be gathered and organised. This often entails collecting a sizable dataset and cleaning it to remove or correct any inaccurate or missing information. Preparing the data for use in the machine learning model requires pre-processing it after it has been acquired. This often entails actions like scaling or normalising the data, handling outliers, selecting appropriate features, reducing dimensionality, etc. This pre-processed data is then used to train a model on some machine learning algorithm. After the model has been trained, it needs to be assessed by determining metrics like accuracy, precision, and recall, utilising a test dataset. Every time a new model is built, both data pre-processing and model training—two crucial processes in the Machine learning (ML) workflow—must be carried out. Thus, there are various Machine Learning algorithms that can be employed for every single approach to data pre-processing, generating a large set of combinations to choose from. Example: for every method to handle missing values (dropping records, replacing with mean, etc.), for every scaling technique, and for every combination of features selected, a different algorithm can be used. As a result, in order to get the optimum outcomes, these tasks are frequently repeated in different combinations. This paper suggests a simple architecture for organizing this largely produced “combination set of pre-processing steps and algorithms” into an automated workflow which simplifies the task of carrying out all possibilities.

Keywords: machine learning, automation, AUTOML, architecture, operator pool, configuration, scheduler

Procedia PDF Downloads 25
13335 Early Prediction of Disposable Addresses in Ethereum Blockchain

Authors: Ahmad Saleem

Abstract:

Ethereum is the second largest crypto currency in blockchain ecosystem. Along with standard transactions, it supports smart contracts and NFT’s. Current research trends are focused on analyzing the overall structure of the network its growth and behavior. Ethereum addresses are anonymous and can be created on fly. The nature of Ethereum network and addresses make it hard to predict their behavior. The activity period of an ethereum address is not much analyzed. Using machine learning we can make early prediction about the disposability of the address. In this paper we analyzed the lifetime of the addresses. We also identified and predicted the disposable addresses using machine learning models and compared the results.

Keywords: blockchain, Ethereum, cryptocurrency, prediction

Procedia PDF Downloads 67
13334 Machine Learning Driven Analysis of Kepler Objects of Interest to Identify Exoplanets

Authors: Akshat Kumar, Vidushi

Abstract:

This paper identifies 27 KOIs, 26 of which are currently classified as candidates and one as false positives that have a high probability of being confirmed. For this purpose, 11 machine learning algorithms were implemented on the cumulative kepler dataset sourced from the NASA exoplanet archive; it was observed that the best-performing model was HistGradientBoosting and XGBoost with a test accuracy of 93.5%, and the lowest-performing model was Gaussian NB with a test accuracy of 54%, to test model performance F1, cross-validation score and RUC curve was calculated. Based on the learned models, the significant characteristics for confirm exoplanets were identified, putting emphasis on the object’s transit and stellar properties; these characteristics were namely koi_count, koi_prad, koi_period, koi_dor, koi_ror, and koi_smass, which were later considered to filter out the potential KOIs. The paper also calculates the Earth similarity index based on the planetary radius and equilibrium temperature for each KOI identified to aid in their classification.

Keywords: Kepler objects of interest, exoplanets, space exploration, machine learning, earth similarity index, transit photometry

Procedia PDF Downloads 22
13333 The Condition Testing of Damaged Plates Using Acoustic Features and Machine Learning

Authors: Kyle Saltmarsh

Abstract:

Acoustic testing possesses many benefits due to its non-destructive nature and practicality. There hence exists many scenarios in which using acoustic testing for condition testing shows powerful feasibility. A wealth of information is contained within the acoustic and vibration characteristics of structures, allowing the development meaningful features for the classification of their respective condition. In this paper, methods, results, and discussions are presented on the use of non-destructive acoustic testing coupled with acoustic feature extraction and machine learning techniques for the condition testing of manufactured circular steel plates subjected to varied levels of damage.

Keywords: plates, deformation, acoustic features, machine learning

Procedia PDF Downloads 307
13332 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 106
13331 Development of an Automatic Computational Machine Learning Pipeline to Process Confocal Fluorescence Images for Virtual Cell Generation

Authors: Miguel Contreras, David Long, Will Bachman

Abstract:

Background: Microscopy plays a central role in cell and developmental biology. In particular, fluorescence microscopy can be used to visualize specific cellular components and subsequently quantify their morphology through development of virtual-cell models for study of effects of mechanical forces on cells. However, there are challenges with these imaging experiments, which can make it difficult to quantify cell morphology: inconsistent results, time-consuming and potentially costly protocols, and limitation on number of labels due to spectral overlap. To address these challenges, the objective of this project is to develop an automatic computational machine learning pipeline to predict cellular components morphology for virtual-cell generation based on fluorescence cell membrane confocal z-stacks. Methods: Registered confocal z-stacks of nuclei and cell membrane of endothelial cells, consisting of 20 images each, were obtained from fluorescence confocal microscopy and normalized through software pipeline for each image to have a mean pixel intensity value of 0.5. An open source machine learning algorithm, originally developed to predict fluorescence labels on unlabeled transmitted light microscopy cell images, was trained using this set of normalized z-stacks on a single CPU machine. Through transfer learning, the algorithm used knowledge acquired from its previous training sessions to learn the new task. Once trained, the algorithm was used to predict morphology of nuclei using normalized cell membrane fluorescence images as input. Predictions were compared to the ground truth fluorescence nuclei images. Results: After one week of training, using one cell membrane z-stack (20 images) and corresponding nuclei label, results showed qualitatively good predictions on training set. The algorithm was able to accurately predict nuclei locations as well as shape when fed only fluorescence membrane images. Similar training sessions with improved membrane image quality, including clear lining and shape of the membrane, clearly showing the boundaries of each cell, proportionally improved nuclei predictions, reducing errors relative to ground truth. Discussion: These results show the potential of pre-trained machine learning algorithms to predict cell morphology using relatively small amounts of data and training time, eliminating the need of using multiple labels in immunofluorescence experiments. With further training, the algorithm is expected to predict different labels (e.g., focal-adhesion sites, cytoskeleton), which can be added to the automatic machine learning pipeline for direct input into Principal Component Analysis (PCA) for generation of virtual-cell mechanical models.

Keywords: cell morphology prediction, computational machine learning, fluorescence microscopy, virtual-cell models

Procedia PDF Downloads 175
13330 Convolutional Neural Networks versus Radiomic Analysis for Classification of Breast Mammogram

Authors: Mehwish Asghar

Abstract:

Breast Cancer (BC) is a common type of cancer among women. Its screening is usually performed using different imaging modalities such as magnetic resonance imaging, mammogram, X-ray, CT, etc. Among these modalities’ mammogram is considered a powerful tool for diagnosis and screening of breast cancer. Sophisticated machine learning approaches have shown promising results in complementing human diagnosis. Generally, machine learning methods can be divided into two major classes: one is Radiomics analysis (RA), where image features are extracted manually; and the other one is the concept of convolutional neural networks (CNN), in which the computer learns to recognize image features on its own. This research aims to improve the incidence of early detection, thus reducing the mortality rate caused by breast cancer through the latest advancements in computer science, in general, and machine learning, in particular. It has also been aimed to ease the burden of doctors by improving and automating the process of breast cancer detection. This research is related to a relative analysis of different techniques for the implementation of different models for detecting and classifying breast cancer. The main goal of this research is to provide a detailed view of results and performances between different techniques. The purpose of this paper is to explore the potential of a convolutional neural network (CNN) w.r.t feature extractor and as a classifier. Also, in this research, it has been aimed to add the module of Radiomics for comparison of its results with deep learning techniques.

Keywords: breast cancer (BC), machine learning (ML), convolutional neural network (CNN), radionics, magnetic resonance imaging, artificial intelligence

Procedia PDF Downloads 190
13329 A Comparative Study of Optimization Techniques and Models to Forecasting Dengue Fever

Authors: Sudha T., Naveen C.

Abstract:

Dengue is a serious public health issue that causes significant annual economic and welfare burdens on nations. However, enhanced optimization techniques and quantitative modeling approaches can predict the incidence of dengue. By advocating for a data-driven approach, public health officials can make informed decisions, thereby improving the overall effectiveness of sudden disease outbreak control efforts. The National Oceanic and Atmospheric Administration and the Centers for Disease Control and Prevention are two of the U.S. Federal Government agencies from which this study uses environmental data. Based on environmental data that describe changes in temperature, precipitation, vegetation, and other factors known to affect dengue incidence, many predictive models are constructed that use different machine learning methods to estimate weekly dengue cases. The first step involves preparing the data, which includes handling outliers and missing values to make sure the data is prepared for subsequent processing and the creation of an accurate forecasting model. In the second phase, multiple feature selection procedures are applied using various machine learning models and optimization techniques. During the third phase of the research, machine learning models like the Huber Regressor, Support Vector Machine, Gradient Boosting Regressor (GBR), and Support Vector Regressor (SVR) are compared with several optimization techniques for feature selection, such as Harmony Search and Genetic Algorithm. In the fourth stage, the model's performance is evaluated using Mean Square Error (MSE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) as assistance. Selecting an optimization strategy with the least number of errors, lowest price, biggest productivity, or maximum potential results is the goal. In a variety of industries, including engineering, science, management, mathematics, finance, and medicine, optimization is widely employed. An effective optimization method based on harmony search and an integrated genetic algorithm is introduced for input feature selection, and it shows an important improvement in the model's predictive accuracy. The predictive models with Huber Regressor as the foundation perform the best for optimization and also prediction.

Keywords: deep learning model, dengue fever, prediction, optimization

Procedia PDF Downloads 20
13328 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. Translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, there is a number of improvements we introduce. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained recognizing textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use stanford natural language inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning based zero-shot machine translation.

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

Procedia PDF Downloads 96
13327 Using Machine Learning Techniques for Autism Spectrum Disorder Analysis and Detection in Children

Authors: Norah Mohammed Alshahrani, Abdulaziz Almaleh

Abstract:

Autism Spectrum Disorder (ASD) is a condition related to issues with brain development that affects how a person recognises and communicates with others which results in difficulties with interaction and communication socially and it is constantly growing. Early recognition of ASD allows children to lead safe and healthy lives and helps doctors with accurate diagnoses and management of conditions. Therefore, it is crucial to develop a method that will achieve good results and with high accuracy for the measurement of ASD in children. In this paper, ASD datasets of toddlers and children have been analyzed. We employed the following machine learning techniques to attempt to explore ASD and they are Random Forest (RF), Decision Tree (DT), Na¨ıve Bayes (NB) and Support Vector Machine (SVM). Then Feature selection was used to provide fewer attributes from ASD datasets while preserving model performance. As a result, we found that the best result has been provided by the Support Vector Machine (SVM), achieving 0.98% in the toddler dataset and 0.99% in the children dataset.

Keywords: autism spectrum disorder, machine learning, feature selection, support vector machine

Procedia PDF Downloads 111
13326 A Deep Learning Approach for the Predictive Quality of Directional Valves in the Hydraulic Final Test

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

The increasing use of deep learning applications in production is becoming a competitive advantage. Predictive quality enables the assurance of product quality by using data-driven forecasts via machine learning models as a basis for decisions on test results. The use of real Bosch production data along the value chain of hydraulic valves is a promising approach to classifying the leakage of directional valves.

Keywords: artificial neural networks, classification, hydraulics, predictive quality, deep learning

Procedia PDF Downloads 200
13325 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates

Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe

Abstract:

Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with the proper treatment. There is a time when drug-resistant TB occurs when bacteria become resistant to the drugs that are used to treat TB. Current strategies to identify drug-resistant TB bacteria are laboratory-based, and it takes a longer time to identify the drug-resistant bacteria and treat the patient accordingly. But machine learning (ML) and data science approaches can offer new approaches to the problem. In this study, we propose to develop an ML-based model to predict the antibiotic resistance phenotypes of TB isolates in minutes and give the right treatment to the patient immediately. The study has been using the whole genome sequence (WGS) of TB isolates as training data that have been extracted from the NCBI repository and contain different countries’ samples to build the ML models. The reason that different countries’ samples have been included is to generalize the large group of TB isolates from different regions in the world. This supports the model to train different behaviors of the TB bacteria and makes the model robust. The model training has been considering three pieces of information that have been extracted from the WGS data to train the model. These are all variants that have been found within the candidate genes (F1), predetermined resistance-associated variants (F2), and only resistance-associated gene information for the particular drug. Two major datasets have been constructed using these three information. F1 and F2 information have been considered as two independent datasets, and the third information is used as a class to label the two datasets. Five machine learning algorithms have been considered to train the model. These are Support Vector Machine (SVM), Random forest (RF), Logistic regression (LR), Gradient Boosting, and Ada boost algorithms. The models have been trained on the datasets F1, F2, and F1F2 that is the F1 and the F2 dataset merged. Additionally, an ensemble approach has been used to train the model. The ensemble approach has been considered to run F1 and F2 datasets on gradient boosting algorithm and use the output as one dataset that is called F1F2 ensemble dataset and train a model using this dataset on the five algorithms. As the experiment shows, the ensemble approach model that has been trained on the Gradient Boosting algorithm outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + Gradient boosting model, to predict the antibiotic resistance phenotypes of TB isolates by outperforming the rest of the models.

Keywords: machine learning, MTB, WGS, drug resistant TB

Procedia PDF Downloads 21
13324 Risk Assessment and Management Using Machine Learning Models

Authors: Lagnajeet Mohanty, Mohnish Mishra, Pratham Tapdiya, Himanshu Sekhar Nayak, Swetapadma Singh

Abstract:

In the era of global interconnectedness, effective risk assessment and management are critical for organizational resilience. This review explores the integration of machine learning (ML) into risk processes, examining its transformative potential and the challenges it presents. The literature reveals ML's success in sectors like consumer credit, demonstrating enhanced predictive accuracy, adaptability, and potential cost savings. However, ethical considerations, interpretability issues, and the demand for skilled practitioners pose limitations. Looking forward, the study identifies future research scopes, including refining ethical frameworks, advancing interpretability techniques, and fostering interdisciplinary collaborations. The synthesis of limitations and future directions highlights the dynamic landscape of ML in risk management, urging stakeholders to navigate challenges innovatively. This abstract encapsulates the evolving discourse on ML's role in shaping proactive and effective risk management strategies in our interconnected and unpredictable global landscape.

Keywords: machine learning, risk assessment, ethical considerations, financial inclusion

Procedia PDF Downloads 25
13323 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering

Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal

Abstract:

The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Myocardial Infarction Prediction related to heart disease is a challenging factor faced by doctors & hospitals. In this prediction, accuracy of the heart disease plays a vital role. From this concern, the authors have analyzed on a myocardial dataset to predict myocardial infarction using some popular Machine Learning algorithms K-Means and Hierarchical Clustering. This research includes a collection of data and the classification of data using Machine Learning Algorithms. The authors collected 345 instances along with 26 attributes from different hospitals in Bangladesh. This data have been collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the accuracy level to predict Myocardial Infarction by using Machine Learning techniques.

Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease

Procedia PDF Downloads 174
13322 Machine Learning Application in Shovel Maintenance

Authors: Amir Taghizadeh Vahed, Adithya Thaduri

Abstract:

Shovels are the main components in the mining transportation system. The productivity of the mines depends on the availability of shovels due to its high capital and operating costs. The unplanned failure/shutdowns of a shovel results in higher repair costs, increase in downtime, as well as increasing indirect cost (i.e. loss of production and company’s reputation). In order to mitigate these failures, predictive maintenance can be useful approach using failure prediction. The modern mining machinery or shovels collect huge datasets automatically; it consists of reliability and maintenance data. However, the gathered datasets are useless until the information and knowledge of data are extracted. Machine learning as well as data mining, which has a major role in recent studies, has been used for the knowledge discovery process. In this study, data mining and machine learning approaches are implemented to detect not only anomalies but also patterns from a dataset and further detection of failures.

Keywords: maintenance, machine learning, shovel, conditional based monitoring

Procedia PDF Downloads 180
13321 Neural Network Based Decision Trees Using Machine Learning for Alzheimer's Diagnosis

Authors: P. S. Jagadeesh Kumar, Tracy Lin Huan, S. Meenakshi Sundaram

Abstract:

Alzheimer’s disease is one of the prevalent kind of ailment, expected for impudent reconciliation or an effectual therapy is to be accredited hitherto. Probable detonation of patients in the upcoming years, and consequently an enormous deal of apprehension in early discovery of the disorder, this will conceivably chaperon to enhanced healing outcomes. Complex impetuosity of the brain is an observant symbolic of the disease and a unique recognition of genetic sign of the disease. Machine learning alongside deep learning and decision tree reinforces the aptitude to absorb characteristics from multi-dimensional data’s and thus simplifies automatic classification of Alzheimer’s disease. Susceptible testing was prophesied and realized in training the prospect of Alzheimer’s disease classification built on machine learning advances. It was shrewd that the decision trees trained with deep neural network fashioned the excellent results parallel to related pattern classification.

Keywords: Alzheimer's diagnosis, decision trees, deep neural network, machine learning, pattern classification

Procedia PDF Downloads 268
13320 The Accuracy of Parkinson's Disease Diagnosis Using [123I]-FP-CIT Brain SPECT Data with Machine Learning Techniques: A Survey

Authors: Lavanya Madhuri Bollipo, K. V. Kadambari

Abstract:

Objective: To discuss key issues in the diagnosis of Parkinson disease (PD), To discuss features influencing PD progression, To discuss importance of brain SPECT data in PD diagnosis, and To discuss the essentiality of machine learning techniques in early diagnosis of PD. An accurate and early diagnosis of PD is nowadays a challenge as clinical symptoms in PD arise only when there is more than 60% loss of dopaminergic neurons. So far there are no laboratory tests for the diagnosis of PD, causing a high rate of misdiagnosis especially when the disease is in the early stages. Recent neuroimaging studies with brain SPECT using 123I-Ioflupane (DaTSCAN) as radiotracer shown to be widely used to assist the diagnosis of PD even in its early stages. Machine learning techniques can be used in combination with image analysis procedures to develop computer-aided diagnosis (CAD) systems for PD. This paper addressed recent studies involving diagnosis of PD in its early stages using brain SPECT data with Machine Learning Techniques.

Keywords: Parkinson disease (PD), dopamine transporter, single-photon emission computed tomography (SPECT), support vector machine (SVM)

Procedia PDF Downloads 359
13319 A Comprehensive Study of Camouflaged Object Detection Using Deep Learning

Authors: Khalak Bin Khair, Saqib Jahir, Mohammed Ibrahim, Fahad Bin, Debajyoti Karmaker

Abstract:

Object detection is a computer technology that deals with searching through digital images and videos for occurrences of semantic elements of a particular class. It is associated with image processing and computer vision. On top of object detection, we detect camouflage objects within an image using Deep Learning techniques. Deep learning may be a subset of machine learning that's essentially a three-layer neural network Over 6500 images that possess camouflage properties are gathered from various internet sources and divided into 4 categories to compare the result. Those images are labeled and then trained and tested using vgg16 architecture on the jupyter notebook using the TensorFlow platform. The architecture is further customized using Transfer Learning. Methods for transferring information from one or more of these source tasks to increase learning in a related target task are created through transfer learning. The purpose of this transfer of learning methodologies is to aid in the evolution of machine learning to the point where it is as efficient as human learning.

Keywords: deep learning, transfer learning, TensorFlow, camouflage, object detection, architecture, accuracy, model, VGG16

Procedia PDF Downloads 103
13318 Machine Learning-Based Workflow for the Analysis of Project Portfolio

Authors: Jean Marie Tshimula, Atsushi Togashi

Abstract:

We develop a data-science approach for providing an interactive visualization and predictive models to find insights into the projects' historical data in order for stakeholders understand some unseen opportunities in the African market that might escape them behind the online project portfolio of the African Development Bank. This machine learning-based web application identifies the market trend of the fastest growing economies across the continent as well skyrocketing sectors which have a significant impact on the future of business in Africa. Owing to this, the approach is tailored to predict where the investment needs are the most required. Moreover, we create a corpus that includes the descriptions of over more than 1,200 projects that approximately cover 14 sectors designed for some of 53 African countries. Then, we sift out this large amount of semi-structured data for extracting tiny details susceptible to contain some directions to follow. In the light of the foregoing, we have applied the combination of Latent Dirichlet Allocation and Random Forests at the level of the analysis module of our methodology to highlight the most relevant topics that investors may focus on for investing in Africa.

Keywords: machine learning, topic modeling, natural language processing, big data

Procedia PDF Downloads 151
13317 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling

Authors: Florin Leon, Silvia Curteanu

Abstract:

Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.

Keywords: batch bulk methyl methacrylate polymerization, adaptive sampling, machine learning, large margin nearest neighbor regression

Procedia PDF Downloads 270
13316 Unseen Classes: The Paradigm Shift in Machine Learning

Authors: Vani Singhal, Jitendra Parmar, Satyendra Singh Chouhan

Abstract:

Unseen class discovery has now become an important part of a machine-learning algorithm to judge new classes. Unseen classes are the classes on which the machine learning model is not trained on. With the advancement in technology and AI replacing humans, the amount of data has increased to the next level. So while implementing a model on real-world examples, we come across unseen new classes. Our aim is to find the number of unseen classes by using a hierarchical-based active learning algorithm. The algorithm is based on hierarchical clustering as well as active sampling. The number of clusters that we will get in the end will give the number of unseen classes. The total clusters will also contain some clusters that have unseen classes. Instead of first discovering unseen classes and then finding their number, we directly calculated the number by applying the algorithm. The dataset used is for intent classification. The target data is the intent of the corresponding query. We conclude that when the machine learning model will encounter real-world data, it will automatically find the number of unseen classes. In the future, our next work would be to label these unseen classes correctly.

Keywords: active sampling, hierarchical clustering, open world learning, unseen class discovery

Procedia PDF Downloads 136
13315 How Unicode Glyphs Revolutionized the Way We Communicate

Authors: Levi Corallo

Abstract:

Typed language made by humans on computers and cell phones has made a significant distinction from previous modes of written language exchanges. While acronyms remain one of the most predominant markings of typed language, another and perhaps more recent revolution in the way humans communicate has been with the use of symbols or glyphs, primarily Emojis—globally introduced on the iPhone keyboard by Apple in 2008. This paper seeks to analyze the use of symbols in typed communication from both a linguistic and machine learning perspective. The Unicode system will be explored and methods of encoding will be juxtaposed with the current machine and human perception. Topics in how typed symbol usage exists in conversation will be explored as well as topics across current research methods dealing with Emojis like sentiment analysis, predictive text models, and so on. This study proposes that sequential analysis is a significant feature for analyzing unicode characters in a corpus with machine learning. Current models that are trying to learn or translate the meaning of Emojis should be starting to learn using bi- and tri-grams of Emoji, as well as observing the relationship between combinations of different Emoji in tandem. The sociolinguistics of an entire new vernacular of language referred to here as ‘typed language’ will also be delineated across my analysis with unicode glyphs from both a semantic and technical perspective.

Keywords: unicode, text symbols, emojis, glyphs, communication

Procedia PDF Downloads 169
13314 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 172
13313 Developing an Out-of-Distribution Generalization Model Selection Framework through Impurity and Randomness Measurements and a Bias Index

Authors: Todd Zhou, Mikhail Yurochkin

Abstract:

Out-of-distribution (OOD) detection is receiving increasing amounts of attention in the machine learning research community, boosted by recent technologies, such as autonomous driving and image processing. This newly-burgeoning field has called for the need for more effective and efficient methods for out-of-distribution generalization methods. Without accessing the label information, deploying machine learning models to out-of-distribution domains becomes extremely challenging since it is impossible to evaluate model performance on unseen domains. To tackle this out-of-distribution detection difficulty, we designed a model selection pipeline algorithm and developed a model selection framework with different impurity and randomness measurements to evaluate and choose the best-performing models for out-of-distribution data. By exploring different randomness scores based on predicted probabilities, we adopted the out-of-distribution entropy and developed a custom-designed score, ”CombinedScore,” as the evaluation criterion. This proposed score was created by adding labeled source information into the judging space of the uncertainty entropy score using harmonic mean. Furthermore, the prediction bias was explored through the equality of opportunity violation measurement. We also improved machine learning model performance through model calibration. The effectiveness of the framework with the proposed evaluation criteria was validated on the Folktables American Community Survey (ACS) datasets.

Keywords: model selection, domain generalization, model fairness, randomness measurements, bias index

Procedia PDF Downloads 101
13312 Modelling Conceptual Quantities Using Support Vector Machines

Authors: Ka C. Lam, Oluwafunmibi S. Idowu

Abstract:

Uncertainty in cost is a major factor affecting performance of construction projects. To our knowledge, several conceptual cost models have been developed with varying degrees of accuracy. Incorporating conceptual quantities into conceptual cost models could improve the accuracy of early predesign cost estimates. Hence, the development of quantity models for estimating conceptual quantities of framed reinforced concrete structures using supervised machine learning is the aim of the current research. Using measured quantities of structural elements and design variables such as live loads and soil bearing pressures, response and predictor variables were defined and used for constructing conceptual quantities models. Twenty-four models were developed for comparison using a combination of non-parametric support vector regression, linear regression, and bootstrap resampling techniques. R programming language was used for data analysis and model implementation. Gross soil bearing pressure and gross floor loading were discovered to have a major influence on the quantities of concrete and reinforcement used for foundations. Building footprint and gross floor loading had a similar influence on beams and slabs. Future research could explore the modelling of other conceptual quantities for walls, finishes, and services using machine learning techniques. Estimation of conceptual quantities would assist construction planners in early resource planning and enable detailed performance evaluation of early cost predictions.

Keywords: bootstrapping, conceptual quantities, modelling, reinforced concrete, support vector regression

Procedia PDF Downloads 187
13311 mKDNAD: A Network Flow Anomaly Detection Method Based On Multi-teacher Knowledge Distillation

Authors: Yang Yang, Dan Liu

Abstract:

Anomaly detection models for network flow based on machine learning have poor detection performance under extremely unbalanced training data conditions and also have slow detection speed and large resource consumption when deploying on network edge devices. Embedding multi-teacher knowledge distillation (mKD) in anomaly detection can transfer knowledge from multiple teacher models to a single model. Inspired by this, we proposed a state-of-the-art model, mKDNAD, to improve detection performance. mKDNAD mine and integrate the knowledge of one-dimensional sequence and two-dimensional image implicit in network flow to improve the detection accuracy of small sample classes. The multi-teacher knowledge distillation method guides the train of the student model, thus speeding up the model's detection speed and reducing the number of model parameters. Experiments in the CICIDS2017 dataset verify the improvements of our method in the detection speed and the detection accuracy in dealing with the small sample classes.

Keywords: network flow anomaly detection (NAD), multi-teacher knowledge distillation, machine learning, deep learning

Procedia PDF Downloads 88