Search results for: supervised machine learning algorithm
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 11104

Search results for: supervised machine learning algorithm

10984 A Supervised Goal Directed Algorithm in Economical Choice Behaviour: An Actor-Critic Approach

Authors: Keyvanl Yahya

Abstract:

This paper aims to find a algorithmic structure that affords to predict and explain economic choice behaviour particularly under uncertainty (random policies) by manipulating the prevalent Actor-Critic learning method that complies with the requirements we have been entrusted ever since the field of neuroeconomics dawned on us. Whilst skimming some basics of neuroeconomics that might be relevant to our discussion, we will try to outline some of the important works which have so far been done to simulate choice making processes. Concerning neurological findings that suggest the existence of two specific functions that are executed through Basal Ganglia all the way down to sub-cortical areas, namely 'rewards' and 'beliefs', we will offer a modified version of actor/critic algorithm to shed a light on the relation between these functions and most importantly resolve what is referred to as a challenge for actor-critic algorithms, that is lack of inheritance or hierarchy which avoids the system being evolved in continuous time tasks whence the convergence might not emerge.

Keywords: neuroeconomics, choice behaviour, decision making, reinforcement learning, actor-critic algorithm

Procedia PDF Downloads 375
10983 Hybrid Model: An Integration of Machine Learning with Traditional Scorecards

Authors: Golnush Masghati-Amoli, Paul Chin

Abstract:

Over the past recent years, with the rapid increases in data availability and computing power, Machine Learning (ML) techniques have been called on in a range of different industries for their strong predictive capability. However, the use of Machine Learning in commercial banking has been limited due to a special challenge imposed by numerous regulations that require lenders to be able to explain their analytic models, not only to regulators but often to consumers. In other words, although Machine Leaning techniques enable better prediction with a higher level of accuracy, in comparison with other industries, they are adopted less frequently in commercial banking especially for scoring purposes. This is due to the fact that Machine Learning techniques are often considered as a black box and fail to provide information on why a certain risk score is given to a customer. In order to bridge this gap between the explain-ability and performance of Machine Learning techniques, a Hybrid Model is developed at Dun and Bradstreet that is focused on blending Machine Learning algorithms with traditional approaches such as scorecards. The Hybrid Model maximizes efficiency of traditional scorecards by merging its practical benefits, such as explain-ability and the ability to input domain knowledge, with the deep insights of Machine Learning techniques which can uncover patterns scorecard approaches cannot. First, through development of Machine Learning models, engineered features and latent variables and feature interactions that demonstrate high information value in the prediction of customer risk are identified. Then, these features are employed to introduce observed non-linear relationships between the explanatory and dependent variables into traditional scorecards. Moreover, instead of directly computing the Weight of Evidence (WoE) from good and bad data points, the Hybrid Model tries to match the score distribution generated by a Machine Learning algorithm, which ends up providing an estimate of the WoE for each bin. This capability helps to build powerful scorecards with sparse cases that cannot be achieved with traditional approaches. The proposed Hybrid Model is tested on different portfolios where a significant gap is observed between the performance of traditional scorecards and Machine Learning models. The result of analysis shows that Hybrid Model can improve the performance of traditional scorecards by introducing non-linear relationships between explanatory and target variables from Machine Learning models into traditional scorecards. Also, it is observed that in some scenarios the Hybrid Model can be almost as predictive as the Machine Learning techniques while being as transparent as traditional scorecards. Therefore, it is concluded that, with the use of Hybrid Model, Machine Learning algorithms can be used in the commercial banking industry without being concerned with difficulties in explaining the models for regulatory purposes.

Keywords: machine learning algorithms, scorecard, commercial banking, consumer risk, feature engineering

Procedia PDF Downloads 104
10982 Artificial Intelligence in Bioscience: The Next Frontier

Authors: Parthiban Srinivasan

Abstract:

With recent advances in computational power and access to enough data in biosciences, artificial intelligence methods are increasingly being used in drug discovery research. These methods are essentially a series of advanced statistics based exercises that review the past to indicate the likely future. Our goal is to develop a model that accurately predicts biological activity and toxicity parameters for novel compounds. We have compiled a robust library of over 150,000 chemical compounds with different pharmacological properties from literature and public domain databases. The compounds are stored in simplified molecular-input line-entry system (SMILES), a commonly used text encoding for organic molecules. We utilize an automated process to generate an array of numerical descriptors (features) for each molecule. Redundant and irrelevant descriptors are eliminated iteratively. Our prediction engine is based on a portfolio of machine learning algorithms. We found Random Forest algorithm to be a better choice for this analysis. We captured non-linear relationship in the data and formed a prediction model with reasonable accuracy by averaging across a large number of randomized decision trees. Our next step is to apply deep neural network (DNN) algorithm to predict the biological activity and toxicity properties. We expect the DNN algorithm to give better results and improve the accuracy of the prediction. This presentation will review all these prominent machine learning and deep learning methods, our implementation protocols and discuss these techniques for their usefulness in biomedical and health informatics.

Keywords: deep learning, drug discovery, health informatics, machine learning, toxicity prediction

Procedia PDF Downloads 333
10981 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation

Procedia PDF Downloads 203
10980 Machine Learning Techniques to Develop Traffic Accident Frequency Prediction Models

Authors: Rodrigo Aguiar, Adelino Ferreira

Abstract:

Road traffic accidents are the leading cause of unnatural death and injuries worldwide, representing a significant problem of road safety. In this context, the use of artificial intelligence with advanced machine learning techniques has gained prominence as a promising approach to predict traffic accidents. This article investigates the application of machine learning algorithms to develop traffic accident frequency prediction models. Models are evaluated based on performance metrics, making it possible to do a comparative analysis with traditional prediction approaches. The results suggest that machine learning can provide a powerful tool for accident prediction, which will contribute to making more informed decisions regarding road safety.

Keywords: machine learning, artificial intelligence, frequency of accidents, road safety

Procedia PDF Downloads 56
10979 Internet of Things Networks: Denial of Service Detection in Constrained Application Protocol Using Machine Learning Algorithm

Authors: Adamu Abdullahi, On Francisca, Saidu Isah Rambo, G. N. Obunadike, D. T. Chinyio

Abstract:

The paper discusses the potential threat of Denial of Service (DoS) attacks in the Internet of Things (IoT) networks on constrained application protocols (CoAP). As billions of IoT devices are expected to be connected to the internet in the coming years, the security of these devices is vulnerable to attacks, disrupting their functioning. This research aims to tackle this issue by applying mixed methods of qualitative and quantitative for feature selection, extraction, and cluster algorithms to detect DoS attacks in the Constrained Application Protocol (CoAP) using the Machine Learning Algorithm (MLA). The main objective of the research is to enhance the security scheme for CoAP in the IoT environment by analyzing the nature of DoS attacks and identifying a new set of features for detecting them in the IoT network environment. The aim is to demonstrate the effectiveness of the MLA in detecting DoS attacks and compare it with conventional intrusion detection systems for securing the CoAP in the IoT environment. Findings: The research identifies the appropriate node to detect DoS attacks in the IoT network environment and demonstrates how to detect the attacks through the MLA. The accuracy detection in both classification and network simulation environments shows that the k-means algorithm scored the highest percentage in the training and testing of the evaluation. The network simulation platform also achieved the highest percentage of 99.93% in overall accuracy. This work reviews conventional intrusion detection systems for securing the CoAP in the IoT environment. The DoS security issues associated with the CoAP are discussed.

Keywords: algorithm, CoAP, DoS, IoT, machine learning

Procedia PDF Downloads 45
10978 A Study on Performance Prediction in Early Design Stage of Apartment Housing Using Machine Learning

Authors: Seongjun Kim, Sanghoon Shim, Jinwooung Kim, Jaehwan Jung, Sung-Ah Kim

Abstract:

As the development of information and communication technology, the convergence of machine learning of the ICT area and design is attempted. In this way, it is possible to grasp the correlation between various design elements, which was difficult to grasp, and to reflect this in the design result. In architecture, there is an attempt to predict the performance, which is difficult to grasp in the past, by finding the correlation among multiple factors mainly through machine learning. In architectural design area, some attempts to predict the performance affected by various factors have been tried. With machine learning, it is possible to quickly predict performance. The aim of this study is to propose a model that predicts performance according to the block arrangement of apartment housing through machine learning and the design alternative which satisfies the performance such as the daylight hours in the most similar form to the alternative proposed by the designer. Through this study, a designer can proceed with the design considering various design alternatives and accurate performances quickly from the early design stage.

Keywords: apartment housing, machine learning, multi-objective optimization, performance prediction

Procedia PDF Downloads 449
10977 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 133
10976 Predictive Machine Learning Model for Assessing the Impact of Untreated Teeth Grinding on Gingival Recession and Jaw Pain

Authors: Joseph Salim

Abstract:

This paper proposes the development of a supervised machine learning system to predict the consequences of untreated bruxism (teeth grinding) on gingival (gum) recession and jaw pain (most often bilateral jaw pain with possible headaches and limited ability to open the mouth). As a general dentist in a multi-specialty practice, the author has encountered many patients suffering from these issues due to uncontrolled bruxism (teeth grinding) at night. The most effective treatment for managing this problem involves wearing a nightguard during sleep and receiving therapeutic Botox injections to relax the muscles (the masseter muscle) responsible for grinding. However, some patients choose to postpone these treatments, leading to potentially irreversible and costlier consequences in the future. The proposed machine learning model aims to track patients who forgo the recommended treatments and assess the percentage of individuals who will experience worsening jaw pain, gingival (gum) recession, or both within a 3-to-5-year timeframe. By accurately predicting these outcomes, the model seeks to motivate patients to address the root cause proactively, ultimately saving time and pain while improving quality of life and avoiding much costlier treatments such as full-mouth rehabilitation to help recover the loss of vertical dimension of occlusion due to shortened clinical crowns because of bruxism, gingival grafts, etc.

Keywords: artificial intelligence, machine learning, predictive insights, bruxism, teeth grinding, therapeutic botox, nightguard, gingival recession, gum recession, jaw pain

Procedia PDF Downloads 58
10975 Comparing the Detection of Autism Spectrum Disorder within Males and Females Using Machine Learning Techniques

Authors: Joseph Wolff, Jeffrey Eilbott

Abstract:

Autism Spectrum Disorders (ASD) are a spectrum of social disorders characterized by deficits in social communication, verbal ability, and interaction that can vary in severity. In recent years, researchers have used magnetic resonance imaging (MRI) to help detect how neural patterns in individuals with ASD differ from those of neurotypical (NT) controls for classification purposes. This study analyzed the classification of ASD within males and females using functional MRI data. Functional connectivity (FC) correlations among brain regions were used as feature inputs for machine learning algorithms. Analysis was performed on 558 cases from the Autism Brain Imaging Data Exchange (ABIDE) I dataset. When trained specifically on females, the algorithm underperformed in classifying the ASD subset of our testing population. Although the subject size was relatively smaller in the female group, the manual matching of both male and female training groups helps explain the algorithm’s bias, indicating the altered sex abnormalities in functional brain networks compared to typically developing peers. These results highlight the importance of taking sex into account when considering how generalizations of findings on males with ASD apply to females.

Keywords: autism spectrum disorder, machine learning, neuroimaging, sex differences

Procedia PDF Downloads 183
10974 Disease Level Assessment in Wheat Plots Using a Residual Deep Learning Algorithm

Authors: Felipe A. Guth, Shane Ward, Kevin McDonnell

Abstract:

The assessment of disease levels in crop fields is an important and time-consuming task that generally relies on expert knowledge of trained individuals. Image classification in agriculture problems historically has been based on classical machine learning strategies that make use of hand-engineered features in the top of a classification algorithm. This approach tends to not produce results with high accuracy and generalization to the classes classified by the system when the nature of the elements has a significant variability. The advent of deep convolutional neural networks has revolutionized the field of machine learning, especially in computer vision tasks. These networks have great resourcefulness of learning and have been applied successfully to image classification and object detection tasks in the last years. The objective of this work was to propose a new method based on deep learning convolutional neural networks towards the task of disease level monitoring. Common RGB images of winter wheat were obtained during a growing season. Five categories of disease levels presence were produced, in collaboration with agronomists, for the algorithm classification. Disease level tasks performed by experts provided ground truth data for the disease score of the same winter wheat plots were RGB images were acquired. The system had an overall accuracy of 84% on the discrimination of the disease level classes.

Keywords: crop disease assessment, deep learning, precision agriculture, residual neural networks

Procedia PDF Downloads 299
10973 Automated Machine Learning Algorithm Using Recurrent Neural Network to Perform Long-Term Time Series Forecasting

Authors: Ying Su, Morgan C. Wang

Abstract:

Long-term time series forecasting is an important research area for automated machine learning (AutoML). Currently, forecasting based on either machine learning or statistical learning is usually built by experts, and it requires significant manual effort, from model construction, feature engineering, and hyper-parameter tuning to the construction of the time series model. Automation is not possible since there are too many human interventions. To overcome these limitations, this article proposed to use recurrent neural networks (RNN) through the memory state of RNN to perform long-term time series prediction. We have shown that this proposed approach is better than the traditional Autoregressive Integrated Moving Average (ARIMA). In addition, we also found it is better than other network systems, including Fully Connected Neural Networks (FNN), Convolutional Neural Networks (CNN), and Nonpooling Convolutional Neural Networks (NPCNN).

Keywords: automated machines learning, autoregressive integrated moving average, neural networks, time series analysis

Procedia PDF Downloads 52
10972 Electroencephalogram Based Alzheimer Disease Classification using Machine and Deep Learning Methods

Authors: Carlos Roncero-Parra, Alfonso Parreño-Torres, Jorge Mateo Sotos, Alejandro L. Borja

Abstract:

In this research, different methods based on machine/deep learning algorithms are presented for the classification and diagnosis of patients with mental disorders such as alzheimer. For this purpose, the signals obtained from 32 unipolar electrodes identified by non-invasive EEG were examined, and their basic properties were obtained. More specifically, different well-known machine learning based classifiers have been used, i.e., support vector machine (SVM), Bayesian linear discriminant analysis (BLDA), decision tree (DT), Gaussian Naïve Bayes (GNB), K-nearest neighbor (KNN) and Convolutional Neural Network (CNN). A total of 668 patients from five different hospitals have been studied in the period from 2011 to 2021. The best accuracy is obtained was around 93 % in both ADM and ADA classifications. It can be concluded that such a classification will enable the training of algorithms that can be used to identify and classify different mental disorders with high accuracy.

Keywords: alzheimer, machine learning, deep learning, EEG

Procedia PDF Downloads 87
10971 Glucose Monitoring System Using Machine Learning Algorithms

Authors: Sangeeta Palekar, Neeraj Rangwani, Akash Poddar, Jayu Kalambe

Abstract:

The bio-medical analysis is an indispensable procedure for identifying health-related diseases like diabetes. Monitoring the glucose level in our body regularly helps us identify hyperglycemia and hypoglycemia, which can cause severe medical problems like nerve damage or kidney diseases. This paper presents a method for predicting the glucose concentration in blood samples using image processing and machine learning algorithms. The glucose solution is prepared by the glucose oxidase (GOD) and peroxidase (POD) method. An experimental database is generated based on the colorimetric technique. The image of the glucose solution is captured by the raspberry pi camera and analyzed using image processing by extracting the RGB, HSV, LUX color space values. Regression algorithms like multiple linear regression, decision tree, RandomForest, and XGBoost were used to predict the unknown glucose concentration. The multiple linear regression algorithm predicts the results with 97% accuracy. The image processing and machine learning-based approach reduce the hardware complexities of existing platforms.

Keywords: artificial intelligence glucose detection, glucose oxidase, peroxidase, image processing, machine learning

Procedia PDF Downloads 171
10970 Evaluating the Implementation of Machine Learning Techniques in the South African Built Environment

Authors: Peter Adekunle, Clinton Aigbavboa, Matthew Ikuabe, Opeoluwa Akinradewo

Abstract:

The future of machine learning (ML) in building may seem like a distant idea that will take decades to materialize, but it is actually far closer than previously believed. In reality, the built environment has been progressively increasing interest in machine learning. Although it could appear to be a very technical, impersonal approach, it can really make things more personable. Instead of eliminating humans out of the equation, machine learning allows people do their real work more efficiently. It is therefore vital to evaluate the factors influencing the implementation and challenges of implementing machine learning techniques in the South African built environment. The study's design was one of a survey. In South Africa, construction workers and professionals were given a total of one hundred fifty (150) questionnaires, of which one hundred and twenty-four (124) were returned and deemed eligible for study. Utilizing percentage, mean item scores, standard deviation, and Kruskal-Wallis, the collected data was analyzed. The results demonstrate that the top factors influencing the adoption of machine learning are knowledge level and a lack of understanding of its potential benefits. While lack of collaboration among stakeholders and lack of tools and services are the key hurdles to the deployment of machine learning within the South African built environment. The study came to the conclusion that ML adoption should be promoted in order to increase safety, productivity, and service quality within the built environment.

Keywords: machine learning, implementation, built environment, construction stakeholders

Procedia PDF Downloads 104
10969 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

Authors: Karina Zaccari, Ernesto Cordeiro Marujo

Abstract:

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.

Keywords: machine learning, medical diagnosis, meningitis detection, pediatric research

Procedia PDF Downloads 126
10968 Development of an Automatic Computational Machine Learning Pipeline to Process Confocal Fluorescence Images for Virtual Cell Generation

Authors: Miguel Contreras, David Long, Will Bachman

Abstract:

Background: Microscopy plays a central role in cell and developmental biology. In particular, fluorescence microscopy can be used to visualize specific cellular components and subsequently quantify their morphology through development of virtual-cell models for study of effects of mechanical forces on cells. However, there are challenges with these imaging experiments, which can make it difficult to quantify cell morphology: inconsistent results, time-consuming and potentially costly protocols, and limitation on number of labels due to spectral overlap. To address these challenges, the objective of this project is to develop an automatic computational machine learning pipeline to predict cellular components morphology for virtual-cell generation based on fluorescence cell membrane confocal z-stacks. Methods: Registered confocal z-stacks of nuclei and cell membrane of endothelial cells, consisting of 20 images each, were obtained from fluorescence confocal microscopy and normalized through software pipeline for each image to have a mean pixel intensity value of 0.5. An open source machine learning algorithm, originally developed to predict fluorescence labels on unlabeled transmitted light microscopy cell images, was trained using this set of normalized z-stacks on a single CPU machine. Through transfer learning, the algorithm used knowledge acquired from its previous training sessions to learn the new task. Once trained, the algorithm was used to predict morphology of nuclei using normalized cell membrane fluorescence images as input. Predictions were compared to the ground truth fluorescence nuclei images. Results: After one week of training, using one cell membrane z-stack (20 images) and corresponding nuclei label, results showed qualitatively good predictions on training set. The algorithm was able to accurately predict nuclei locations as well as shape when fed only fluorescence membrane images. Similar training sessions with improved membrane image quality, including clear lining and shape of the membrane, clearly showing the boundaries of each cell, proportionally improved nuclei predictions, reducing errors relative to ground truth. Discussion: These results show the potential of pre-trained machine learning algorithms to predict cell morphology using relatively small amounts of data and training time, eliminating the need of using multiple labels in immunofluorescence experiments. With further training, the algorithm is expected to predict different labels (e.g., focal-adhesion sites, cytoskeleton), which can be added to the automatic machine learning pipeline for direct input into Principal Component Analysis (PCA) for generation of virtual-cell mechanical models.

Keywords: cell morphology prediction, computational machine learning, fluorescence microscopy, virtual-cell models

Procedia PDF Downloads 178
10967 Neural Networks and Genetic Algorithms Approach for Word Correction and Prediction

Authors: Rodrigo S. Fonseca, Antônio C. P. Veiga

Abstract:

Aiming at helping people with some movement limitation that makes typing and communication difficult, there is a need to customize an assistive tool with a learning environment that helps the user in order to optimize text input, identifying the error and providing the correction and possibilities of choice in the Portuguese language. The work presents an Orthographic and Grammatical System that can be incorporated into writing environments, improving and facilitating the use of an alphanumeric keyboard, using a prototype built using a genetic algorithm in addition to carrying out the prediction, which can occur based on the quantity and position of the inserted letters and even placement in the sentence, ensuring the sequence of ideas using a Long Short Term Memory (LSTM) neural network. The prototype optimizes data entry, being a component of assistive technology for the textual formulation, detecting errors, seeking solutions and informing the user of accurate predictions quickly and effectively through machine learning.

Keywords: genetic algorithm, neural networks, word prediction, machine learning

Procedia PDF Downloads 166
10966 Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems

Authors: Bruno Trstenjak, Dzenana Donko

Abstract:

Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.

Keywords: case based reasoning, classification, expert's knowledge, hybrid model

Procedia PDF Downloads 346
10965 A Learning-Based EM Mixture Regression Algorithm

Authors: Yi-Cheng Tian, Miin-Shen Yang

Abstract:

The mixture likelihood approach to clustering is a popular clustering method where the expectation and maximization (EM) algorithm is the most used mixture likelihood method. In the literature, the EM algorithm had been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values with a priori number of clusters. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initializations and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.

Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model

Procedia PDF Downloads 483
10964 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 105
10963 Feature-Based Summarizing and Ranking from Customer Reviews

Authors: Dim En Nyaung, Thin Lai Lai Thein

Abstract:

Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.

Keywords: opinion mining, opinion summarization, sentiment analysis, text mining

Procedia PDF Downloads 309
10962 A System to Detect Inappropriate Messages in Online Social Networks

Authors: Shivani Singh, Shantanu Nakhare, Kalyani Nair, Rohan Shetty

Abstract:

As social networking is growing at a rapid pace today it is vital that we work on improving its management. Research has shown that the content present in online social networks may have significant influence on impressionable minds. If such platforms are misused, it will lead to negative consequences. Detecting insults or inappropriate messages continues to be one of the most challenging aspects of Online Social Networks (OSNs) today. We address this problem through a Machine Learning Based Soft Text Classifier approach using Support Vector Machine algorithm. The proposed system acts as a screening mechanism the alerts the user about such messages. The messages are classified according to their subject matter and each comment is labeled for the presence of profanity and insults.

Keywords: machine learning, online social networks, soft text classifier, support vector machine

Procedia PDF Downloads 478
10961 Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements

Authors: Akella Amarendra Babu, Rama Devi Yellasiri, Natukula Sainath

Abstract:

Where human beings can easily learn and adopt pronunciation variations, machines need training before put into use. Also humans keep minimum vocabulary and their pronunciation variations are stored in front-end of their memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. Supervised methods are used for preparation of pronunciation dictionaries which take large amounts of manual effort, cost, time and are not suitable for real time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic human approach in learning the new pronunciations in real time. A new algorithm for measuring sound distances called Dynamic Phone Warping is presented and tested. Performance of the system is measured using an adaptation model and the precision metrics is found to be better than 86 percent.

Keywords: pronunciation variations, dynamic programming, machine learning, natural language processing

Procedia PDF Downloads 148
10960 The Role of Optimization and Machine Learning in e-Commerce Logistics in 2030

Authors: Vincenzo Capalbo, Gianpaolo Ghiani, Emanuele Manni

Abstract:

Global e-commerce sales have reached unprecedented levels in the past few years. As this trend is only predicted to go up as we continue into the ’20s, new challenges will be faced by companies when planning and controlling e-commerce logistics. In this paper, we survey the related literature on Optimization and Machine Learning as well as on combined methodologies. We also identify the distinctive features of next-generation planning algorithms - namely scalability, model-and-run features and learning capabilities - that will be fundamental to cope with the scale and complexity of logistics in the next decade.

Keywords: e-commerce, hardware acceleration, logistics, machine learning, mixed integer programming, optimization

Procedia PDF Downloads 208
10959 A Study for Area-level Mosquito Abundance Prediction by Using Supervised Machine Learning Point-level Predictor

Authors: Theoktisti Makridou, Konstantinos Tsaprailis, George Arvanitakis, Charalampos Kontoes

Abstract:

In the literature, the data-driven approaches for mosquito abundance prediction relaying on supervised machine learning models that get trained with historical in-situ measurements. The counterpart of this approach is once the model gets trained on pointlevel (specific x,y coordinates) measurements, the predictions of the model refer again to point-level. These point-level predictions reduce the applicability of those solutions once a lot of early warning and mitigation actions applications need predictions for an area level, such as a municipality, village, etc... In this study, we apply a data-driven predictive model, which relies on public-open satellite Earth Observation and geospatial data and gets trained with historical point-level in-Situ measurements of mosquito abundance. Then we propose a methodology to extract information from a point-level predictive model to a broader area-level prediction. Our methodology relies on the randomly spatial sampling of the area of interest (similar to the Poisson hardcore process), obtaining the EO and geomorphological information for each sample, doing the point-wise prediction for each sample, and aggregating the predictions to represent the average mosquito abundance of the area. We quantify the performance of the transformation from the pointlevel to the area-level predictions, and we analyze it in order to understand which parameters have a positive or negative impact on it. The goal of this study is to propose a methodology that predicts the mosquito abundance of a given area by relying on point-level prediction and to provide qualitative insights regarding the expected performance of the area-level prediction. We applied our methodology to historical data (of Culex pipiens) of two areas of interest (Veneto region of Italy and Central Macedonia of Greece). In both cases, the results were consistent. The mean mosquito abundance of a given area can be estimated with similar accuracy to the point-level predictor, sometimes even better. The density of the samples that we use to represent one area has a positive effect on the performance in contrast to the actual number of sampling points which is not informative at all regarding the performance without the size of the area. Additionally, we saw that the distance between the sampling points and the real in-situ measurements that were used for training did not strongly affect the performance.

Keywords: mosquito abundance, supervised machine learning, culex pipiens, spatial sampling, west nile virus, earth observation data

Procedia PDF Downloads 110
10958 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day to manually classifying whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives and can also improve the speed and efficiency of the process, thereby taking fewer resources and time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 56
10957 Using Machine Learning to Monitor the Condition of the Cutting Edge during Milling Hardened Steel

Authors: Pawel Twardowski, Maciej Tabaszewski, Jakub Czyżycki

Abstract:

The main goal of the work was to use machine learning to predict cutting-edge wear. The research was carried out while milling hardened steel with sintered carbide cutters at various cutting speeds. During the tests, cutting-edge wear was measured, and vibration acceleration signals were also measured. Appropriate measures were determined from the vibration signals and served as input data in the machine-learning process. Two approaches were used in this work. The first one involved a two-state classification of the cutting edge - suitable and unfit for further work. In the second approach, prediction of the cutting-edge state based on vibration signals was used. The obtained research results show that the appropriate use of machine learning algorithms gives excellent results related to monitoring cutting edge during the process.

Keywords: milling of hardened steel, tool wear, vibrations, machine learning

Procedia PDF Downloads 20
10956 MLOps Scaling Machine Learning Lifecycle in an Industrial Setting

Authors: Yizhen Zhao, Adam S. Z. Belloum, Goncalo Maia Da Costa, Zhiming Zhao

Abstract:

Machine learning has evolved from an area of academic research to a real-word applied field. This change comes with challenges, gaps and differences exist between common practices in academic environments and the ones in production environments. Following continuous integration, development and delivery practices in software engineering, similar trends have happened in machine learning (ML) systems, called MLOps. In this paper we propose a framework that helps to streamline and introduce best practices that facilitate the ML lifecycle in an industrial setting. This framework can be used as a template that can be customized to implement various machine learning experiment. The proposed framework is modular and can be recomposed to be adapted to various use cases (e.g. data versioning, remote training on cloud). The framework inherits practices from DevOps and introduces other practices that are unique to the machine learning system (e.g.data versioning). Our MLOps practices automate the entire machine learning lifecycle, bridge the gap between development and operation.

Keywords: cloud computing, continuous development, data versioning, DevOps, industrial setting, MLOps

Procedia PDF Downloads 236
10955 Short-Term Load Forecasting Based on Variational Mode Decomposition and Least Square Support Vector Machine

Authors: Jiangyong Liu, Xiangxiang Xu, Bote Luo, Xiaoxue Luo, Jiang Zhu, Lingzhi Yi

Abstract:

To address the problems of non-linearity and high randomness of the original power load sequence causing the degradation of power load forecasting accuracy, a short-term load forecasting method is proposed. The method is based on the Least Square Support Vector Machine optimized by an Improved Sparrow Search Algorithm combined with the Variational Mode Decomposition proposed in this paper. The application of the variational mode decomposition technique decomposes the raw power load data into a series of Intrinsic Mode Functions components, which can reduce the complexity and instability of the raw data while overcoming modal confounding; the proposed improved sparrow search algorithm can solve the problem of difficult selection of learning parameters in the least Square Support Vector Machine. Finally, through comparison experiments, the results show that the method can effectively improve prediction accuracy.

Keywords: load forecasting, variational mode decomposition, improved sparrow search algorithm, least square support vector machine

Procedia PDF Downloads 70