Search results for: unsupervised machine learning
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8231

Search results for: unsupervised machine learning

8201 Deleterious SNP’s Detection Using Machine Learning

Authors: Hamza Zidoum

Abstract:

This paper investigates the impact of human genetic variation on the function of human proteins using machine-learning algorithms. Single-Nucleotide Polymorphism represents the most common form of human genome variation. We focus on the single amino-acid polymorphism located in the coding region as they can affect the protein function leading to pathologic phenotypic change. We use several supervised Machine Learning methods to identify structural properties correlated with increased risk of the missense mutation being damaging. SVM associated with Principal Component Analysis give the best performance.

Keywords: single-nucleotide polymorphism, machine learning, feature selection, SVM

Procedia PDF Downloads 350
8200 Real-Time Network Anomaly Detection Systems Based on Machine-Learning Algorithms

Authors: Zahra Ramezanpanah, Joachim Carvallo, Aurelien Rodriguez

Abstract:

This paper aims to detect anomalies in streaming data using machine learning algorithms. In this regard, we designed two separate pipelines and evaluated the effectiveness of each separately. The first pipeline, based on supervised machine learning methods, consists of two phases. In the first phase, we trained several supervised models using the UNSW-NB15 data-set. We measured the efficiency of each using different performance metrics and selected the best model for the second phase. At the beginning of the second phase, we first, using Argus Server, sniffed a local area network. Several types of attacks were simulated and then sent the sniffed data to a running algorithm at short intervals. This algorithm can display the results of each packet of received data in real-time using the trained model. The second pipeline presented in this paper is based on unsupervised algorithms, in which a Temporal Graph Network (TGN) is used to monitor a local network. The TGN is trained to predict the probability of future states of the network based on its past behavior. Our contribution in this section is introducing an indicator to identify anomalies from these predicted probabilities.

Keywords: temporal graph network, anomaly detection, cyber security, IDS

Procedia PDF Downloads 77
8199 Unsupervised Feature Learning by Pre-Route Simulation of Auto-Encoder Behavior Model

Authors: Youngjae Jin, Daeshik Kim

Abstract:

This paper describes a cycle accurate simulation results of weight values learned by an auto-encoder behavior model in terms of pre-route simulation. Given the results we visualized the first layer representations with natural images. Many common deep learning threads have focused on learning high-level abstraction of unlabeled raw data by unsupervised feature learning. However, in the process of handling such a huge amount of data, the learning method’s computation complexity and time limited advanced research. These limitations came from the fact these algorithms were computed by using only single core CPUs. For this reason, parallel-based hardware, FPGAs, was seen as a possible solution to overcome these limitations. We adopted and simulated the ready-made auto-encoder to design a behavior model in Verilog HDL before designing hardware. With the auto-encoder behavior model pre-route simulation, we obtained the cycle accurate results of the parameter of each hidden layer by using MODELSIM. The cycle accurate results are very important factor in designing a parallel-based digital hardware. Finally this paper shows an appropriate operation of behavior model based pre-route simulation. Moreover, we visualized learning latent representations of the first hidden layer with Kyoto natural image dataset.

Keywords: auto-encoder, behavior model simulation, digital hardware design, pre-route simulation, Unsupervised feature learning

Procedia PDF Downloads 420
8198 Supervised/Unsupervised Mahalanobis Algorithm for Improving Performance for Cyberattack Detection over Communications Networks

Authors: Radhika Ranjan Roy

Abstract:

Deployment of machine learning (ML)/deep learning (DL) algorithms for cyberattack detection in operational communications networks (wireless and/or wire-line) is being delayed because of low-performance parameters (e.g., recall, precision, and f₁-score). If datasets become imbalanced, which is the usual case for communications networks, the performance tends to become worse. Complexities in handling reducing dimensions of the feature sets for increasing performance are also a huge problem. Mahalanobis algorithms have been widely applied in scientific research because Mahalanobis distance metric learning is a successful framework. In this paper, we have investigated the Mahalanobis binary classifier algorithm for increasing cyberattack detection performance over communications networks as a proof of concept. We have also found that high-dimensional information in intermediate features that are not utilized as much for classification tasks in ML/DL algorithms are the main contributor to the state-of-the-art of improved performance of the Mahalanobis method, even for imbalanced and sparse datasets. With no feature reduction, MD offers uniform results for precision, recall, and f₁-score for unbalanced and sparse NSL-KDD datasets.

Keywords: Mahalanobis distance, machine learning, deep learning, NS-KDD, local intrinsic dimensionality, chi-square, positive semi-definite, area under the curve

Procedia PDF Downloads 52
8197 A Machine Learning Approach for the Leakage Classification in the Hydraulic Final Test

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

The widespread use of machine learning applications in production is significantly accelerated by improved computing power and increasing data availability. Predictive quality enables the assurance of product quality by using machine learning models as a basis for decisions on test results. The use of real Bosch production data based on geometric gauge blocks from machining, mating data from assembly and hydraulic measurement data from final testing of directional valves is a promising approach to classifying the quality characteristics of workpieces.

Keywords: machine learning, classification, predictive quality, hydraulics, supervised learning

Procedia PDF Downloads 166
8196 Heuristic Classification of Hydrophone Recordings

Authors: Daniel M. Wolff, Patricia Gray, Rafael de la Parra Venegas

Abstract:

An unsupervised machine listening system is constructed and applied to a dataset of 17,195 30-second marine hydrophone recordings. The system is then heuristically supplemented with anecdotal listening, contextual recording information, and supervised learning techniques to reduce the number of false positives. Features for classification are assembled by extracting the following data from each of the audio files: the spectral centroid, root-mean-squared values for each frequency band of a 10-octave filter bank, and mel-frequency cepstral coefficients in 5-second frames. In this way both time- and frequency-domain information are contained in the features to be passed to a clustering algorithm. Classification is performed using the k-means algorithm and then a k-nearest neighbors search. Different values of k are experimented with, in addition to different combinations of the available feature sets. Hypothesized class labels are 'primarily anthrophony' and 'primarily biophony', where the best class result conforming to the former label has 104 members after heuristic pruning. This demonstrates how a large audio dataset has been made more tractable with machine learning techniques, forming the foundation of a framework designed to acoustically monitor and gauge biological and anthropogenic activity in a marine environment.

Keywords: anthrophony, hydrophone, k-means, machine learning

Procedia PDF Downloads 136
8195 Empowering a New Frontier in Heart Disease Detection: Unleashing Quantum Machine Learning

Authors: Sadia Nasrin Tisha, Mushfika Sharmin Rahman, Javier Orduz

Abstract:

Machine learning is applied in a variety of fields throughout the world. The healthcare sector has benefited enormously from it. One of the most effective approaches for predicting human heart diseases is to use machine learning applications to classify data and predict the outcome as a classification. However, with the rapid advancement of quantum technology, quantum computing has emerged as a potential game-changer for many applications. Quantum algorithms have the potential to execute substantially faster than their classical equivalents, which can lead to significant improvements in computational performance and efficiency. In this study, we applied quantum machine learning concepts to predict coronary heart diseases from text data. We experimented thrice with three different features; and three feature sets. The data set consisted of 100 data points. We pursue to do a comparative analysis of the two approaches, highlighting the potential benefits of quantum machine learning for predicting heart diseases.

Keywords: quantum machine learning, SVM, QSVM, matrix product state

Procedia PDF Downloads 64
8194 Assessing the Effectiveness of Machine Learning Algorithms for Cyber Threat Intelligence Discovery from the Darknet

Authors: Azene Zenebe

Abstract:

Deep learning is a subset of machine learning which incorporates techniques for the construction of artificial neural networks and found to be useful for modeling complex problems with large dataset. Deep learning requires a very high power computational and longer time for training. By aggregating computing power, high performance computer (HPC) has emerged as an approach to resolving advanced problems and performing data-driven research activities. Cyber threat intelligence (CIT) is actionable information or insight an organization or individual uses to understand the threats that have, will, or are currently targeting the organization. Results of review of literature will be presented along with results of experimental study that compares the performance of tree-based and function-base machine learning including deep learning algorithms using secondary dataset collected from darknet.

Keywords: deep-learning, cyber security, cyber threat modeling, tree-based machine learning, function-based machine learning, data science

Procedia PDF Downloads 123
8193 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods in the existence of variables which are difficult to model. Data for health expenditure is typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods by hyperparameter tuning to predict health expenditure per capita. A multiple regression model was conducted and performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression recorded when different hyperparameters are assigned. Lambda (λ) value for Lasso Regression, number of trees for Random Forest Regression, epsilon (ε) value for Support Vector Regression was determined as hyperparameters. Study results performed by using 'k' fold cross validation changed from 5 to 50, indicate the difference between machine learning regression results in terms of R², RMSE and MAE values that are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² ˃ 0.7500, RMSE ≤ 0.6000 ve MAE ≤ 0.4000) outperforms other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

Procedia PDF Downloads 192
8192 Breast Cancer Prediction Using Score-Level Fusion of Machine Learning and Deep Learning Models

Authors: Sam Khozama, Ali M. Mayya

Abstract:

Breast cancer is one of the most common types in women. Early prediction of breast cancer helps physicians detect cancer in its early stages. Big cancer data needs a very powerful tool to analyze and extract predictions. Machine learning and deep learning are two of the most efficient tools for predicting cancer based on textual data. In this study, we developed a fusion model of two machine learning and deep learning models. To obtain the final prediction, Long-Short Term Memory (LSTM) and ensemble learning with hyper parameters optimization are used, and score-level fusion is used. Experiments are done on the Breast Cancer Surveillance Consortium (BCSC) dataset after balancing and grouping the class categories. Five different training scenarios are used, and the tests show that the designed fusion model improved the performance by 3.3% compared to the individual models.

Keywords: machine learning, deep learning, cancer prediction, breast cancer, LSTM, fusion

Procedia PDF Downloads 134
8191 Using Greywolf Optimized Machine Learning Algorithms to Improve Accuracy for Predicting Hospital Readmission for Diabetes

Authors: Vincent Liu

Abstract:

Machine learning algorithms (ML) can achieve high accuracy in predicting outcomes compared to classical models. Metaheuristic, nature-inspired algorithms can enhance traditional ML algorithms by optimizing them such as by performing feature selection. We compare ten ML algorithms to predict 30-day hospital readmission rates for diabetes patients in the US using a dataset from UCI Machine Learning Repository with feature selection performed by Greywolf nature-inspired algorithm. The baseline accuracy for the initial random forest model was 65%. After performing feature engineering, SMOTE for class balancing, and Greywolf optimization, the machine learning algorithms showed better metrics, including F1 scores, accuracy, and confusion matrix with improvements ranging in 10%-30%, and a best model of XGBoost with an accuracy of 95%. Applying machine learning this way can improve patient outcomes as unnecessary rehospitalizations can be prevented by focusing on patients that are at a higher risk of readmission.

Keywords: diabetes, machine learning, 30-day readmission, metaheuristic

Procedia PDF Downloads 25
8190 Machine Learning Techniques to Develop Traffic Accident Frequency Prediction Models

Authors: Rodrigo Aguiar, Adelino Ferreira

Abstract:

Road traffic accidents are the leading cause of unnatural death and injuries worldwide, representing a significant problem of road safety. In this context, the use of artificial intelligence with advanced machine learning techniques has gained prominence as a promising approach to predict traffic accidents. This article investigates the application of machine learning algorithms to develop traffic accident frequency prediction models. Models are evaluated based on performance metrics, making it possible to do a comparative analysis with traditional prediction approaches. The results suggest that machine learning can provide a powerful tool for accident prediction, which will contribute to making more informed decisions regarding road safety.

Keywords: machine learning, artificial intelligence, frequency of accidents, road safety

Procedia PDF Downloads 56
8189 Machine Learning Based Smart Beehive Monitoring System Without Internet

Authors: Esra Ece Var

Abstract:

Beekeeping plays essential role both in terms of agricultural yields and agricultural economy; they produce honey, wax, royal jelly, apitoxin, pollen, and propolis. Nowadays, these natural products become more importantly suitable and preferable for nutrition, food supplement, medicine, and industry. However, to produce organic honey, majority of the apiaries are located in remote or distant rural areas where utilities such as electricity and Internet network are not available. Additionally, due to colony failures, world honey production decreases year by year despite the increase in the number of beehives. The objective of this paper is to develop a smart beehive monitoring system for apiaries including those that do not have access to Internet network. In this context, temperature and humidity inside the beehive, and ambient temperature were measured with RFID sensors. Control center, where all sensor data was sent and stored at, has a GSM module used to warn the beekeeper via SMS when an anomaly is detected. Simultaneously, using the collected data, an unsupervised machine learning algorithm is used for detecting anomalies and calibrating the warning system. The results show that the smart beehive monitoring system can detect fatal anomalies up to 4 weeks prior to colony loss.

Keywords: beekeeping, smart systems, machine learning, anomaly detection, apiculture

Procedia PDF Downloads 203
8188 A Study on Performance Prediction in Early Design Stage of Apartment Housing Using Machine Learning

Authors: Seongjun Kim, Sanghoon Shim, Jinwooung Kim, Jaehwan Jung, Sung-Ah Kim

Abstract:

As the development of information and communication technology, the convergence of machine learning of the ICT area and design is attempted. In this way, it is possible to grasp the correlation between various design elements, which was difficult to grasp, and to reflect this in the design result. In architecture, there is an attempt to predict the performance, which is difficult to grasp in the past, by finding the correlation among multiple factors mainly through machine learning. In architectural design area, some attempts to predict the performance affected by various factors have been tried. With machine learning, it is possible to quickly predict performance. The aim of this study is to propose a model that predicts performance according to the block arrangement of apartment housing through machine learning and the design alternative which satisfies the performance such as the daylight hours in the most similar form to the alternative proposed by the designer. Through this study, a designer can proceed with the design considering various design alternatives and accurate performances quickly from the early design stage.

Keywords: apartment housing, machine learning, multi-objective optimization, performance prediction

Procedia PDF Downloads 449
8187 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 416
8186 Contrastive Learning for Unsupervised Object Segmentation in Sequential Images

Authors: Tian Zhang

Abstract:

Unsupervised object segmentation aims at segmenting objects in sequential images and obtaining the mask of each object without any manual intervention. Unsupervised segmentation remains a challenging task due to the lack of prior knowledge about these objects. Previous methods often require manually specifying the action of each object, which is often difficult to obtain. Instead, this paper does not need action information of objects and automatically learns the actions and relations among objects from the structured environment. To obtain the object segmentation of sequential images, the relationships between objects and images are extracted to infer the action and interaction of objects based on the multi-head attention mechanism. Three types of objects’ relationships in the object segmentation task are proposed: the relationship between objects in the same frame, the relationship between objects in two frames, and the relationship between objects and historical information. Based on these relationships, the proposed model (1) is effective in multiple objects segmentation tasks, (2) just needs images as input, and (3) produces better segmentation results as more relationships are considered. The experimental results on multiple datasets show that this paper’s method achieves state-of-art performance. The quantitative and qualitative analyses of the result are conducted. The proposed method could be easily extended to other similar applications.

Keywords: unsupervised object segmentation, attention mechanism, contrastive learning, structured environment

Procedia PDF Downloads 87
8185 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 133
8184 Advancing the Analysis of Physical Activity Behaviour in Diverse, Rapidly Evolving Populations: Using Unsupervised Machine Learning to Segment and Cluster Accelerometer Data

Authors: Christopher Thornton, Niina Kolehmainen, Kianoush Nazarpour

Abstract:

Background: Accelerometers are widely used to measure physical activity behavior, including in children. The traditional method for processing acceleration data uses cut points, relying on calibration studies that relate the quantity of acceleration to energy expenditure. As these relationships do not generalise across diverse populations, they must be parametrised for each subpopulation, including different age groups, which is costly and makes studies across diverse populations difficult. A data-driven approach that allows physical activity intensity states to emerge from the data under study without relying on parameters derived from external populations offers a new perspective on this problem and potentially improved results. We evaluated the data-driven approach in a diverse population with a range of rapidly evolving physical and mental capabilities, namely very young children (9-38 months old), where this new approach may be particularly appropriate. Methods: We applied an unsupervised machine learning approach (a hidden semi-Markov model - HSMM) to segment and cluster the accelerometer data recorded from 275 children with a diverse range of physical and cognitive abilities. The HSMM was configured to identify a maximum of six physical activity intensity states and the output of the model was the time spent by each child in each of the states. For comparison, we also processed the accelerometer data using published cut points with available thresholds for the population. This provided us with time estimates for each child’s sedentary (SED), light physical activity (LPA), and moderate-to-vigorous physical activity (MVPA). Data on the children’s physical and cognitive abilities were collected using the Paediatric Evaluation of Disability Inventory (PEDI-CAT). Results: The HSMM identified two inactive states (INS, comparable to SED), two lightly active long duration states (LAS, comparable to LPA), and two short-duration high-intensity states (HIS, comparable to MVPA). Overall, the children spent on average 237/392 minutes per day in INS/SED, 211/129 minutes per day in LAS/LPA, and 178/168 minutes in HIS/MVPA. We found that INS overlapped with 53% of SED, LAS overlapped with 37% of LPA and HIS overlapped with 60% of MVPA. We also looked at the correlation between the time spent by a child in either HIS or MVPA and their physical and cognitive abilities. We found that HIS was more strongly correlated with physical mobility (R²HIS =0.5, R²MVPA= 0.28), cognitive ability (R²HIS =0.31, R²MVPA= 0.15), and age (R²HIS =0.15, R²MVPA= 0.09), indicating increased sensitivity to key attributes associated with a child’s mobility. Conclusion: An unsupervised machine learning technique can segment and cluster accelerometer data according to the intensity of movement at a given time. It provides a potentially more sensitive, appropriate, and cost-effective approach to analysing physical activity behavior in diverse populations, compared to the current cut points approach. This, in turn, supports research that is more inclusive across diverse populations.

Keywords: physical activity, machine learning, under 5s, disability, accelerometer

Procedia PDF Downloads 181
8183 Electroencephalogram Based Alzheimer Disease Classification using Machine and Deep Learning Methods

Authors: Carlos Roncero-Parra, Alfonso Parreño-Torres, Jorge Mateo Sotos, Alejandro L. Borja

Abstract:

In this research, different methods based on machine/deep learning algorithms are presented for the classification and diagnosis of patients with mental disorders such as alzheimer. For this purpose, the signals obtained from 32 unipolar electrodes identified by non-invasive EEG were examined, and their basic properties were obtained. More specifically, different well-known machine learning based classifiers have been used, i.e., support vector machine (SVM), Bayesian linear discriminant analysis (BLDA), decision tree (DT), Gaussian Naïve Bayes (GNB), K-nearest neighbor (KNN) and Convolutional Neural Network (CNN). A total of 668 patients from five different hospitals have been studied in the period from 2011 to 2021. The best accuracy is obtained was around 93 % in both ADM and ADA classifications. It can be concluded that such a classification will enable the training of algorithms that can be used to identify and classify different mental disorders with high accuracy.

Keywords: alzheimer, machine learning, deep learning, EEG

Procedia PDF Downloads 87
8182 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, about 1% of generated data are ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which is a subset of data analytics. The developed framework is designed to assist data analysts with little experience, in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 142
8181 Evaluating the Implementation of Machine Learning Techniques in the South African Built Environment

Authors: Peter Adekunle, Clinton Aigbavboa, Matthew Ikuabe, Opeoluwa Akinradewo

Abstract:

The future of machine learning (ML) in building may seem like a distant idea that will take decades to materialize, but it is actually far closer than previously believed. In reality, the built environment has been progressively increasing interest in machine learning. Although it could appear to be a very technical, impersonal approach, it can really make things more personable. Instead of eliminating humans out of the equation, machine learning allows people do their real work more efficiently. It is therefore vital to evaluate the factors influencing the implementation and challenges of implementing machine learning techniques in the South African built environment. The study's design was one of a survey. In South Africa, construction workers and professionals were given a total of one hundred fifty (150) questionnaires, of which one hundred and twenty-four (124) were returned and deemed eligible for study. Utilizing percentage, mean item scores, standard deviation, and Kruskal-Wallis, the collected data was analyzed. The results demonstrate that the top factors influencing the adoption of machine learning are knowledge level and a lack of understanding of its potential benefits. While lack of collaboration among stakeholders and lack of tools and services are the key hurdles to the deployment of machine learning within the South African built environment. The study came to the conclusion that ML adoption should be promoted in order to increase safety, productivity, and service quality within the built environment.

Keywords: machine learning, implementation, built environment, construction stakeholders

Procedia PDF Downloads 104
8180 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 105
8179 The Role of Optimization and Machine Learning in e-Commerce Logistics in 2030

Authors: Vincenzo Capalbo, Gianpaolo Ghiani, Emanuele Manni

Abstract:

Global e-commerce sales have reached unprecedented levels in the past few years. As this trend is only predicted to go up as we continue into the ’20s, new challenges will be faced by companies when planning and controlling e-commerce logistics. In this paper, we survey the related literature on Optimization and Machine Learning as well as on combined methodologies. We also identify the distinctive features of next-generation planning algorithms - namely scalability, model-and-run features and learning capabilities - that will be fundamental to cope with the scale and complexity of logistics in the next decade.

Keywords: e-commerce, hardware acceleration, logistics, machine learning, mixed integer programming, optimization

Procedia PDF Downloads 208
8178 Using Machine Learning to Monitor the Condition of the Cutting Edge during Milling Hardened Steel

Authors: Pawel Twardowski, Maciej Tabaszewski, Jakub Czyżycki

Abstract:

The main goal of the work was to use machine learning to predict cutting-edge wear. The research was carried out while milling hardened steel with sintered carbide cutters at various cutting speeds. During the tests, cutting-edge wear was measured, and vibration acceleration signals were also measured. Appropriate measures were determined from the vibration signals and served as input data in the machine-learning process. Two approaches were used in this work. The first one involved a two-state classification of the cutting edge - suitable and unfit for further work. In the second approach, prediction of the cutting-edge state based on vibration signals was used. The obtained research results show that the appropriate use of machine learning algorithms gives excellent results related to monitoring cutting edge during the process.

Keywords: milling of hardened steel, tool wear, vibrations, machine learning

Procedia PDF Downloads 20
8177 MLOps Scaling Machine Learning Lifecycle in an Industrial Setting

Authors: Yizhen Zhao, Adam S. Z. Belloum, Goncalo Maia Da Costa, Zhiming Zhao

Abstract:

Machine learning has evolved from an area of academic research to a real-word applied field. This change comes with challenges, gaps and differences exist between common practices in academic environments and the ones in production environments. Following continuous integration, development and delivery practices in software engineering, similar trends have happened in machine learning (ML) systems, called MLOps. In this paper we propose a framework that helps to streamline and introduce best practices that facilitate the ML lifecycle in an industrial setting. This framework can be used as a template that can be customized to implement various machine learning experiment. The proposed framework is modular and can be recomposed to be adapted to various use cases (e.g. data versioning, remote training on cloud). The framework inherits practices from DevOps and introduces other practices that are unique to the machine learning system (e.g.data versioning). Our MLOps practices automate the entire machine learning lifecycle, bridge the gap between development and operation.

Keywords: cloud computing, continuous development, data versioning, DevOps, industrial setting, MLOps

Procedia PDF Downloads 236
8176 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in biofilm field has become very significant, as biofilm has shown high mechanical resilience and resistance to antibiotic treatment and constituted as a significant problem in both healthcare and other industry related to microorganisms. The massive information both stated and hidden in the biofilm literature are growing exponentially therefore it is not possible for researchers and practitioners to automatically extract and relate information from different written resources. So, the current work proposes and discusses the use of text mining techniques for the extraction of information from biofilm literature corpora containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature as the literature is unstructured i.e. free-text. Therefore, we considered unsupervised approach, where no annotated training is necessary and using this approach we developed a system that will classify the text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. For this, a two-step structure was used where the first step is to extract keywords from the biofilm literature using a metathesaurus and standard natural language processing tools like Rapid Miner_v5.3 and the second step is to discover relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. We used unsupervised approach, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, in the above-extracted datasets to develop classifiers using WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages which will automatically classify the text by using the mentioned sets. The developed classifiers were tested on a large data set of biofilm literature which showed that the unsupervised approach proposed is promising as well as suited for a semi-automatic labeling of the extracted relations. The entire information was stored in the relational database which was hosted locally on the server. The generated biofilm vocabulary and genes relations will be significant for researchers dealing with biofilm research, making their search easy and efficient as the keywords and genes could be directly mapped with the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 144
8175 Evaluating Machine Learning Techniques for Activity Classification in Smart Home Environments

Authors: Talal Alshammari, Nasser Alshammari, Mohamed Sedky, Chris Howard

Abstract:

With the widespread adoption of the Internet-connected devices, and with the prevalence of the Internet of Things (IoT) applications, there is an increased interest in machine learning techniques that can provide useful and interesting services in the smart home domain. The areas that machine learning techniques can help advance are varied and ever-evolving. Classifying smart home inhabitants’ Activities of Daily Living (ADLs), is one prominent example. The ability of machine learning technique to find meaningful spatio-temporal relations of high-dimensional data is an important requirement as well. This paper presents a comparative evaluation of state-of-the-art machine learning techniques to classify ADLs in the smart home domain. Forty-two synthetic datasets and two real-world datasets with multiple inhabitants are used to evaluate and compare the performance of the identified machine learning techniques. Our results show significant performance differences between the evaluated techniques. Such as AdaBoost, Cortical Learning Algorithm (CLA), Decision Trees, Hidden Markov Model (HMM), Multi-layer Perceptron (MLP), Structured Perceptron and Support Vector Machines (SVM). Overall, neural network based techniques have shown superiority over the other tested techniques.

Keywords: activities of daily living, classification, internet of things, machine learning, prediction, smart home

Procedia PDF Downloads 317
8174 Auto Classification of Multiple ECG Arrhythmic Detection via Machine Learning Techniques: A Review

Authors: Ng Liang Shen, Hau Yuan Wen

Abstract:

Arrhythmia analysis of ECG signal plays a major role in diagnosing most of the cardiac diseases. Therefore, a single arrhythmia detection of an electrocardiographic (ECG) record can determine multiple pattern of various algorithms and match accordingly each ECG beats based on Machine Learning supervised learning. These researchers used different features and classification methods to classify different arrhythmia types. A major problem in these studies is the fact that the symptoms of the disease do not show all the time in the ECG record. Hence, a successful diagnosis might require the manual investigation of several hours of ECG records. The point of this paper presents investigations cardiovascular ailment in Electrocardiogram (ECG) Signals for Cardiac Arrhythmia utilizing examination of ECG irregular wave frames via heart beat as correspond arrhythmia which with Machine Learning Pattern Recognition.

Keywords: electrocardiogram, ECG, classification, machine learning, pattern recognition, detection, QRS

Procedia PDF Downloads 342
8173 DeepOmics: Deep Learning for Understanding Genome Functioning and the Underlying Genetic Causes of Disease

Authors: Vishnu Pratap Singh Kirar, Madhuri Saxena

Abstract:

Advancement in sequence data generation technologies is churning out voluminous omics data and posing a massive challenge to annotate the biological functional features. With so much data available, the use of machine learning methods and tools to make novel inferences has become obvious. Machine learning methods have been successfully applied to a lot of disciplines, including computational biology and bioinformatics. Researchers in computational biology are interested to develop novel machine learning frameworks to classify the huge amounts of biological data. In this proposal, it plan to employ novel machine learning approaches to aid the understanding of how apparently innocuous mutations (in intergenic DNA and at synonymous sites) cause diseases. We are also interested in discovering novel functional sites in the genome and mutations in which can affect a phenotype of interest.

Keywords: genome wide association studies (GWAS), next generation sequencing (NGS), deep learning, omics

Procedia PDF Downloads 65
8172 Improved Computational Efficiency of Machine Learning Algorithm Based on Evaluation Metrics to Control the Spread of Coronavirus in the UK

Authors: Swathi Ganesan, Nalinda Somasiri, Rebecca Jeyavadhanam, Gayathri Karthick

Abstract:

The COVID-19 crisis presents a substantial and critical hazard to worldwide health. Since the occurrence of the disease in late January 2020 in the UK, the number of infected people confirmed to acquire the illness has increased tremendously across the country, and the number of individuals affected is undoubtedly considerably high. The purpose of this research is to figure out a predictive machine learning archetypal that could forecast COVID-19 cases within the UK. This study concentrates on the statistical data collected from 31st January 2020 to 31st March 2021 in the United Kingdom. Information on total COVID cases registered, new cases encountered on a daily basis, total death registered, and patients’ death per day due to Coronavirus is collected from World Health Organisation (WHO). Data preprocessing is carried out to identify any missing values, outliers, or anomalies in the dataset. The data is split into 8:2 ratio for training and testing purposes to forecast future new COVID cases. Support Vector Machines (SVM), Random Forests, and linear regression algorithms are chosen to study the model performance in the prediction of new COVID-19 cases. From the evaluation metrics such as r-squared value and mean squared error, the statistical performance of the model in predicting the new COVID cases is evaluated. Random Forest outperformed the other two Machine Learning algorithms with a training accuracy of 99.47% and testing accuracy of 98.26% when n=30. The mean square error obtained for Random Forest is 4.05e11, which is lesser compared to the other predictive models used for this study. From the experimental analysis Random Forest algorithm can perform more effectively and efficiently in predicting the new COVID cases, which could help the health sector to take relevant control measures for the spread of the virus.

Keywords: COVID-19, machine learning, supervised learning, unsupervised learning, linear regression, support vector machine, random forest

Procedia PDF Downloads 90