Search results for: unsupervised machine learning.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8425

Search results for: unsupervised machine learning.

8245 A Machine Learning Approach for Intelligent Transportation System Management on Urban Roads

Authors: Ashish Dhamaniya, Vineet Jain, Rajesh Chouhan

Abstract:

Traffic management is one of the gigantic issue in most of the urban roads in al-most all metropolitan cities in India. Speed is one of the critical traffic parameters for effective Intelligent Transportation System (ITS) implementation as it decides the arrival rate of vehicles on an intersection which are majorly the point of con-gestions. The study aimed to leverage Machine Learning (ML) models to produce precise predictions of speed on urban roadway links. The research objective was to assess how categorized traffic volume and road width, serving as variables, in-fluence speed prediction. Four tree-based regression models namely: Decision Tree (DT), Random Forest (RF), Extra Tree (ET), and Extreme Gradient Boost (XGB)are employed for this purpose. The models' performances were validated using test data, and the results demonstrate that Random Forest surpasses other machine learning techniques and a conventional utility theory-based model in speed prediction. The study is useful for managing the urban roadway network performance under mixed traffic conditions and effective implementation of ITS.

Keywords: stream speed, urban roads, machine learning, traffic flow

Procedia PDF Downloads 60
8244 Predicting Potential Protein Therapeutic Candidates from the Gut Microbiome

Authors: Prasanna Ramachandran, Kareem Graham, Helena Kiefel, Sunit Jain, Todd DeSantis

Abstract:

Microbes that reside inside the mammalian GI tract, commonly referred to as the gut microbiome, have been shown to have therapeutic effects in animal models of disease. We hypothesize that specific proteins produced by these microbes are responsible for this activity and may be used directly as therapeutics. To speed up the discovery of these key proteins from the big-data metagenomics, we have applied machine learning techniques. Using amino acid sequences of known epitopes and their corresponding binding partners, protein interaction descriptors (PID) were calculated, making a positive interaction set. A negative interaction dataset was calculated using sequences of proteins known not to interact with these same binding partners. Using Random Forest and positive and negative PID, a machine learning model was trained and used to predict interacting versus non-interacting proteins. Furthermore, the continuous variable, cosine similarity in the interaction descriptors was used to rank bacterial therapeutic candidates. Laboratory binding assays were conducted to test the candidates for their potential as therapeutics. Results from binding assays reveal the accuracy of the machine learning prediction and are subsequently used to further improve the model.

Keywords: protein-interactions, machine-learning, metagenomics, microbiome

Procedia PDF Downloads 369
8243 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates. 

Keywords: imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour

Procedia PDF Downloads 337
8242 Churn Prediction for Savings Bank Customers: A Machine Learning Approach

Authors: Prashant Verma

Abstract:

Commercial banks are facing immense pressure, including financial disintermediation, interest rate volatility and digital ways of finance. Retaining an existing customer is 5 to 25 less expensive than acquiring a new one. This paper explores customer churn prediction, based on various statistical & machine learning models and uses under-sampling, to improve the predictive power of these models. The results show that out of the various machine learning models, Random Forest which predicts the churn with 78% accuracy, has been found to be the most powerful model for the scenario. Customer vintage, customer’s age, average balance, occupation code, population code, average withdrawal amount, and an average number of transactions were found to be the variables with high predictive power for the churn prediction model. The model can be deployed by the commercial banks in order to avoid the customer churn so that they may retain the funds, which are kept by savings bank (SB) customers. The article suggests a customized campaign to be initiated by commercial banks to avoid SB customer churn. Hence, by giving better customer satisfaction and experience, the commercial banks can limit the customer churn and maintain their deposits.

Keywords: savings bank, customer churn, customer retention, random forests, machine learning, under-sampling

Procedia PDF Downloads 133
8241 Prediction of Disability-Adjustment Mental Illness Using Machine

Authors: R. M. Krishna Sureddi, V. Kamakshi Prasad, R. Santosh

Abstract:

Machine learning techniques are applied for the analysis of the impact of mental illness on the burden of disease. It is calculated using the disability-adjusted life year (DALY). One DALY represents the loss of the equivalent of one year of full health. DALYs for a disease or health condition are the sum of years of life lost due to premature mortality (YLLs) and years of healthy life lost due to disability (YLDs) due to prevalent cases of the disease or health condition in a population. The critical analysis is done based on the Data sources, machine learning techniques and feature extraction method. The reviewing is done based on major databases. The extracted data is examined using statistical analysis and machine learning techniques were applied. The prediction of the impact of mental illness on the population using machine learning techniques is an alternative approach to the old traditional strategies, which are time-consuming and may not be reliable. The approach makes it necessary for a comprehensive adoption, innovative algorithms, and an understanding of the limitations and challenges. The obtained prediction is a way of understanding the underlying impact of mental illness on the health of the people and it enables us to get a healthy life expectancy. The growing impact of mental illness and the challenges associated with the detection and treatment of mental disorders make it necessary for us to understand the complete effect of it on the majority of the population.

Keywords: ML, DALY, BD, DL

Procedia PDF Downloads 19
8240 Supervised Machine Learning Approach for Studying the Effect of Different Joint Sets on Stability of Mine Pit Slopes Under the Presence of Different External Factors

Authors: Sudhir Kumar Singh, Debashish Chakravarty

Abstract:

Slope stability analysis is an important aspect in the field of geotechnical engineering. It is also important from safety, and economic point of view as any slope failure leads to loss of valuable lives and damage to property worth millions. This paper aims at mitigating the risk of slope failure by studying the effect of different joint sets on the stability of mine pit slopes under the influence of various external factors, namely degree of saturation, rainfall intensity, and seismic coefficients. Supervised machine learning approach has been utilized for making accurate and reliable predictions regarding the stability of slopes based on the value of Factor of Safety. Numerous cases have been studied for analyzing the stability of slopes using the popular Finite Element Method, and the data thus obtained has been used as training data for the supervised machine learning models. The input data has been trained on different supervised machine learning models, namely Random Forest, Decision Tree, Support vector Machine, and XGBoost. Distinct test data that is not present in training data has been used for measuring the performance and accuracy of different models. Although all models have performed well on the test dataset but Random Forest stands out from others due to its high accuracy of greater than 95%, thus helping us by providing a valuable tool at our disposition which is neither computationally expensive nor time consuming and in good accordance with the numerical analysis result.

Keywords: finite element method, geotechnical engineering, machine learning, slope stability

Procedia PDF Downloads 94
8239 Optimizing E-commerce Retention: A Detailed Study of Machine Learning Techniques for Churn Prediction

Authors: Saurabh Kumar

Abstract:

In the fiercely competitive landscape of e-commerce, understanding and mitigating customer churn has become paramount for sustainable business growth. This paper presents a thorough investigation into the application of machine learning techniques for churn prediction in e-commerce, aiming to provide actionable insights for businesses seeking to enhance customer retention strategies. We conduct a comparative study of various machine learning algorithms, including traditional statistical methods and ensemble techniques, leveraging a rich dataset sourced from Kaggle. Through rigorous evaluation, we assess the predictive performance, interpretability, and scalability of each method, elucidating their respective strengths and limitations in capturing the intricate dynamics of customer churn. We identified the XGBoost classifier to be the best performing. Our findings not only offer practical guidelines for selecting suitable modeling approaches but also contribute to the broader understanding of customer behavior in the e-commerce domain. Ultimately, this research equips businesses with the knowledge and tools necessary to proactively identify and address churn, thereby fostering long-term customer relationships and sustaining competitive advantage.

Keywords: customer churn, e-commerce, machine learning techniques, predictive performance, sustainable business growth

Procedia PDF Downloads 7
8238 Cirrhosis Mortality Prediction as Classification using Frequent Subgraph Mining

Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride

Abstract:

In this work, we use machine learning and novel data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis are collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedure and laboratory tests is being analyzed. A temporal pattern mining technic called Frequent Subgraph Mining (FSM) is being used. Model for End-stage liver disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement of the area under the curve (AUC). The FSM technic itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. To the best of our knowledge, this is the first work to apply modern machine learning algorithms and data analysis methods on predicting one-year mortality of cirrhotic patients and builds a model that predicts one-year mortality significantly more accurate than the MELD score. We have also tested the potential of FSM and provided a new perspective of the importance of clinical features.

Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning

Procedia PDF Downloads 130
8237 Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

Authors: Karina Zaccari, Ernesto Cordeiro Marujo

Abstract:

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in Sao Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis given the results of a blood and urine laboratory tests, together with the analysis of pain or other complaints from the patient. We tested a number of different ML algorithms, including: Adaptative Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest and Support Vector Machines (SVM). Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy for training and testing data, respectively. These results represent a significant aid to doctors in diagnosing Meningitis as early as possible and in preventing expensive and painful procedures on some children.

Keywords: machine learning, medical diagnosis, meningitis detection, pediatric research

Procedia PDF Downloads 145
8236 Movie Genre Preference Prediction Using Machine Learning for Customer-Based Information

Authors: Haifeng Wang, Haili Zhang

Abstract:

Most movie recommendation systems have been developed for customers to find items of interest. This work introduces a predictive model usable by small and medium-sized enterprises (SMEs) who are in need of a data-based and analytical approach to stock proper movies for local audiences and retain more customers. We used classification models to extract features from thousands of customers’ demographic, behavioral and social information to predict their movie genre preference. In the implementation, a Gaussian kernel support vector machine (SVM) classification model and a logistic regression model were established to extract features from sample data and their test error-in-sample were compared. Comparison of error-out-sample was also made under different Vapnik–Chervonenkis (VC) dimensions in the machine learning algorithm to find and prevent overfitting. Gaussian kernel SVM prediction model can correctly predict movie genre preferences in 85% of positive cases. The accuracy of the algorithm increased to 93% with a smaller VC dimension and less overfitting. These findings advance our understanding of how to use machine learning approach to predict customers’ preferences with a small data set and design prediction tools for these enterprises.

Keywords: computational social science, movie preference, machine learning, SVM

Procedia PDF Downloads 251
8235 Cardiovascular Disease Prediction Using Machine Learning Approaches

Authors: P. Halder, A. Zaman

Abstract:

It is estimated that heart disease accounts for one in ten deaths worldwide. United States deaths due to heart disease are among the leading causes of death according to the World Health Organization. Cardiovascular diseases (CVDs) account for one in four U.S. deaths, according to the Centers for Disease Control and Prevention (CDC). According to statistics, women are more likely than men to die from heart disease as a result of strokes. A 50% increase in men's mortality was reported by the World Health Organization in 2009. The consequences of cardiovascular disease are severe. The causes of heart disease include diabetes, high blood pressure, high cholesterol, abnormal pulse rates, etc. Machine learning (ML) can be used to make predictions and decisions in the healthcare industry. Thus, scientists have turned to modern technologies like Machine Learning and Data Mining to predict diseases. The disease prediction is based on four algorithms. Compared to other boosts, the Ada boost is much more accurate.

Keywords: heart disease, cardiovascular disease, coronary artery disease, feature selection, random forest, AdaBoost, SVM, decision tree

Procedia PDF Downloads 146
8234 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification

Procedia PDF Downloads 459
8233 Machine Learning in Gravity Models: An Application to International Recycling Trade Flow

Authors: Shan Zhang, Peter Suechting

Abstract:

Predicting trade patterns is critical to decision-making in public and private domains, especially in the current context of trade disputes among major economies. In the past, U.S. recycling has relied heavily on strong demand for recyclable materials overseas. However, starting in 2017, a series of new recycling policies (bans and higher inspection standards) was enacted by multiple countries that were the primary importers of recyclables from the U.S. prior to that point. As the global trade flow of recycling shifts, some new importers, mostly developing countries in South and Southeast Asia, have been overwhelmed by the sheer quantities of scrap materials they have received. As the leading exporter of recyclable materials, the U.S. now has a pressing need to build its recycling industry domestically. With respect to the global trade in scrap materials used for recycling, the interest in this paper is (1) predicting how the export of recyclable materials from the U.S. might vary over time, and (2) predicting how international trade flows for recyclables might change in the future. Focusing on three major recyclable materials with a history of trade, this study uses data-driven and machine learning (ML) algorithms---supervised (shrinkage and tree methods) and unsupervised (neural network method)---to decipher the international trade pattern of recycling. Forecasting the potential trade values of recyclables in the future could help importing countries, to which those materials will shift next, to prepare related trade policies. Such policies can assist policymakers in minimizing negative environmental externalities and in finding the optimal amount of recyclables needed by each country. Such forecasts can also help exporting countries, like the U.S understand the importance of healthy domestic recycling industry. The preliminary result suggests that gravity models---in addition to particular selection macroeconomic predictor variables--are appropriate predictors of the total export value of recyclables. With the inclusion of variables measuring aspects of the political conditions (trade tariffs and bans), predictions show that recyclable materials are shifting from more policy-restricted countries to less policy-restricted countries in international recycling trade. Those countries also tend to have high manufacturing activities as a percentage of their GDP.

Keywords: environmental economics, machine learning, recycling, international trade

Procedia PDF Downloads 161
8232 Improving Machine Learning Translation of Hausa Using Named Entity Recognition

Authors: Aishatu Ibrahim Birma, Aminu Tukur, Abdulkarim Abbass Gora

Abstract:

Machine translation plays a vital role in the Field of Natural Language Processing (NLP), breaking down language barriers and enabling communication across diverse communities. In the context of Hausa, a widely spoken language in West Africa, mainly in Nigeria, effective translation systems are essential for enabling seamless communication and promoting cultural exchange. However, due to the unique linguistic characteristics of Hausa, accurate translation remains a challenging task. The research proposes an approach to improving the machine learning translation of Hausa by integrating Named Entity Recognition (NER) techniques. Named entities, such as person names, locations, organizations, and dates, are critical components of a language's structure and meaning. Incorporating NER into the translation process can enhance the quality and accuracy of translations by preserving the integrity of named entities and also maintaining consistency in translating entities (e.g., proper names), and addressing the cultural references specific to Hausa. The NER will be incorporated into Neural Machine Translation (NMT) for the Hausa to English Translation.

Keywords: machine translation, natural language processing (NLP), named entity recognition (NER), neural machine translation (NMT)

Procedia PDF Downloads 30
8231 Systematic and Meta-Analysis of Navigation in Oral and Maxillofacial Trauma and Impact of Machine Learning and AI in Management

Authors: Shohreh Ghasemi

Abstract:

Introduction: Managing oral and maxillofacial trauma is a multifaceted challenge, as it can have life-threatening consequences and significant functional and aesthetic impact. Navigation techniques have been introduced to improve surgical precision to meet this challenge. A machine learning algorithm was also developed to support clinical decision-making regarding treating oral and maxillofacial trauma. Given these advances, this systematic meta-analysis aims to assess the efficacy of navigational techniques in treating oral and maxillofacial trauma and explore the impact of machine learning on their management. Methods: A detailed and comprehensive analysis of studies published between January 2010 and September 2021 was conducted through a systematic meta-analysis. This included performing a thorough search of Web of Science, Embase, and PubMed databases to identify studies evaluating the efficacy of navigational techniques and the impact of machine learning in managing oral and maxillofacial trauma. Studies that did not meet established entry criteria were excluded. In addition, the overall quality of studies included was evaluated using Cochrane risk of bias tool and the Newcastle-Ottawa scale. Results: Total of 12 studies, including 869 patients with oral and maxillofacial trauma, met the inclusion criteria. An analysis of studies revealed that navigation techniques effectively improve surgical accuracy and minimize the risk of complications. Additionally, machine learning algorithms have proven effective in predicting treatment outcomes and identifying patients at high risk for complications. Conclusion: The introduction of navigational technology has great potential to improve surgical precision in oral and maxillofacial trauma treatment. Furthermore, developing machine learning algorithms offers opportunities to improve clinical decision-making and patient outcomes. Still, further studies are necessary to corroborate these results and establish the optimal use of these technologies in managing oral and maxillofacial trauma

Keywords: trauma, machine learning, navigation, maxillofacial, management

Procedia PDF Downloads 54
8230 Machine Learning Approach for Lateralization of Temporal Lobe Epilepsy

Authors: Samira-Sadat JamaliDinan, Haidar Almohri, Mohammad-Reza Nazem-Zadeh

Abstract:

Lateralization of temporal lobe epilepsy (TLE) is very important for positive surgical outcomes. We propose a machine learning framework to ultimately identify the epileptogenic hemisphere for temporal lobe epilepsy (TLE) cases using magnetoencephalography (MEG) coherence source imaging (CSI) and diffusion tensor imaging (DTI). Unlike most studies that use classification algorithms, we propose an effective clustering approach to distinguish between normal and TLE cases. We apply the famous Minkowski weighted K-Means (MWK-Means) technique as the clustering framework. To overcome the problem of poor initialization of K-Means, we use particle swarm optimization (PSO) to effectively select the initial centroids of clusters prior to applying MWK-Means. We demonstrate that compared to K-means and MWK-means independently, this approach is able to improve the result of a benchmark data set.

Keywords: temporal lobe epilepsy, machine learning, clustering, magnetoencephalography

Procedia PDF Downloads 145
8229 A System to Detect Inappropriate Messages in Online Social Networks

Authors: Shivani Singh, Shantanu Nakhare, Kalyani Nair, Rohan Shetty

Abstract:

As social networking is growing at a rapid pace today it is vital that we work on improving its management. Research has shown that the content present in online social networks may have significant influence on impressionable minds. If such platforms are misused, it will lead to negative consequences. Detecting insults or inappropriate messages continues to be one of the most challenging aspects of Online Social Networks (OSNs) today. We address this problem through a Machine Learning Based Soft Text Classifier approach using Support Vector Machine algorithm. The proposed system acts as a screening mechanism the alerts the user about such messages. The messages are classified according to their subject matter and each comment is labeled for the presence of profanity and insults.

Keywords: machine learning, online social networks, soft text classifier, support vector machine

Procedia PDF Downloads 499
8228 Predicting the Frequencies of Tropical Cyclone-Induced Rainfall Events in the US Using a Machine-Learning Model

Authors: Elham Sharifineyestani, Mohammad Farshchin

Abstract:

Tropical cyclones are one of the most expensive and deadliest natural disasters. They cause heavy rainfall and serious flash flooding that result in billions of dollars of damage and considerable mortality each year in the United States. Prediction of the frequency of tropical cyclone-induced rainfall events can be helpful in emergency planning and flood risk management. In this study, we have developed a machine-learning model to predict the exceedance frequencies of tropical cyclone-induced rainfall events in the United States. Model results show a satisfactory agreement with available observations. To examine the effectiveness of our approach, we also have compared the result of our predictions with the exceedance frequencies predicted using a physics-based rainfall model by Feldmann.

Keywords: flash flooding, tropical cyclones, frequencies, machine learning, risk management

Procedia PDF Downloads 238
8227 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation is totally dependent on technology that uses data as its fuel. The present study is all about innovations and developments in data science and gives an idea about how efficiently to use the data provided. This study will help to understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing in which the main principle was to create an artificial system that can run independently of human-given programs and can function with the help of analyzing data to understand the requirements of the users. Data science comprises business understanding, analyzing data, ethical concerns, understanding programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we have covered a part of data science, i.e., machine learning. Machine learning uses data science for its work. Machines learn through their experience, which helps them to do any work more efficiently. This article includes a comparative study image between human understanding and machine understanding, advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the life of human beings. Since the advent of data science, we have found its benefits and how it leads to a better understanding of people, and how it cherishes individual needs. It has improved business strategies, services provided by them, forecasting, the ability to attend sustainable developments, etc. This study also focuses on a better understanding of data science which will help us to create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 75
8226 Integration of Big Data to Predict Transportation for Smart Cities

Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin

Abstract:

The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system.  The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.

Keywords: big data, machine learning, smart city, social cost, transportation network

Procedia PDF Downloads 250
8225 Loan Repayment Prediction Using Machine Learning: Model Development, Django Web Integration and Cloud Deployment

Authors: Seun Mayowa Sunday

Abstract:

Loan prediction is one of the most significant and recognised fields of research in the banking, insurance, and the financial security industries. Some prediction systems on the market include the construction of static software. However, due to the fact that static software only operates with strictly regulated rules, they cannot aid customers beyond these limitations. Application of many machine learning (ML) techniques are required for loan prediction. Four separate machine learning models, random forest (RF), decision tree (DT), k-nearest neighbour (KNN), and logistic regression, are used to create the loan prediction model. Using the anaconda navigator and the required machine learning (ML) libraries, models are created and evaluated using the appropriate measuring metrics. From the finding, the random forest performs with the highest accuracy of 80.17% which was later implemented into the Django framework. For real-time testing, the web application is deployed on the Alibabacloud which is among the top 4 biggest cloud computing provider. Hence, to the best of our knowledge, this research will serve as the first academic paper which combines the model development and the Django framework, with the deployment into the Alibaba cloud computing application.

Keywords: k-nearest neighbor, random forest, logistic regression, decision tree, django, cloud computing, alibaba cloud

Procedia PDF Downloads 122
8224 Breast Cancer Diagnosing Based on Online Sequential Extreme Learning Machine Approach

Authors: Musatafa Abbas Abbood Albadr, Masri Ayob, Sabrina Tiun, Fahad Taha Al-Dhief, Mohammad Kamrul Hasan

Abstract:

Breast Cancer (BC) is considered one of the most frequent reasons of cancer death in women between 40 to 55 ages. The BC is diagnosed by using digital images of the FNA (Fine Needle Aspirate) for both benign and malignant tumors of the breast mass. Therefore, this work proposes the Online Sequential Extreme Learning Machine (OSELM) algorithm for diagnosing BC by using the tumor features of the breast mass. The current work has used the Wisconsin Diagnosis Breast Cancer (WDBC) dataset, which contains 569 samples (i.e., 357 samples for benign class and 212 samples for malignant class). Further, numerous measurements of assessment were used in order to evaluate the proposed OSELM algorithm, such as specificity, precision, F-measure, accuracy, G-mean, MCC, and recall. According to the outcomes of the experiment, the highest performance of the proposed OSELM was accomplished with 97.66% accuracy, 98.39% recall, 95.31% precision, 97.25% specificity, 96.83% F-measure, 95.00% MCC, and 96.84% G-Mean. The proposed OSELM algorithm demonstrates promising results in diagnosing BC. Besides, the performance of the proposed OSELM algorithm was superior to all its comparatives with respect to the rate of classification.

Keywords: breast cancer, machine learning, online sequential extreme learning machine, artificial intelligence

Procedia PDF Downloads 103
8223 Quantum Statistical Machine Learning and Quantum Time Series

Authors: Omar Alzeley, Sergey Utev

Abstract:

Minimizing a constrained multivariate function is the fundamental of Machine learning, and these algorithms are at the core of data mining and data visualization techniques. The decision function that maps input points to output points is based on the result of optimization. This optimization is the central of learning theory. One approach to complex systems where the dynamics of the system is inferred by a statistical analysis of the fluctuations in time of some associated observable is time series analysis. The purpose of this paper is a mathematical transition from the autoregressive model of classical time series to the matrix formalization of quantum theory. Firstly, we have proposed a quantum time series model (QTS). Although Hamiltonian technique becomes an established tool to detect a deterministic chaos, other approaches emerge. The quantum probabilistic technique is used to motivate the construction of our QTS model. The QTS model resembles the quantum dynamic model which was applied to financial data. Secondly, various statistical methods, including machine learning algorithms such as the Kalman filter algorithm, are applied to estimate and analyses the unknown parameters of the model. Finally, simulation techniques such as Markov chain Monte Carlo have been used to support our investigations. The proposed model has been examined by using real and simulated data. We establish the relation between quantum statistical machine and quantum time series via random matrix theory. It is interesting to note that the primary focus of the application of QTS in the field of quantum chaos was to find a model that explain chaotic behaviour. Maybe this model will reveal another insight into quantum chaos.

Keywords: machine learning, simulation techniques, quantum probability, tensor product, time series

Procedia PDF Downloads 464
8222 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 328
8221 A Machine Learning Model for Dynamic Prediction of Chronic Kidney Disease Risk Using Laboratory Data, Non-Laboratory Data, and Metabolic Indices

Authors: Amadou Wurry Jallow, Adama N. S. Bah, Karamo Bah, Shih-Ye Wang, Kuo-Chung Chu, Chien-Yeh Hsu

Abstract:

Chronic kidney disease (CKD) is a major public health challenge with high prevalence, rising incidence, and serious adverse consequences. Developing effective risk prediction models is a cost-effective approach to predicting and preventing complications of chronic kidney disease (CKD). This study aimed to develop an accurate machine learning model that can dynamically identify individuals at risk of CKD using various kinds of diagnostic data, with or without laboratory data, at different follow-up points. Creatinine is a key component used to predict CKD. These models will enable affordable and effective screening for CKD even with incomplete patient data, such as the absence of creatinine testing. This retrospective cohort study included data on 19,429 adults provided by a private research institute and screening laboratory in Taiwan, gathered between 2001 and 2015. Univariate Cox proportional hazard regression analyses were performed to determine the variables with high prognostic values for predicting CKD. We then identified interacting variables and grouped them according to diagnostic data categories. Our models used three types of data gathered at three points in time: non-laboratory, laboratory, and metabolic indices data. Next, we used subgroups of variables within each category to train two machine learning models (Random Forest and XGBoost). Our machine learning models can dynamically discriminate individuals at risk for developing CKD. All the models performed well using all three kinds of data, with or without laboratory data. Using only non-laboratory-based data (such as age, sex, body mass index (BMI), and waist circumference), both models predict chronic kidney disease as accurately as models using laboratory and metabolic indices data. Our machine learning models have demonstrated the use of different categories of diagnostic data for CKD prediction, with or without laboratory data. The machine learning models are simple to use and flexible because they work even with incomplete data and can be applied in any clinical setting, including settings where laboratory data is difficult to obtain.

Keywords: chronic kidney disease, glomerular filtration rate, creatinine, novel metabolic indices, machine learning, risk prediction

Procedia PDF Downloads 98
8220 Development of a Decision-Making Method by Using Machine Learning Algorithms in the Early Stage of School Building Design

Authors: Pegah Eshraghi, Zahra Sadat Zomorodian, Mohammad Tahsildoost

Abstract:

Over the past decade, energy consumption in educational buildings has steadily increased. The purpose of this research is to provide a method to quickly predict the energy consumption of buildings using separate evaluation of zones and decomposing the building to eliminate the complexity of geometry at the early design stage. To produce this framework, machine learning algorithms such as Support vector regression (SVR) and Artificial neural network (ANN) are used to predict energy consumption and thermal comfort metrics in a school as a case. The database consists of more than 55000 samples in three climates of Iran. Cross-validation evaluation and unseen data have been used for validation. In a specific label, cooling energy, it can be said the accuracy of prediction is at least 84% and 89% in SVR and ANN, respectively. The results show that the SVR performed much better than the ANN.

Keywords: early stage of design, energy, thermal comfort, validation, machine learning

Procedia PDF Downloads 80
8219 Use of Machine Learning in Data Quality Assessment

Authors: Bruno Pinto Vieira, Marco Antonio Calijorne Soares, Armando Sérgio de Aguiar Filho

Abstract:

Nowadays, a massive amount of information has been produced by different data sources, including mobile devices and transactional systems. In this scenario, concerns arise on how to maintain or establish data quality, which is now treated as a product to be defined, measured, analyzed, and improved to meet consumers' needs, which is the one who uses these data in decision making and companies strategies. Information that reaches low levels of quality can lead to issues that can consume time and money, such as missed business opportunities, inadequate decisions, and bad risk management actions. The step of selecting, identifying, evaluating, and selecting data sources with significant quality according to the need has become a costly task for users since the sources do not provide information about their quality. Traditional data quality control methods are based on user experience or business rules limiting performance and slowing down the process with less than desirable accuracy. Using advanced machine learning algorithms, it is possible to take advantage of computational resources to overcome challenges and add value to companies and users. In this study, machine learning is applied to data quality analysis on different datasets, seeking to compare the performance of the techniques according to the dimensions of quality assessment. As a result, we could create a ranking of approaches used, besides a system that is able to carry out automatically, data quality assessment.

Keywords: machine learning, data quality, quality dimension, quality assessment

Procedia PDF Downloads 141
8218 Performance Analysis of Traffic Classification with Machine Learning

Authors: Htay Htay Yi, Zin May Aye

Abstract:

Network security is role of the ICT environment because malicious users are continually growing that realm of education, business, and then related with ICT. The network security contravention is typically described and examined centrally based on a security event management system. The firewalls, Intrusion Detection System (IDS), and Intrusion Prevention System are becoming essential to monitor or prevent of potential violations, incidents attack, and imminent threats. In this system, the firewall rules are set only for where the system policies are needed. Dataset deployed in this system are derived from the testbed environment. The traffic as in DoS and PortScan traffics are applied in the testbed with firewall and IDS implementation. The network traffics are classified as normal or attacks in the existing testbed environment based on six machine learning classification methods applied in the system. It is required to be tested to get datasets and applied for DoS and PortScan. The dataset is based on CICIDS2017 and some features have been added. This system tested 26 features from the applied dataset. The system is to reduce false positive rates and to improve accuracy in the implemented testbed design. The system also proves good performance by selecting important features and comparing existing a dataset by machine learning classifiers.

Keywords: false negative rate, intrusion detection system, machine learning methods, performance

Procedia PDF Downloads 114
8217 Machine Learning Approach for Anomaly Detection in the Simulated Iec-60870-5-104 Traffic

Authors: Stepan Grebeniuk, Ersi Hodo, Henri Ruotsalainen, Paul Tavolato

Abstract:

Substation security plays an important role in the power delivery system. During the past years, there has been an increase in number of attacks on automation networks of the substations. In spite of that, there hasn’t been enough focus dedicated to the protection of such networks. Aiming to design a specialized anomaly detection system based on machine learning, in this paper we will discuss the IEC 60870-5-104 protocol that is used for communication between substation and control station and focus on the simulation of the substation traffic. Firstly, we will simulate the communication between substation slave and server. Secondly, we will compare the system's normal behavior and its behavior under the attack, in order to extract the right features which will be needed for building an anomaly detection system. Lastly, based on the features we will suggest the anomaly detection system for the asynchronous protocol IEC 60870-5-104.

Keywords: Anomaly detection, IEC-60870-5-104, Machine learning, Man-in-the-Middle attacks, Substation security

Procedia PDF Downloads 356
8216 Managing Data from One Hundred Thousand Internet of Things Devices Globally for Mining Insights

Authors: Julian Wise

Abstract:

Newcrest Mining is one of the world’s top five gold and rare earth mining organizations by production, reserves and market capitalization in the world. This paper elaborates on the data acquisition processes employed by Newcrest in collaboration with Fortune 500 listed organization, Insight Enterprises, to standardize machine learning solutions which process data from over a hundred thousand distributed Internet of Things (IoT) devices located at mine sites globally. Through the utilization of software architecture cloud technologies and edge computing, the technological developments enable for standardized processes of machine learning applications to influence the strategic optimization of mineral processing. Target objectives of the machine learning optimizations include time savings on mineral processing, production efficiencies, risk identification, and increased production throughput. The data acquired and utilized for predictive modelling is processed through edge computing by resources collectively stored within a data lake. Being involved in the digital transformation has necessitated the standardization software architecture to manage the machine learning models submitted by vendors, to ensure effective automation and continuous improvements to the mineral process models. Operating at scale, the system processes hundreds of gigabytes of data per day from distributed mine sites across the globe, for the purposes of increased improved worker safety, and production efficiency through big data applications.

Keywords: mineral technology, big data, machine learning operations, data lake

Procedia PDF Downloads 104