Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7

Machine Learning Algorithms Related Abstracts

7 Yawning Computing Using Bayesian Networks

Authors: Serge Tshibangu, Turgay Celik, Zenzo Ncube

Abstract:

Road crashes kill nearly over a million people every year, and leave millions more injured or permanently disabled. Various annual reports reveal that the percentage of fatal crashes due to fatigue/driver falling asleep comes directly after the percentage of fatal crashes due to intoxicated drivers. This percentage is higher than the combined percentage of fatal crashes due to illegal/Un-Safe U-turn and illegal/Un-Safe reversing. Although a relatively small percentage of police reports on road accidents highlights drowsiness and fatigue, the importance of these factors is greater than we might think, hidden by the undercounting of their events. Some scenarios show that these factors are significant in accidents with killed and injured people. Thus the need for an automatic drivers fatigue detection system in order to considerably reduce the number of accidents owing to fatigue.This research approaches the drivers fatigue detection problem in an innovative way by combining cues collected from both temporal analysis of drivers’ faces and environment. Monotony in driving environment is inter-related with visual symptoms of fatigue on drivers’ faces to achieve fatigue detection. Optical and infrared (IR) sensors are used to analyse the monotony in driving environment and to detect the visual symptoms of fatigue on human face. Internal cues from drivers faces and external cues from environment are combined together using machine learning algorithms to automatically detect fatigue.

Keywords: Intelligent Transportation Systems, Bayesian networks, Machine Learning Algorithms, yawning computing

Procedia PDF Downloads 327
6 Use of Machine Learning Algorithms to Pediatric MR Images for Tumor Classification

Authors: I. Stathopoulos, V. Syrgiamiotis, E. Karavasilis, A. Ploussi, I. Nikas, C. Hatzigiorgi, K. Platoni, E. P. Efstathopoulos

Abstract:

Introduction: Brain and central nervous system (CNS) tumors form the second most common group of cancer in children, accounting for 30% of all childhood cancers. MRI is the key imaging technique used for the visualization and management of pediatric brain tumors. Initial characterization of tumors from MRI scans is usually performed via a radiologist’s visual assessment. However, different brain tumor types do not always demonstrate clear differences in visual appearance. Using only conventional MRI to provide a definite diagnosis could potentially lead to inaccurate results, and so histopathological examination of biopsy samples is currently considered to be the gold standard for obtaining definite diagnoses. Machine learning is defined as the study of computational algorithms that can use, complex or not, mathematical relationships and patterns from empirical and scientific data to make reliable decisions. Concerning the above, machine learning techniques could provide effective and accurate ways to automate and speed up the analysis and diagnosis for medical images. Machine learning applications in radiology are or could potentially be useful in practice for medical image segmentation and registration, computer-aided detection and diagnosis systems for CT, MR or radiography images and functional MR (fMRI) images for brain activity analysis and neurological disease diagnosis. Purpose: The objective of this study is to provide an automated tool, which may assist in the imaging evaluation and classification of brain neoplasms in pediatric patients by determining the glioma type, grade and differentiating between different brain tissue types. Moreover, a future purpose is to present an alternative way of quick and accurate diagnosis in order to save time and resources in the daily medical workflow. Materials and Methods: A cohort, of 80 pediatric patients with a diagnosis of posterior fossa tumor, was used: 20 ependymomas, 20 astrocytomas, 20 medulloblastomas and 20 healthy children. The MR sequences used, for every single patient, were the following: axial T1-weighted (T1), axial T2-weighted (T2), FluidAttenuated Inversion Recovery (FLAIR), axial diffusion weighted images (DWI), axial contrast-enhanced T1-weighted (T1ce). From every sequence only a principal slice was used that manually traced by two expert radiologists. Image acquisition was carried out on a GE HDxt 1.5-T scanner. The images were preprocessed following a number of steps including noise reduction, bias-field correction, thresholding, coregistration of all sequences (T1, T2, T1ce, FLAIR, DWI), skull stripping, and histogram matching. A large number of features for investigation were chosen, which included age, tumor shape characteristics, image intensity characteristics and texture features. After selecting the features for achieving the highest accuracy using the least number of variables, four machine learning classification algorithms were used: k-Nearest Neighbour, Support-Vector Machines, C4.5 Decision Tree and Convolutional Neural Network. The machine learning schemes and the image analysis are implemented in the WEKA platform and MatLab platform respectively. Results-Conclusions: The results and the accuracy of images classification for each type of glioma by the four different algorithms are still on process.

Keywords: Pediatric Oncology, Machine Learning Algorithms, image classification, pediatric MRI

Procedia PDF Downloads 26
5 Hybrid Model: An Integration of Machine Learning with Traditional Scorecards

Authors: Golnush Masghati-Amoli, Paul Chin

Abstract:

Over the past recent years, with the rapid increases in data availability and computing power, Machine Learning (ML) techniques have been called on in a range of different industries for their strong predictive capability. However, the use of Machine Learning in commercial banking has been limited due to a special challenge imposed by numerous regulations that require lenders to be able to explain their analytic models, not only to regulators but often to consumers. In other words, although Machine Leaning techniques enable better prediction with a higher level of accuracy, in comparison with other industries, they are adopted less frequently in commercial banking especially for scoring purposes. This is due to the fact that Machine Learning techniques are often considered as a black box and fail to provide information on why a certain risk score is given to a customer. In order to bridge this gap between the explain-ability and performance of Machine Learning techniques, a Hybrid Model is developed at Dun and Bradstreet that is focused on blending Machine Learning algorithms with traditional approaches such as scorecards. The Hybrid Model maximizes efficiency of traditional scorecards by merging its practical benefits, such as explain-ability and the ability to input domain knowledge, with the deep insights of Machine Learning techniques which can uncover patterns scorecard approaches cannot. First, through development of Machine Learning models, engineered features and latent variables and feature interactions that demonstrate high information value in the prediction of customer risk are identified. Then, these features are employed to introduce observed non-linear relationships between the explanatory and dependent variables into traditional scorecards. Moreover, instead of directly computing the Weight of Evidence (WoE) from good and bad data points, the Hybrid Model tries to match the score distribution generated by a Machine Learning algorithm, which ends up providing an estimate of the WoE for each bin. This capability helps to build powerful scorecards with sparse cases that cannot be achieved with traditional approaches. The proposed Hybrid Model is tested on different portfolios where a significant gap is observed between the performance of traditional scorecards and Machine Learning models. The result of analysis shows that Hybrid Model can improve the performance of traditional scorecards by introducing non-linear relationships between explanatory and target variables from Machine Learning models into traditional scorecards. Also, it is observed that in some scenarios the Hybrid Model can be almost as predictive as the Machine Learning techniques while being as transparent as traditional scorecards. Therefore, it is concluded that, with the use of Hybrid Model, Machine Learning algorithms can be used in the commercial banking industry without being concerned with difficulties in explaining the models for regulatory purposes.

Keywords: Machine Learning Algorithms, scorecard, feature engineering, commercial banking, consumer risk

Procedia PDF Downloads 7
4 Evaluation of Gesture-Based Password: User Behavioral Features Using Machine Learning Algorithms

Authors: Lakshmidevi Sreeramareddy, Komalpreet Kaur, Nane Pothier

Abstract:

Graphical-based passwords have existed for decades. Their major advantage is that they are easier to remember than an alphanumeric password. However, their disadvantage (especially recognition-based passwords) is the smaller password space, making them more vulnerable to brute force attacks. Graphical passwords are also highly susceptible to the shoulder-surfing effect. The gesture-based password method that we developed is a grid-free, template-free method. In this study, we evaluated the gesture-based passwords for usability and vulnerability. The results of the study are significant. We developed a gesture-based password application for data collection. Two modes of data collection were used: Creation mode and Replication mode. In creation mode (Session 1), users were asked to create six different passwords and reenter each password five times. In replication mode, users saw a password image created by some other user for a fixed duration of time. Three different duration timers, such as 5 seconds (Session 2), 10 seconds (Session 3), and 15 seconds (Session 4), were used to mimic the shoulder-surfing attack. After the timer expired, the password image was removed, and users were asked to replicate the password. There were 74, 57, 50, and 44 users participated in Session 1, Session 2, Session 3, and Session 4 respectfully. In this study, the machine learning algorithms have been applied to determine whether the person is a genuine user or an imposter based on the password entered. Five different machine learning algorithms were deployed to compare the performance in user authentication: namely, Decision Trees, Linear Discriminant Analysis, Naive Bayes Classifier, Support Vector Machines (SVMs) with Gaussian Radial Basis Kernel function, and K-Nearest Neighbor. Gesture-based password features vary from one entry to the next. It is difficult to distinguish between a creator and an intruder for authentication. For each password entered by the user, four features were extracted: password score, password length, password speed, and password size. All four features were normalized before being fed to a classifier. Three different classifiers were trained using data from all four sessions. Classifiers A, B, and C were trained and tested using data from the password creation session and the password replication with a timer of 5 seconds, 10 seconds, and 15 seconds, respectively. The classification accuracies for Classifier A using five ML algorithms are 72.5%, 71.3%, 71.9%, 74.4%, and 72.9%, respectively. The classification accuracies for Classifier B using five ML algorithms are 69.7%, 67.9%, 70.2%, 73.8%, and 71.2%, respectively. The classification accuracies for Classifier C using five ML algorithms are 68.1%, 64.9%, 68.4%, 71.5%, and 69.8%, respectively. SVMs with Gaussian Radial Basis Kernel outperform other ML algorithms for gesture-based password authentication. Results confirm that the shorter the duration of the shoulder-surfing attack, the higher the authentication accuracy. In conclusion, behavioral features extracted from the gesture-based passwords lead to less vulnerable user authentication.

Keywords: Authentication, Machine Learning Algorithms, Usability, gesture-based passwords, shoulder-surfing attacks

Procedia PDF Downloads 1
3 Predicting Provider Service Time in Outpatient Clinics Using Artificial Intelligence-Based Models

Authors: Haya Salah, Srinivas Sharan

Abstract:

Healthcare facilities use appointment systems to schedule their appointments and to manage access to their medical services. With the growing demand for outpatient care, it is now imperative to manage physician's time effectively. However, high variation in consultation duration affects the clinical scheduler's ability to estimate the appointment duration and allocate provider time appropriately. Underestimating consultation times can lead to physician's burnout, misdiagnosis, and patient dissatisfaction. On the other hand, appointment durations that are longer than required lead to doctor idle time and fewer patient visits. Therefore, a good estimation of consultation duration has the potential to improve timely access to care, resource utilization, quality of care, and patient satisfaction. Although the literature on factors influencing consultation length abound, little work has done to predict it using based data-driven approaches. Therefore, this study aims to predict consultation duration using supervised machine learning algorithms (ML), which predicts an outcome variable (e.g., consultation) based on potential features that influence the outcome. In particular, ML algorithms learn from a historical dataset without explicitly being programmed and uncover the relationship between the features and outcome variable. A subset of the data used in this study has been obtained from the electronic medical records (EMR) of four different outpatient clinics located in central Pennsylvania, USA. Also, publicly available information on doctor's characteristics such as gender and experience has been extracted from online sources. This research develops three popular ML algorithms (deep learning, random forest, gradient boosting machine) to predict the treatment time required for a patient and conducts a comparative analysis of these algorithms with respect to predictive performance. The findings of this study indicate that ML algorithms have the potential to predict the provider service time with superior accuracy. While the current approach of experience-based appointment duration estimation adopted by the clinic resulted in a mean absolute percentage error of 25.8%, the Deep learning algorithm developed in this study yielded the best performance with a MAPE of 12.24%, followed by gradient boosting machine (13.26%) and random forests (14.71%). Besides, this research also identified the critical variables affecting consultation duration to be patient type (new vs. established), doctor's experience, zip code, appointment day, and doctor's specialty. Moreover, several practical insights are obtained based on the comparative analysis of the ML algorithms. The machine learning approach presented in this study can serve as a decision support tool and could be integrated into the appointment system for effectively managing patient scheduling.

Keywords: Machine Learning Algorithms, Clinical Decision Support System, patient scheduling, prediction models, provider service time

Procedia PDF Downloads 1
2 An Artificial Intelligence Framework to Forecast Air Quality

Authors: Richard Ren

Abstract:

Air pollution is a serious danger to international well-being and economies - it will kill an estimated 7 million people every year, costing world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e. season, weekday/weekend), future weather forecasts, as well as past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.Air pollution is a serious danger to international wellbeing and economies - it will kill an estimated 7 million people every year, costing world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e. season, weekday/weekend), future weather forecasts, as well as past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.Air pollution is a serious danger to international wellbeing and economies - it will kill an estimated 7 million people every year, costing world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e. season, weekday/weekend), future weather forecasts, as well as past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.

Keywords: Artificial Intelligence, Air Pollution, Machine Learning Algorithms, air quality prediction

Procedia PDF Downloads 1
1 Fuzzy-Machine Learning Models for the Prediction of Fire Outbreak: A Comparative Analysis

Authors: Uduak Umoh, Imo Eyoh, Emmauel Nyoho

Abstract:

This paper compares fuzzy-machine learning algorithms such as Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) for the predicting cases of fire outbreak. The paper uses the fire outbreak dataset with three features (Temperature, Smoke, and Flame). The data is pre-processed using Interval Type-2 Fuzzy Logic (IT2FL) algorithm. Min-Max Normalization and Principal Component Analysis (PCA) are used to predict feature labels in the dataset, normalize the dataset, and select relevant features respectively. The output of the pre-processing is a dataset with two principal components (PC1 and PC2). The pre-processed dataset is then used in the training of the aforementioned machine learning models. K-fold (with K=10) cross-validation method is used to evaluate the performance of the models using the matrices – ROC (Receiver Operating Curve), Specificity, and Sensitivity. The model is also tested with 20% of the dataset. The validation result shows KNN is the better model for fire outbreak detection with an ROC value of 0.99878, followed by SVM with an ROC value of 0.99753.

Keywords: Principal Component Analysis, Machine Learning Algorithms, support vector machine, Interval Type-2 Fuzzy Logic, Fire Outbreak, K-Nearest Neighbour

Procedia PDF Downloads 1