Search results for: central machine learning
10897 Machine Learning Invariants to Detect Anomalies in Secure Water Treatment
Authors: Jonathan Heng, Yoong Cheah Huei
Abstract:
A strategic model that does not trigger any false alarms to detect anomalies in Secure Water Treatment (SWaT) test bed is presented. This model uses machine learning invariants formulated from streamlining the general form of Auto-Regressive models with eXogenous input. A creative generalized CUSUM algorithm to integrate the invariants and the detection strategy technique is successfully developed and tested in the SWaT Programmable Logic Controllers (PLCs). Three steps to fine-tune parameters, b and τ in the generalized algorithm are stated and an example used to demonstrate the tuning process is discussed. This approach can swiftly and effectively detect various scopes of cyber-attacks such as multiple points single stage and multiple points multiple stages in SWaT. This technique can be applied in water treatment plants and other cyber physical systems like power and gas plants too.Keywords: machine learning invariants, generalized CUSUM algorithm with invariants and detection strategy, scope of cyber attacks, strategic model, tuning parameters
Procedia PDF Downloads 18110896 Modelling the Impact of Installation of Heat Cost Allocators in District Heating Systems Using Machine Learning
Authors: Danica Maljkovic, Igor Balen, Bojana Dalbelo Basic
Abstract:
Following the regulation of EU Directive on Energy Efficiency, specifically Article 9, individual metering in district heating systems has to be introduced by the end of 2016. These directions have been implemented in member state’s legal framework, Croatia is one of these states. The directive allows installation of both heat metering devices and heat cost allocators. Mainly due to bad communication and PR, the general public false image was created that the heat cost allocators are devices that save energy. Although this notion is wrong, the aim of this work is to develop a model that would precisely express the influence of installation heat cost allocators on potential energy savings in each unit within multifamily buildings. At the same time, in recent years, a science of machine learning has gain larger application in various fields, as it is proven to give good results in cases where large amounts of data are to be processed with an aim to recognize a pattern and correlation of each of the relevant parameter as well as in the cases where the problem is too complex for a human intelligence to solve. A special method of machine learning, decision tree method, has proven an accuracy of over 92% in prediction general building consumption. In this paper, a machine learning algorithms will be used to isolate the sole impact of installation of heat cost allocators on a single building in multifamily houses connected to district heating systems. Special emphasises will be given regression analysis, logistic regression, support vector machines, decision trees and random forest method.Keywords: district heating, heat cost allocator, energy efficiency, machine learning, decision tree model, regression analysis, logistic regression, support vector machines, decision trees and random forest method
Procedia PDF Downloads 24910895 Random Access in IoT Using Naïve Bayes Classification
Authors: Alhusein Almahjoub, Dongyu Qiu
Abstract:
This paper deals with the random access procedure in next-generation networks and presents the solution to reduce total service time (TST) which is one of the most important performance metrics in current and future internet of things (IoT) based networks. The proposed solution focuses on the calculation of optimal transmission probability which maximizes the success probability and reduces TST. It uses the information of several idle preambles in every time slot, and based on it, it estimates the number of backlogged IoT devices using Naïve Bayes estimation which is a type of supervised learning in the machine learning domain. The estimation of backlogged devices is necessary since optimal transmission probability depends on it and the eNodeB does not have information about it. The simulations are carried out in MATLAB which verify that the proposed solution gives excellent performance.Keywords: random access, LTE/LTE-A, 5G, machine learning, Naïve Bayes estimation
Procedia PDF Downloads 14510894 Optimizing the Scanning Time with Radiation Prediction Using a Machine Learning Technique
Authors: Saeed Eskandari, Seyed Rasoul Mehdikhani
Abstract:
Radiation sources have been used in many industries, such as gamma sources in medical imaging. These waves have destructive effects on humans and the environment. It is very important to detect and find the source of these waves because these sources cannot be seen by the eye. A portable robot has been designed and built with the purpose of revealing radiation sources that are able to scan the place from 5 to 20 meters away and shows the location of the sources according to the intensity of the waves on a two-dimensional digital image. The operation of the robot is done by measuring the pixels separately. By increasing the image measurement resolution, we will have a more accurate scan of the environment, and more points will be detected. But this causes a lot of time to be spent on scanning. In this paper, to overcome this challenge, we designed a method that can optimize this time. In this method, a small number of important points of the environment are measured. Hence the remaining pixels are predicted and estimated by regression algorithms in machine learning. The research method is based on comparing the actual values of all pixels. These steps have been repeated with several other radiation sources. The obtained results of the study show that the values estimated by the regression method are very close to the real values.Keywords: regression, machine learning, scan radiation, robot
Procedia PDF Downloads 7910893 Early Prediction of Disposable Addresses in Ethereum Blockchain
Authors: Ahmad Saleem
Abstract:
Ethereum is the second largest crypto currency in blockchain ecosystem. Along with standard transactions, it supports smart contracts and NFT’s. Current research trends are focused on analyzing the overall structure of the network its growth and behavior. Ethereum addresses are anonymous and can be created on fly. The nature of Ethereum network and addresses make it hard to predict their behavior. The activity period of an ethereum address is not much analyzed. Using machine learning we can make early prediction about the disposability of the address. In this paper we analyzed the lifetime of the addresses. We also identified and predicted the disposable addresses using machine learning models and compared the results.Keywords: blockchain, Ethereum, cryptocurrency, prediction
Procedia PDF Downloads 9710892 Learning to Translate by Learning to Communicate to an Entailment Classifier
Authors: Szymon Rutkowski, Tomasz Korbak
Abstract:
We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. Translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, there is a number of improvements we introduce. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained recognizing textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use stanford natural language inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning based zero-shot machine translation.Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning
Procedia PDF Downloads 12810891 The Role of Synthetic Data in Aerial Object Detection
Authors: Ava Dodd, Jonathan Adams
Abstract:
The purpose of this study is to explore the characteristics of developing a machine learning application using synthetic data. The study is structured to develop the application for the purpose of deploying the computer vision model. The findings discuss the realities of attempting to develop a computer vision model for practical purpose, and detail the processes, tools, and techniques that were used to meet accuracy requirements. The research reveals that synthetic data represents another variable that can be adjusted to improve the performance of a computer vision model. Further, a suite of tools and tuning recommendations are provided.Keywords: computer vision, machine learning, synthetic data, YOLOv4
Procedia PDF Downloads 22510890 Talent-to-Vec: Using Network Graphs to Validate Models with Data Sparsity
Authors: Shaan Khosla, Jon Krohn
Abstract:
In a recruiting context, machine learning models are valuable for recommendations: to predict the best candidates for a vacancy, to match the best vacancies for a candidate, and compile a set of similar candidates for any given candidate. While useful to create these models, validating their accuracy in a recommendation context is difficult due to a sparsity of data. In this report, we use network graph data to generate useful representations for candidates and vacancies. We use candidates and vacancies as network nodes and designate a bi-directional link between them based on the candidate interviewing for the vacancy. After using node2vec, the embeddings are used to construct a validation dataset with a ranked order, which will help validate new recommender systems.Keywords: AI, machine learning, NLP, recruiting
Procedia PDF Downloads 8410889 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models
Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti
Abstract:
In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics
Procedia PDF Downloads 5310888 Stackelberg Security Game for Optimizing Security of Federated Internet of Things Platform Instances
Authors: Violeta Damjanovic-Behrendt
Abstract:
This paper presents an approach for optimal cyber security decisions to protect instances of a federated Internet of Things (IoT) platform in the cloud. The presented solution implements the repeated Stackelberg Security Game (SSG) and a model called Stochastic Human behaviour model with AttRactiveness and Probability weighting (SHARP). SHARP employs the Subjective Utility Quantal Response (SUQR) for formulating a subjective utility function, which is based on the evaluations of alternative solutions during decision-making. We augment the repeated SSG (including SHARP and SUQR) with a reinforced learning algorithm called Naïve Q-Learning. Naïve Q-Learning belongs to the category of active and model-free Machine Learning (ML) techniques in which the agent (either the defender or the attacker) attempts to find an optimal security solution. In this way, we combine GT and ML algorithms for discovering optimal cyber security policies. The proposed security optimization components will be validated in a collaborative cloud platform that is based on the Industrial Internet Reference Architecture (IIRA) and its recently published security model.Keywords: security, internet of things, cloud computing, stackelberg game, machine learning, naive q-learning
Procedia PDF Downloads 35410887 A Machine Learning Based Framework for Education Levelling in Multicultural Countries: UAE as a Case Study
Authors: Shatha Ghareeb, Rawaa Al-Jumeily, Thar Baker
Abstract:
In Abu Dhabi, there are many different education curriculums where sector of private schools and quality assurance is supervising many private schools in Abu Dhabi for many nationalities. As there are many different education curriculums in Abu Dhabi to meet expats’ needs, there are different requirements for registration and success. In addition, there are different age groups for starting education in each curriculum. In fact, each curriculum has a different number of years, assessment techniques, reassessment rules, and exam boards. Currently, students that transfer curriculums are not being placed in the right year group due to different start and end dates of each academic year and their date of birth for each year group is different for each curriculum and as a result, we find students that are either younger or older for that year group which therefore creates gaps in their learning and performance. In addition, there is not a way of storing student data throughout their academic journey so that schools can track the student learning process. In this paper, we propose to develop a computational framework applicable in multicultural countries such as UAE in which multi-education systems are implemented. The ultimate goal is to use cloud and fog computing technology integrated with Artificial Intelligence techniques of Machine Learning to aid in a smooth transition when assigning students to their year groups, and provide leveling and differentiation information of students who relocate from a particular education curriculum to another, whilst also having the ability to store and access student data from anywhere throughout their academic journey.Keywords: admissions, algorithms, cloud computing, differentiation, fog computing, levelling, machine learning
Procedia PDF Downloads 14210886 Using Machine-Learning Methods for Allergen Amino Acid Sequence's Permutations
Authors: Kuei-Ling Sun, Emily Chia-Yu Su
Abstract:
Allergy is a hypersensitive overreaction of the immune system to environmental stimuli, and a major health problem. These overreactions include rashes, sneezing, fever, food allergies, anaphylaxis, asthmatic, shock, or other abnormal conditions. Allergies can be caused by food, insect stings, pollen, animal wool, and other allergens. Their development of allergies is due to both genetic and environmental factors. Allergies involve immunoglobulin E antibodies, a part of the body’s immune system. Immunoglobulin E antibodies will bind to an allergen and then transfer to a receptor on mast cells or basophils triggering the release of inflammatory chemicals such as histamine. Based on the increasingly serious problem of environmental change, changes in lifestyle, air pollution problem, and other factors, in this study, we both collect allergens and non-allergens from several databases and use several machine learning methods for classification, including logistic regression (LR), stepwise regression, decision tree (DT) and neural networks (NN) to do the model comparison and determine the permutations of allergen amino acid’s sequence.Keywords: allergy, classification, decision tree, logistic regression, machine learning
Procedia PDF Downloads 30310885 Data-Driven Decision Making: A Reference Model for Organizational, Educational and Competency-Based Learning Systems
Authors: Emanuel Koseos
Abstract:
Data-Driven Decision Making (DDDM) refers to making decisions that are based on historical data in order to inform practice, develop strategies and implement policies that benefit organizational settings. In educational technology, DDDM facilitates the implementation of differential educational learning approaches such as Educational Data Mining (EDM) and Competency-Based Education (CBE), which commonly target university classrooms. There is a current need for DDDM models applied to middle and secondary schools from a concern for assessing the needs, progress and performance of students and educators with respect to regional standards, policies and evolution of curriculums. To address these concerns, we propose a DDDM reference model developed using educational key process initiatives as inputs to a machine learning framework implemented with statistical software (SAS, R) to provide a best-practices, complex-free and automated approach for educators at their regional level. We assessed the efficiency of the model over a six-year period using data from 45 schools and grades K-12 in the Langley, BC, Canada regional school district. We concluded that the model has wider appeal, such as business learning systems.Keywords: competency-based learning, data-driven decision making, machine learning, secondary schools
Procedia PDF Downloads 17310884 An Assessment of Floodplain Vegetation Response to Groundwater Changes Using the Soil & Water Assessment Tool Hydrological Model, Geographic Information System, and Machine Learning in the Southeast Australian River Basin
Authors: Newton Muhury, Armando A. Apan, Tek N. Marasani, Gebiaw T. Ayele
Abstract:
The changing climate has degraded freshwater availability in Australia that influencing vegetation growth to a great extent. This study assessed the vegetation responses to groundwater using Terra’s moderate resolution imaging spectroradiometer (MODIS), Normalised Difference Vegetation Index (NDVI), and soil water content (SWC). A hydrological model, SWAT, has been set up in a southeast Australian river catchment for groundwater analysis. The model was calibrated and validated against monthly streamflow from 2001 to 2006 and 2007 to 2010, respectively. The SWAT simulated soil water content for 43 sub-basins and monthly MODIS NDVI data for three different types of vegetation (forest, shrub, and grass) were applied in the machine learning tool, Waikato Environment for Knowledge Analysis (WEKA), using two supervised machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF). The assessment shows that different types of vegetation response and soil water content vary in the dry and wet seasons. The WEKA model generated high positive relationships (r = 0.76, 0.73, and 0.81) between NDVI values of all vegetation in the sub-basins against soil water content (SWC), the groundwater flow (GW), and the combination of these two variables, respectively, during the dry season. However, these responses were reduced by 36.8% (r = 0.48) and 13.6% (r = 0.63) against GW and SWC, respectively, in the wet season. Although the rainfall pattern is highly variable in the study area, the summer rainfall is very effective for the growth of the grass vegetation type. This study has enriched our knowledge of vegetation responses to groundwater in each season, which will facilitate better floodplain vegetation management.Keywords: ArcSWAT, machine learning, floodplain vegetation, MODIS NDVI, groundwater
Procedia PDF Downloads 10110883 Application of Machine Learning Models to Predict Couchsurfers on Free Homestay Platform Couchsurfing
Authors: Yuanxiang Miao
Abstract:
Couchsurfing is a free homestay and social networking service accessible via the website and mobile app. Couchsurfers can directly request free accommodations from others and receive offers from each other. However, it is typically difficult for people to make a decision that accepts or declines a request when they receive it from Couchsurfers because they do not know each other at all. People are expected to meet up with some Couchsurfers who are kind, generous, and interesting while it is unavoidable to meet up with someone unfriendly. This paper utilized classification algorithms of Machine Learning to help people to find out the Good Couchsurfers and Not Good Couchsurfers on the Couchsurfing website. By knowing the prior experience, like Couchsurfer’s profiles, the latest references, and other factors, it became possible to recognize what kind of the Couchsurfers, and furthermore, it helps people to make a decision that whether to host the Couchsurfers or not. The value of this research lies in a case study in Kyoto, Japan in where the author has hosted 54 Couchsurfers, and the author collected relevant data from the 54 Couchsurfers, finally build a model based on classification algorithms for people to predict Couchsurfers. Lastly, the author offered some feasible suggestions for future research.Keywords: Couchsurfing, Couchsurfers prediction, classification algorithm, hospitality tourism platform, hospitality sciences, machine learning
Procedia PDF Downloads 13110882 Machine Learning Assisted Performance Optimization in Memory Tiering
Authors: Derssie Mebratu
Abstract:
As a large variety of micro services, web services, social graphic applications, and media applications are continuously developed, it is substantially vital to design and build a reliable, efficient, and faster memory tiering system. Despite limited design, implementation, and deployment in the last few years, several techniques are currently developed to improve a memory tiering system in a cloud. Some of these techniques are to develop an optimal scanning frequency; improve and track pages movement; identify pages that recently accessed; store pages across each tiering, and then identify pages as a hot, warm, and cold so that hot pages can store in the first tiering Dynamic Random Access Memory (DRAM) and warm pages store in the second tiering Compute Express Link(CXL) and cold pages store in the third tiering Non-Volatile Memory (NVM). Apart from the current proposal and implementation, we also develop a new technique based on a machine learning algorithm in that the throughput produced 25% improved performance compared to the performance produced by the baseline as well as the latency produced 95% improved performance compared to the performance produced by the baseline.Keywords: machine learning, bayesian optimization, memory tiering, CXL, DRAM
Procedia PDF Downloads 9610881 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm
Authors: Ioannis P. Panapakidis, Marios N. Moschakis
Abstract:
With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.Keywords: clustering, load profiling, load modeling, machine learning, energy efficiency and quality
Procedia PDF Downloads 16410880 Critical Evaluation of Groundwater Monitoring Networks for Machine Learning Applications
Authors: Pedro Martinez-Santos, Víctor Gómez-Escalonilla, Silvia Díaz-Alcaide, Esperanza Montero, Miguel Martín-Loeches
Abstract:
Groundwater monitoring networks are critical in evaluating the vulnerability of groundwater resources to depletion and contamination, both in space and time. Groundwater monitoring networks typically grow over decades, often in organic fashion, with relatively little overall planning. The groundwater monitoring networks in the Madrid area, Spain, were reviewed for the purpose of identifying gaps and opportunities for improvement. Spatial analysis reveals the presence of various monitoring networks belonging to different institutions, with several hundred observation wells in an area of approximately 4000 km2. This represents several thousand individual data entries, some going back to the early 1970s. Major issues included overlap between the networks, unknown screen depth/vertical distribution for many observation boreholes, uneven time series, uneven monitored species, and potentially suboptimal locations. Results also reveal there is sufficient information to carry out a spatial and temporal analysis of groundwater vulnerability based on machine learning applications. These can contribute to improve the overall planning of monitoring networks’ expansion into the future.Keywords: groundwater monitoring, observation networks, machine learning, madrid
Procedia PDF Downloads 7810879 Real-Time Network Anomaly Detection Systems Based on Machine-Learning Algorithms
Authors: Zahra Ramezanpanah, Joachim Carvallo, Aurelien Rodriguez
Abstract:
This paper aims to detect anomalies in streaming data using machine learning algorithms. In this regard, we designed two separate pipelines and evaluated the effectiveness of each separately. The first pipeline, based on supervised machine learning methods, consists of two phases. In the first phase, we trained several supervised models using the UNSW-NB15 data-set. We measured the efficiency of each using different performance metrics and selected the best model for the second phase. At the beginning of the second phase, we first, using Argus Server, sniffed a local area network. Several types of attacks were simulated and then sent the sniffed data to a running algorithm at short intervals. This algorithm can display the results of each packet of received data in real-time using the trained model. The second pipeline presented in this paper is based on unsupervised algorithms, in which a Temporal Graph Network (TGN) is used to monitor a local network. The TGN is trained to predict the probability of future states of the network based on its past behavior. Our contribution in this section is introducing an indicator to identify anomalies from these predicted probabilities.Keywords: temporal graph network, anomaly detection, cyber security, IDS
Procedia PDF Downloads 10310878 Predicting Provider Service Time in Outpatient Clinics Using Artificial Intelligence-Based Models
Authors: Haya Salah, Srinivas Sharan
Abstract:
Healthcare facilities use appointment systems to schedule their appointments and to manage access to their medical services. With the growing demand for outpatient care, it is now imperative to manage physician's time effectively. However, high variation in consultation duration affects the clinical scheduler's ability to estimate the appointment duration and allocate provider time appropriately. Underestimating consultation times can lead to physician's burnout, misdiagnosis, and patient dissatisfaction. On the other hand, appointment durations that are longer than required lead to doctor idle time and fewer patient visits. Therefore, a good estimation of consultation duration has the potential to improve timely access to care, resource utilization, quality of care, and patient satisfaction. Although the literature on factors influencing consultation length abound, little work has done to predict it using based data-driven approaches. Therefore, this study aims to predict consultation duration using supervised machine learning algorithms (ML), which predicts an outcome variable (e.g., consultation) based on potential features that influence the outcome. In particular, ML algorithms learn from a historical dataset without explicitly being programmed and uncover the relationship between the features and outcome variable. A subset of the data used in this study has been obtained from the electronic medical records (EMR) of four different outpatient clinics located in central Pennsylvania, USA. Also, publicly available information on doctor's characteristics such as gender and experience has been extracted from online sources. This research develops three popular ML algorithms (deep learning, random forest, gradient boosting machine) to predict the treatment time required for a patient and conducts a comparative analysis of these algorithms with respect to predictive performance. The findings of this study indicate that ML algorithms have the potential to predict the provider service time with superior accuracy. While the current approach of experience-based appointment duration estimation adopted by the clinic resulted in a mean absolute percentage error of 25.8%, the Deep learning algorithm developed in this study yielded the best performance with a MAPE of 12.24%, followed by gradient boosting machine (13.26%) and random forests (14.71%). Besides, this research also identified the critical variables affecting consultation duration to be patient type (new vs. established), doctor's experience, zip code, appointment day, and doctor's specialty. Moreover, several practical insights are obtained based on the comparative analysis of the ML algorithms. The machine learning approach presented in this study can serve as a decision support tool and could be integrated into the appointment system for effectively managing patient scheduling.Keywords: clinical decision support system, machine learning algorithms, patient scheduling, prediction models, provider service time
Procedia PDF Downloads 12110877 Big Data in Telecom Industry: Effective Predictive Techniques on Call Detail Records
Authors: Sara ElElimy, Samir Moustafa
Abstract:
Mobile network operators start to face many challenges in the digital era, especially with high demands from customers. Since mobile network operators are considered a source of big data, traditional techniques are not effective with new era of big data, Internet of things (IoT) and 5G; as a result, handling effectively different big datasets becomes a vital task for operators with the continuous growth of data and moving from long term evolution (LTE) to 5G. So, there is an urgent need for effective Big data analytics to predict future demands, traffic, and network performance to full fill the requirements of the fifth generation of mobile network technology. In this paper, we introduce data science techniques using machine learning and deep learning algorithms: the autoregressive integrated moving average (ARIMA), Bayesian-based curve fitting, and recurrent neural network (RNN) are employed for a data-driven application to mobile network operators. The main framework included in models are identification parameters of each model, estimation, prediction, and final data-driven application of this prediction from business and network performance applications. These models are applied to Telecom Italia Big Data challenge call detail records (CDRs) datasets. The performance of these models is found out using a specific well-known evaluation criteria shows that ARIMA (machine learning-based model) is more accurate as a predictive model in such a dataset than the RNN (deep learning model).Keywords: big data analytics, machine learning, CDRs, 5G
Procedia PDF Downloads 13910876 Determination of Klebsiella Pneumoniae Susceptibility to Antibiotics Using Infrared Spectroscopy and Machine Learning Algorithms
Authors: Manal Suleiman, George Abu-Aqil, Uraib Sharaha, Klaris Riesenberg, Itshak Lapidot, Ahmad Salman, Mahmoud Huleihel
Abstract:
Klebsiella pneumoniae is one of the most aggressive multidrug-resistant bacteria associated with human infections resulting in high mortality and morbidity. Thus, for an effective treatment, it is important to diagnose both the species of infecting bacteria and their susceptibility to antibiotics. Current used methods for diagnosing the bacterial susceptibility to antibiotics are time-consuming (about 24h following the first culture). Thus, there is a clear need for rapid methods to determine the bacterial susceptibility to antibiotics. Infrared spectroscopy is a well-known method that is known as sensitive and simple which is able to detect minor biomolecular changes in biological samples associated with developing abnormalities. The main goal of this study is to evaluate the potential of infrared spectroscopy in tandem with Random Forest and XGBoost machine learning algorithms to diagnose the susceptibility of Klebsiella pneumoniae to antibiotics within approximately 20 minutes following the first culture. In this study, 1190 Klebsiella pneumoniae isolates were obtained from different patients with urinary tract infections. The isolates were measured by the infrared spectrometer, and the spectra were analyzed by machine learning algorithms Random Forest and XGBoost to determine their susceptibility regarding nine specific antibiotics. Our results confirm that it was possible to classify the isolates into sensitive and resistant to specific antibiotics with a success rate range of 80%-85% for the different tested antibiotics. These results prove the promising potential of infrared spectroscopy as a powerful diagnostic method for determining the Klebsiella pneumoniae susceptibility to antibiotics.Keywords: urinary tract infection (UTI), Klebsiella pneumoniae, bacterial susceptibility, infrared spectroscopy, machine learning
Procedia PDF Downloads 16810875 Improved Computational Efficiency of Machine Learning Algorithm Based on Evaluation Metrics to Control the Spread of Coronavirus in the UK
Authors: Swathi Ganesan, Nalinda Somasiri, Rebecca Jeyavadhanam, Gayathri Karthick
Abstract:
The COVID-19 crisis presents a substantial and critical hazard to worldwide health. Since the occurrence of the disease in late January 2020 in the UK, the number of infected people confirmed to acquire the illness has increased tremendously across the country, and the number of individuals affected is undoubtedly considerably high. The purpose of this research is to figure out a predictive machine learning archetypal that could forecast COVID-19 cases within the UK. This study concentrates on the statistical data collected from 31st January 2020 to 31st March 2021 in the United Kingdom. Information on total COVID cases registered, new cases encountered on a daily basis, total death registered, and patients’ death per day due to Coronavirus is collected from World Health Organisation (WHO). Data preprocessing is carried out to identify any missing values, outliers, or anomalies in the dataset. The data is split into 8:2 ratio for training and testing purposes to forecast future new COVID cases. Support Vector Machines (SVM), Random Forests, and linear regression algorithms are chosen to study the model performance in the prediction of new COVID-19 cases. From the evaluation metrics such as r-squared value and mean squared error, the statistical performance of the model in predicting the new COVID cases is evaluated. Random Forest outperformed the other two Machine Learning algorithms with a training accuracy of 99.47% and testing accuracy of 98.26% when n=30. The mean square error obtained for Random Forest is 4.05e11, which is lesser compared to the other predictive models used for this study. From the experimental analysis Random Forest algorithm can perform more effectively and efficiently in predicting the new COVID cases, which could help the health sector to take relevant control measures for the spread of the virus.Keywords: COVID-19, machine learning, supervised learning, unsupervised learning, linear regression, support vector machine, random forest
Procedia PDF Downloads 12110874 Machine Learning Prediction of Compressive Damage and Energy Absorption in Carbon Fiber-Reinforced Polymer Tubular Structures
Authors: Milad Abbasi
Abstract:
Carbon fiber-reinforced polymer (CFRP) composite structures are increasingly being utilized in the automotive industry due to their lightweight and specific energy absorption capabilities. Although it is impossible to predict composite mechanical properties directly using theoretical methods, various research has been conducted so far in the literature for accurate simulation of CFRP structures' energy-absorbing behavior. In this research, axial compression experiments were carried out on hand lay-up unidirectional CFRP composite tubes. The fabrication method allowed the authors to extract the material properties of the CFRPs using ASTM D3039, D3410, and D3518 standards. A neural network machine learning algorithm was then utilized to build a robust prediction model to forecast the axial compressive properties of CFRP tubes while reducing high-cost experimental efforts. The predicted results have been compared with the experimental outcomes in terms of load-carrying capacity and energy absorption capability. The results showed high accuracy and precision in the prediction of the energy-absorption capacity of the CFRP tubes. This research also demonstrates the effectiveness and challenges of machine learning techniques in the robust simulation of composites' energy-absorption behavior. Interestingly, the proposed method considerably condensed numerical and experimental efforts in the simulation and calibration of CFRP composite tubes subjected to compressive loading.Keywords: CFRP composite tubes, energy absorption, crushing behavior, machine learning, neural network
Procedia PDF Downloads 15310873 Comparing the Detection of Autism Spectrum Disorder within Males and Females Using Machine Learning Techniques
Authors: Joseph Wolff, Jeffrey Eilbott
Abstract:
Autism Spectrum Disorders (ASD) are a spectrum of social disorders characterized by deficits in social communication, verbal ability, and interaction that can vary in severity. In recent years, researchers have used magnetic resonance imaging (MRI) to help detect how neural patterns in individuals with ASD differ from those of neurotypical (NT) controls for classification purposes. This study analyzed the classification of ASD within males and females using functional MRI data. Functional connectivity (FC) correlations among brain regions were used as feature inputs for machine learning algorithms. Analysis was performed on 558 cases from the Autism Brain Imaging Data Exchange (ABIDE) I dataset. When trained specifically on females, the algorithm underperformed in classifying the ASD subset of our testing population. Although the subject size was relatively smaller in the female group, the manual matching of both male and female training groups helps explain the algorithm’s bias, indicating the altered sex abnormalities in functional brain networks compared to typically developing peers. These results highlight the importance of taking sex into account when considering how generalizations of findings on males with ASD apply to females.Keywords: autism spectrum disorder, machine learning, neuroimaging, sex differences
Procedia PDF Downloads 20910872 Methods for Enhancing Ensemble Learning or Improving Classifiers of This Technique in the Analysis and Classification of Brain Signals
Authors: Seyed Mehdi Ghezi, Hesam Hasanpoor
Abstract:
This scientific article explores enhancement methods for ensemble learning with the aim of improving the performance of classifiers in the analysis and classification of brain signals. The research approach in this field consists of two main parts, each with its own strengths and weaknesses. The choice of approach depends on the specific research question and available resources. By combining these approaches and leveraging their respective strengths, researchers can enhance the accuracy and reliability of classification results, consequently advancing our understanding of the brain and its functions. The first approach focuses on utilizing machine learning methods to identify the best features among the vast array of features present in brain signals. The selection of features varies depending on the research objective, and different techniques have been employed for this purpose. For instance, the genetic algorithm has been used in some studies to identify the best features, while optimization methods have been utilized in others to identify the most influential features. Additionally, machine learning techniques have been applied to determine the influential electrodes in classification. Ensemble learning plays a crucial role in identifying the best features that contribute to learning, thereby improving the overall results. The second approach concentrates on designing and implementing methods for selecting the best classifier or utilizing meta-classifiers to enhance the final results in ensemble learning. In a different section of the research, a single classifier is used instead of multiple classifiers, employing different sets of features to improve the results. The article provides an in-depth examination of each technique, highlighting their advantages and limitations. By integrating these techniques, researchers can enhance the performance of classifiers in the analysis and classification of brain signals. This advancement in ensemble learning methodologies contributes to a better understanding of the brain and its functions, ultimately leading to improved accuracy and reliability in brain signal analysis and classification.Keywords: ensemble learning, brain signals, classification, feature selection, machine learning, genetic algorithm, optimization methods, influential features, influential electrodes, meta-classifiers
Procedia PDF Downloads 7510871 Predicting Costs in Construction Projects with Machine Learning: A Detailed Study Based on Activity-Level Data
Authors: Soheila Sadeghi
Abstract:
Construction projects are complex and often subject to significant cost overruns due to the multifaceted nature of the activities involved. Accurate cost estimation is crucial for effective budget planning and resource allocation. Traditional methods for predicting overruns often rely on expert judgment or analysis of historical data, which can be time-consuming, subjective, and may fail to consider important factors. However, with the increasing availability of data from construction projects, machine learning techniques can be leveraged to improve the accuracy of overrun predictions. This study applied machine learning algorithms to enhance the prediction of cost overruns in a case study of a construction project. The methodology involved the development and evaluation of two machine learning models: Random Forest and Neural Networks. Random Forest can handle high-dimensional data, capture complex relationships, and provide feature importance estimates. Neural Networks, particularly Deep Neural Networks (DNNs), are capable of automatically learning and modeling complex, non-linear relationships between input features and the target variable. These models can adapt to new data, reduce human bias, and uncover hidden patterns in the dataset. The findings of this study demonstrate that both Random Forest and Neural Networks can significantly improve the accuracy of cost overrun predictions compared to traditional methods. The Random Forest model also identified key cost drivers and risk factors, such as changes in the scope of work and delays in material delivery, which can inform better project risk management. However, the study acknowledges several limitations. First, the findings are based on a single construction project, which may limit the generalizability of the results to other projects or contexts. Second, the dataset, although comprehensive, may not capture all relevant factors influencing cost overruns, such as external economic conditions or political factors. Third, the study focuses primarily on cost overruns, while schedule overruns are not explicitly addressed. Future research should explore the application of machine learning techniques to a broader range of projects, incorporate additional data sources, and investigate the prediction of both cost and schedule overruns simultaneously.Keywords: cost prediction, machine learning, project management, random forest, neural networks
Procedia PDF Downloads 5410870 Hull Detection from Handwritten Digit Image
Authors: Sriraman Kothuri, Komal Teja Mattupalli
Abstract:
In this paper we proposed a novel algorithm for recognizing hulls in a hand written digits. This is an extension to the work on “Digit Recognition Using Freeman Chain code”. In order to find out the hulls in a user given digit it is necessary to follow three steps. Those are pre-processing, Boundary Extraction and at last apply the Hull Detection system in a way to attain the better results. The detection of Hull Regions is mainly intended to increase the machine learning capability in detection of characters or digits. This can also extend this in order to get the hull regions and their intensities in Black Holes in Space Exploration.Keywords: chain code, machine learning, hull regions, hull recognition system, SASK algorithm
Procedia PDF Downloads 40010869 A Machine Learning Approach for Efficient Resource Management in Construction Projects
Authors: Soheila Sadeghi
Abstract:
Construction projects are complex and often subject to significant cost overruns due to the multifaceted nature of the activities involved. Accurate cost estimation is crucial for effective budget planning and resource allocation. Traditional methods for predicting overruns often rely on expert judgment or analysis of historical data, which can be time-consuming, subjective, and may fail to consider important factors. However, with the increasing availability of data from construction projects, machine learning techniques can be leveraged to improve the accuracy of overrun predictions. This study applied machine learning algorithms to enhance the prediction of cost overruns in a case study of a construction project. The methodology involved the development and evaluation of two machine learning models: Random Forest and Neural Networks. Random Forest can handle high-dimensional data, capture complex relationships, and provide feature importance estimates. Neural Networks, particularly Deep Neural Networks (DNNs), are capable of automatically learning and modeling complex, non-linear relationships between input features and the target variable. These models can adapt to new data, reduce human bias, and uncover hidden patterns in the dataset. The findings of this study demonstrate that both Random Forest and Neural Networks can significantly improve the accuracy of cost overrun predictions compared to traditional methods. The Random Forest model also identified key cost drivers and risk factors, such as changes in the scope of work and delays in material delivery, which can inform better project risk management. However, the study acknowledges several limitations. First, the findings are based on a single construction project, which may limit the generalizability of the results to other projects or contexts. Second, the dataset, although comprehensive, may not capture all relevant factors influencing cost overruns, such as external economic conditions or political factors. Third, the study focuses primarily on cost overruns, while schedule overruns are not explicitly addressed. Future research should explore the application of machine learning techniques to a broader range of projects, incorporate additional data sources, and investigate the prediction of both cost and schedule overruns simultaneously.Keywords: resource allocation, machine learning, optimization, data-driven decision-making, project management
Procedia PDF Downloads 3910868 Current Methods for Drug Property Prediction in the Real World
Authors: Jacob Green, Cecilia Cabrera, Maximilian Jakobs, Andrea Dimitracopoulos, Mark van der Wilk, Ryan Greenhalgh
Abstract:
Predicting drug properties is key in drug discovery to enable de-risking of assets before expensive clinical trials and to find highly active compounds faster. Interest from the machine learning community has led to the release of a variety of benchmark datasets and proposed methods. However, it remains unclear for practitioners which method or approach is most suitable, as different papers benchmark on different datasets and methods, leading to varying conclusions that are not easily compared. Our large-scale empirical study links together numerous earlier works on different datasets and methods, thus offering a comprehensive overview of the existing property classes, datasets, and their interactions with different methods. We emphasise the importance of uncertainty quantification and the time and, therefore, cost of applying these methods in the drug development decision-making cycle. To the best of the author's knowledge, it has been observed that the optimal approach varies depending on the dataset and that engineered features with classical machine learning methods often outperform deep learning. Specifically, QSAR datasets are typically best analysed with classical methods such as Gaussian Processes, while ADMET datasets are sometimes better described by Trees or deep learning methods such as Graph Neural Networks or language models. Our work highlights that practitioners do not yet have a straightforward, black-box procedure to rely on and sets a precedent for creating practitioner-relevant benchmarks. Deep learning approaches must be proven on these benchmarks to become the practical method of choice in drug property prediction.Keywords: activity (QSAR), ADMET, classical methods, drug property prediction, empirical study, machine learning
Procedia PDF Downloads 81