Search results for: python machine learning libraries
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8369

Search results for: python machine learning libraries

8099 Determination of Water Pollution and Water Quality with Decision Trees

Authors: Çiğdem Bakır, Mecit Yüzkat

Abstract:

With the increasing emphasis on water quality worldwide, the search for and expanding the market for new and intelligent monitoring systems has increased. The current method is the laboratory process, where samples are taken from bodies of water, and tests are carried out in laboratories. This method is time-consuming, a waste of manpower, and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software we used in our study and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed by taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen with machine learning methods. The proposed approach consists of three stages: preprocessing of the data used, feature detection, and classification. We tried to determine the success of our study with different accuracy metrics and the results. We presented it comparatively. In addition, we achieved approximately 98% success with the decision tree.

Keywords: decision tree, water quality, water pollution, machine learning

Procedia PDF Downloads 50
8098 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue

Authors: Rachel Y. Zhang, Christopher K. Anderson

Abstract:

A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically hotels have used simple time series models (regression and/or pick-up based models) owing to the complexities of trying to build casual models of demands. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand – focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support vector machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performances of above approaches to forecasting hotel demand are illustrated by using a proprietary sample of the market level (24 properties) transactional data for Las Vegas NV. Causal predictive models can be built and evaluated owing to the availability of market level (versus firm level) data. This research also compares and contrast model accuracy of firm-level models (i.e. predictive models for hotel A only using hotel A’s data) to models using market level data (prices, review scores, location, chain scale, etc… for all hotels within the market). The prospected models will be valuable for hotel revenue prediction given the basic characters of a hotel property or can be applied in performance evaluation for an existed hotel. The findings will unveil the features that play key roles in a hotel’s revenue performance, which would have considerable potential usefulness in both revenue prediction and evaluation.

Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine

Procedia PDF Downloads 98
8097 Enhancing Fault Detection in Rotating Machinery Using Wiener-CNN Method

Authors: Mohamad R. Moshtagh, Ahmad Bagheri

Abstract:

Accurate fault detection in rotating machinery is of utmost importance to ensure optimal performance and prevent costly downtime in industrial applications. This study presents a robust fault detection system based on vibration data collected from rotating gears under various operating conditions. The considered scenarios include: (1) both gears being healthy, (2) one healthy gear and one faulty gear, and (3) introducing an imbalanced condition to a healthy gear. Vibration data was acquired using a Hentek 1008 device and stored in a CSV file. Python code implemented in the Spider environment was used for data preprocessing and analysis. Winner features were extracted using the Wiener feature selection method. These features were then employed in multiple machine learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), and Random Forest, to evaluate their performance in detecting and classifying faults in both the training and validation datasets. The comparative analysis of the methods revealed the superior performance of the Wiener-CNN approach. The Wiener-CNN method achieved a remarkable accuracy of 100% for both the two-class (healthy gear and faulty gear) and three-class (healthy gear, faulty gear, and imbalanced) scenarios in the training and validation datasets. In contrast, the other methods exhibited varying levels of accuracy. The Wiener-MLP method attained 100% accuracy for the two-class training dataset and 100% for the validation dataset. For the three-class scenario, the Wiener-MLP method demonstrated 100% accuracy in the training dataset and 95.3% accuracy in the validation dataset. The Wiener-KNN method yielded 96.3% accuracy for the two-class training dataset and 94.5% for the validation dataset. In the three-class scenario, it achieved 85.3% accuracy in the training dataset and 77.2% in the validation dataset. The Wiener-Random Forest method achieved 100% accuracy for the two-class training dataset and 85% for the validation dataset, while in the three-class training dataset, it attained 100% accuracy and 90.8% accuracy for the validation dataset. The exceptional accuracy demonstrated by the Wiener-CNN method underscores its effectiveness in accurately identifying and classifying fault conditions in rotating machinery. The proposed fault detection system utilizes vibration data analysis and advanced machine learning techniques to improve operational reliability and productivity. By adopting the Wiener-CNN method, industrial systems can benefit from enhanced fault detection capabilities, facilitating proactive maintenance and reducing equipment downtime.

Keywords: fault detection, gearbox, machine learning, wiener method

Procedia PDF Downloads 39
8096 Accelerating Quantum Chemistry Calculations: Machine Learning for Efficient Evaluation of Electron-Repulsion Integrals

Authors: Nishant Rodrigues, Nicole Spanedda, Chilukuri K. Mohan, Arindam Chakraborty

Abstract:

A crucial objective in quantum chemistry is the computation of the energy levels of chemical systems. This task requires electron-repulsion integrals as inputs, and the steep computational cost of evaluating these integrals poses a major numerical challenge in efficient implementation of quantum chemical software. This work presents a moment-based machine-learning approach for the efficient evaluation of electron-repulsion integrals. These integrals were approximated using linear combinations of a small number of moments. Machine learning algorithms were applied to estimate the coefficients in the linear combination. A random forest approach was used to identify promising features using a recursive feature elimination approach, which performed best for learning the sign of each coefficient but not the magnitude. A neural network with two hidden layers were then used to learn the coefficient magnitudes along with an iterative feature masking approach to perform input vector compression, identifying a small subset of orbitals whose coefficients are sufficient for the quantum state energy computation. Finally, a small ensemble of neural networks (with a median rule for decision fusion) was shown to improve results when compared to a single network.

Keywords: quantum energy calculations, atomic orbitals, electron-repulsion integrals, ensemble machine learning, random forests, neural networks, feature extraction

Procedia PDF Downloads 58
8095 Automatic Lead Qualification with Opinion Mining in Customer Relationship Management Projects

Authors: Victor Radich, Tania Basso, Regina Moraes

Abstract:

Lead qualification is one of the main procedures in Customer Relationship Management (CRM) projects. Its main goal is to identify potential consumers who have the ideal characteristics to establish a profitable and long-term relationship with a certain organization. Social networks can be an important source of data for identifying and qualifying leads since interest in specific products or services can be identified from the users’ expressed feelings of (dis)satisfaction. In this context, this work proposes the use of machine learning techniques and sentiment analysis as an extra step in the lead qualification process in order to improve it. In addition to machine learning models, sentiment analysis or opinion mining can be used to understand the evaluation that the user makes of a particular service, product, or brand. The results obtained so far have shown that it is possible to extract data from social networks and combine the techniques for a more complete classification.

Keywords: lead qualification, sentiment analysis, opinion mining, machine learning, CRM, lead scoring

Procedia PDF Downloads 19
8094 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia

Authors: The Danh Phan

Abstract:

House price forecasting is a main topic in the real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between the house price in the most expensive suburbs and the most affordable suburbs in the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.

Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise

Procedia PDF Downloads 172
8093 Digital Preservation: A Need of Tomorrow

Authors: Gaurav Kumar

Abstract:

Digital libraries have been established all over the world to create, maintain and to preserve the digital materials. This paper exhibits the importance and objectives of digital preservation. The necessities of preservation are hardware and software technology to interpret the digital documents and discuss various aspects of digital preservation.

Keywords: preservation, digital preservation, conservation, archive, repository, document, information technology, hardware, software, organization, machine readable format

Procedia PDF Downloads 542
8092 Copyright Clearance for Artificial Intelligence Training Data: Challenges and Solutions

Authors: Erva Akin

Abstract:

– The use of copyrighted material for machine learning purposes is a challenging issue in the field of artificial intelligence (AI). While machine learning algorithms require large amounts of data to train and improve their accuracy and creativity, the use of copyrighted material without permission from the authors may infringe on their intellectual property rights. In order to overcome copyright legal hurdle against the data sharing, access and re-use of data, the use of copyrighted material for machine learning purposes may be considered permissible under certain circumstances. For example, if the copyright holder has given permission to use the data through a licensing agreement, then the use for machine learning purposes may be lawful. It is also argued that copying for non-expressive purposes that do not involve conveying expressive elements to the public, such as automated data extraction, should not be seen as infringing. The focus of such ‘copy-reliant technologies’ is on understanding language rules, styles, and syntax and no creative ideas are being used. However, the non-expressive use defense is within the framework of the fair use doctrine, which allows the use of copyrighted material for research or educational purposes. The questions arise because the fair use doctrine is not available in EU law, instead, the InfoSoc Directive provides for a rigid system of exclusive rights with a list of exceptions and limitations. One could only argue that non-expressive uses of copyrighted material for machine learning purposes do not constitute a ‘reproduction’ in the first place. Nevertheless, the use of machine learning with copyrighted material is difficult because EU copyright law applies to the mere use of the works. Two solutions can be proposed to address the problem of copyright clearance for AI training data. The first is to introduce a broad exception for text and data mining, either mandatorily or for commercial and scientific purposes, or to permit the reproduction of works for non-expressive purposes. The second is that copyright laws should permit the reproduction of works for non-expressive purposes, which opens the door to discussions regarding the transposition of the fair use principle from the US into EU law. Both solutions aim to provide more space for AI developers to operate and encourage greater freedom, which could lead to more rapid innovation in the field. The Data Governance Act presents a significant opportunity to advance these debates. Finally, issues concerning the balance of general public interests and legitimate private interests in machine learning training data must be addressed. In my opinion, it is crucial that robot-creation output should fall into the public domain. Machines depend on human creativity, innovation, and expression. To encourage technological advancement and innovation, freedom of expression and business operation must be prioritised.

Keywords: artificial intelligence, copyright, data governance, machine learning

Procedia PDF Downloads 44
8091 New Machine Learning Optimization Approach Based on Input Variables Disposition Applied for Time Series Prediction

Authors: Hervice Roméo Fogno Fotsoa, Germaine Djuidje Kenmoe, Claude Vidal Aloyem Kazé

Abstract:

One of the main applications of machine learning is the prediction of time series. But a more accurate prediction requires a more optimal model of machine learning. Several optimization techniques have been developed, but without considering the input variables disposition of the system. Thus, this work aims to present a new machine learning architecture optimization technique based on their optimal input variables disposition. The validations are done on the prediction of wind time series, using data collected in Cameroon. The number of possible dispositions with four input variables is determined, i.e., twenty-four. Each of the dispositions is used to perform the prediction, with the main criteria being the training and prediction performances. The results obtained from a static architecture and a dynamic architecture of neural networks have shown that these performances are a function of the input variable's disposition, and this is in a different way from the architectures. This analysis revealed that it is necessary to take into account the input variable's disposition for the development of a more optimal neural network model. Thus, a new neural network training algorithm is proposed by introducing the search for the optimal input variables disposition in the traditional back-propagation algorithm. The results of the application of this new optimization approach on the two single neural network architectures are compared with the previously obtained results step by step. Moreover, this proposed approach is validated in a collaborative optimization method with a single objective optimization technique, i.e., genetic algorithm back-propagation neural networks. From these comparisons, it is concluded that each proposed model outperforms its traditional model in terms of training and prediction performance of time series. Thus the proposed optimization approach can be useful in improving the accuracy of time series forecasts. This proves that the proposed optimization approach can be useful in improving the accuracy of time series prediction based on machine learning.

Keywords: input variable disposition, machine learning, optimization, performance, time series prediction

Procedia PDF Downloads 54
8090 Image Segmentation with Deep Learning of Prostate Cancer Bone Metastases on Computed Tomography

Authors: Joseph M. Rich, Vinay A. Duddalwar, Assad A. Oberai

Abstract:

Prostate adenocarcinoma is the most common cancer in males, with osseous metastases as the commonest site of metastatic prostate carcinoma (mPC). Treatment monitoring is based on the evaluation and characterization of lesions on multiple imaging studies, including Computed Tomography (CT). Monitoring of the osseous disease burden, including follow-up of lesions and identification and characterization of new lesions, is a laborious task for radiologists. Deep learning algorithms are increasingly used to perform tasks such as identification and segmentation for osseous metastatic disease and provide accurate information regarding metastatic burden. Here, nnUNet was used to produce a model which can segment CT scan images of prostate adenocarcinoma vertebral bone metastatic lesions. nnUNet is an open-source Python package that adds optimizations to deep learning-based UNet architecture but has not been extensively combined with transfer learning techniques due to the absence of a readily available functionality of this method. The IRB-approved study data set includes imaging studies from patients with mPC who were enrolled in clinical trials at the University of Southern California (USC) Health Science Campus and Los Angeles County (LAC)/USC medical center. Manual segmentation of metastatic lesions was completed by an expert radiologist Dr. Vinay Duddalwar (20+ years in radiology and oncologic imaging), to serve as ground truths for the automated segmentation. Despite nnUNet’s success on some medical segmentation tasks, it only produced an average Dice Similarity Coefficient (DSC) of 0.31 on the USC dataset. DSC results fell in a bimodal distribution, with most scores falling either over 0.66 (reasonably accurate) or at 0 (no lesion detected). Applying more aggressive data augmentation techniques dropped the DSC to 0.15, and reducing the number of epochs reduced the DSC to below 0.1. Datasets have been identified for transfer learning, which involve balancing between size and similarity of the dataset. Identified datasets include the Pancreas data from the Medical Segmentation Decathlon, Pelvic Reference Data, and CT volumes with multiple organ segmentations (CT-ORG). Some of the challenges of producing an accurate model from the USC dataset include small dataset size (115 images), 2D data (as nnUNet generally performs better on 3D data), and the limited amount of public data capturing annotated CT images of bone lesions. Optimizations and improvements will be made by applying transfer learning and generative methods, including incorporating generative adversarial networks and diffusion models in order to augment the dataset. Performance with different libraries, including MONAI and custom architectures with Pytorch, will be compared. In the future, molecular correlations will be tracked with radiologic features for the purpose of multimodal composite biomarker identification. Once validated, these models will be incorporated into evaluation workflows to optimize radiologist evaluation. Our work demonstrates the challenges of applying automated image segmentation to small medical datasets and lays a foundation for techniques to improve performance. As machine learning models become increasingly incorporated into the workflow of radiologists, these findings will help improve the speed and accuracy of vertebral metastatic lesions detection.

Keywords: deep learning, image segmentation, medicine, nnUNet, prostate carcinoma, radiomics

Procedia PDF Downloads 53
8089 Bridging the Gaping Levels of Information Entree for Visually Impaired Students in the Sri Lankan University Libraries

Authors: Wilfred Jeyatheese Jeyaraj

Abstract:

Education is a key determinant of future success, and every person deserves non-discriminant access to information for educational inevitabilities in any case. Analysing and understanding complex information is a crucial learning tool, especially for students. In order to compete equally with sighted students, visually impaired students require the unhinged access to access to all the available information resources. When the education of visually impaired students comes to a focal point, it can be stated that visually impaired students encounter several obstacles and barriers before they enter the university and during their time there as students. These obstacles and barriers are spread across technical, organizational and social arenas. This study reveals the possible approaches to absorb and benefit from the information provided by the Sri Lankan University Libraries for visually impaired students. Purposive sampling technique was used to select sample visually impaired students attached to the Sri Lankan National universities. There are 07 National universities which accommodate the visually impaired students and with the identified data, they were selected for this study and 80 visually impaired students were selected as the sample group. Descriptive type survey method was used to collect data. Structured questionnaires, interviews and direct observation were used as research instruments. As far as the Sri Lankan context spread is concerned, visually impaired students are able to finish their courses through their own determination to overcome the barriers they encounter on their way to graduation, through moral and practical support from their own friends and very often through a high level of creativity. According to the findings there are no specially trained university librarians to serve visually impaired users and less number of assistive technology equipment are available at present. This paper enables all university libraries in Sri Lanka to be informed about the social isolation of visually compromised students at the Sri Lankan universities and focuses on the rectification issues by considering their distinct case for interaction.

Keywords: information access, Sri Lanka, university libraries, visual impairment

Procedia PDF Downloads 199
8088 Comparison of Different Machine Learning Models for Time-Series Based Load Forecasting of Electric Vehicle Charging Stations

Authors: H. J. Joshi, Satyajeet Patil, Parth Dandavate, Mihir Kulkarni, Harshita Agrawal

Abstract:

As the world looks towards a sustainable future, electric vehicles have become increasingly popular. Millions worldwide are looking to switch to Electric cars over the previously favored combustion engine-powered cars. This demand has seen an increase in Electric Vehicle Charging Stations. The big challenge is that the randomness of electrical energy makes it tough for these charging stations to provide an adequate amount of energy over a specific amount of time. Thus, it has become increasingly crucial to model these patterns and forecast the energy needs of power stations. This paper aims to analyze how different machine learning models perform on Electric Vehicle charging time-series data. The data set consists of authentic Electric Vehicle Data from the Netherlands. It has an overview of ten thousand transactions from public stations operated by EVnetNL.

Keywords: forecasting, smart grid, electric vehicle load forecasting, machine learning, time series forecasting

Procedia PDF Downloads 68
8087 ANOVA-Based Feature Selection and Machine Learning System for IoT Anomaly Detection

Authors: Muhammad Ali

Abstract:

Cyber-attacks and anomaly detection on the Internet of Things (IoT) infrastructure is emerging concern in the domain of data-driven intrusion. Rapidly increasing IoT risk is now making headlines around the world. denial of service, malicious control, data type probing, malicious operation, DDos, scan, spying, and wrong setup are attacks and anomalies that can affect an IoT system failure. Everyone talks about cyber security, connectivity, smart devices, and real-time data extraction. IoT devices expose a wide variety of new cyber security attack vectors in network traffic. For further than IoT development, and mainly for smart and IoT applications, there is a necessity for intelligent processing and analysis of data. So, our approach is too secure. We train several machine learning models that have been compared to accurately predicting attacks and anomalies on IoT systems, considering IoT applications, with ANOVA-based feature selection with fewer prediction models to evaluate network traffic to help prevent IoT devices. The machine learning (ML) algorithms that have been used here are KNN, SVM, NB, D.T., and R.F., with the most satisfactory test accuracy with fast detection. The evaluation of ML metrics includes precision, recall, F1 score, FPR, NPV, G.M., MCC, and AUC & ROC. The Random Forest algorithm achieved the best results with less prediction time, with an accuracy of 99.98%.

Keywords: machine learning, analysis of variance, Internet of Thing, network security, intrusion detection

Procedia PDF Downloads 81
8086 An Overview of Domain Models of Urban Quantitative Analysis

Authors: Mohan Li

Abstract:

Nowadays, intelligent research technology is more and more important than traditional research methods in urban research work, and this proportion will greatly increase in the next few decades. Frequently such analyzing work cannot be carried without some software engineering knowledge. And here, domain models of urban research will be necessary when applying software engineering knowledge to urban work. In many urban plan practice projects, making rational models, feeding reliable data, and providing enough computation all make indispensable assistance in producing good urban planning. During the whole work process, domain models can optimize workflow design. At present, human beings have entered the era of big data. The amount of digital data generated by cities every day will increase at an exponential rate, and new data forms are constantly emerging. How to select a suitable data set from the massive amount of data, manage and process it has become an ability that more and more planners and urban researchers need to possess. This paper summarizes and makes predictions of the emergence of technologies and technological iterations that may affect urban research in the future, discover urban problems, and implement targeted sustainable urban strategies. They are summarized into seven major domain models. They are urban and rural regional domain model, urban ecological domain model, urban industry domain model, development dynamic domain model, urban social and cultural domain model, urban traffic domain model, and urban space domain model. These seven domain models can be used to guide the construction of systematic urban research topics and help researchers organize a series of intelligent analytical tools, such as Python, R, GIS, etc. These seven models make full use of quantitative spatial analysis, machine learning, and other technologies to achieve higher efficiency and accuracy in urban research, assisting people in making reasonable decisions.

Keywords: big data, domain model, urban planning, urban quantitative analysis, machine learning, workflow design

Procedia PDF Downloads 137
8085 FlexPoints: Efficient Algorithm for Detection of Electrocardiogram Characteristic Points

Authors: Daniel Bulanda, Janusz A. Starzyk, Adrian Horzyk

Abstract:

The electrocardiogram (ECG) is one of the most commonly used medical tests, essential for correct diagnosis and treatment of the patient. While ECG devices generate a huge amount of data, only a small part of them carries valuable medical information. To deal with this problem, many compression algorithms and filters have been developed over the past years. However, the rapid development of new machine learning techniques poses new challenges. To address this class of problems, we created the FlexPoints algorithm that searches for characteristic points on the ECG signal and ignores all other points that do not carry relevant medical information. The conducted experiments proved that the presented algorithm can significantly reduce the number of data points which represents ECG signal without losing valuable medical information. These sparse but essential characteristic points (flex points) can be a perfect input for some modern machine learning models, which works much better using flex points as an input instead of raw data or data compressed by many popular algorithms.

Keywords: characteristic points, electrocardiogram, ECG, machine learning, signal compression

Procedia PDF Downloads 124
8084 WebAppShield: An Approach Exploiting Machine Learning to Detect SQLi Attacks in an Application Layer in Run-time

Authors: Ahmed Abdulla Ashlam, Atta Badii, Frederic Stahl

Abstract:

In recent years, SQL injection attacks have been identified as being prevalent against web applications. They affect network security and user data, which leads to a considerable loss of money and data every year. This paper presents the use of classification algorithms in machine learning using a method to classify the login data filtering inputs into "SQLi" or "Non-SQLi,” thus increasing the reliability and accuracy of results in terms of deciding whether an operation is an attack or a valid operation. A method Web-App auto-generated twin data structure replication. Shielding against SQLi attacks (WebAppShield) that verifies all users and prevents attackers (SQLi attacks) from entering and or accessing the database, which the machine learning module predicts as "Non-SQLi" has been developed. A special login form has been developed with a special instance of data validation; this verification process secures the web application from its early stages. The system has been tested and validated, up to 99% of SQLi attacks have been prevented.

Keywords: SQL injection, attacks, web application, accuracy, database

Procedia PDF Downloads 108
8083 Employing QR Code as an Effective Educational Tool for Quick Access to Sources of Kindergarten Concepts

Authors: Ahmed Amin Mousa, M. Abd El-Salam

Abstract:

This study discusses a simple solution for the problem of shortage in learning resources for kindergarten teachers. Occasionally, kindergarten teachers cannot access proper resources by usual search methods as libraries or search engines. Furthermore, these methods require a long time and efforts for preparing. The study is expected to facilitate accessing learning resources. Moreover, it suggests a potential direction for using QR code inside the classroom. The present work proposes that QR code can be used for digitizing kindergarten curriculums and accessing various learning resources. It investigates using QR code for saving information related to the concepts which kindergarten teachers use in the current educational situation. The researchers have established a guide for kindergarten teachers based on the Egyptian official curriculum. The guide provides different learning resources for each scientific and mathematical concept in the curriculum, and each learning resource is represented as a QR code image that contains its URL. Therefore, kindergarten teachers can use smartphone applications for reading QR codes and displaying the related learning resources for students immediately. The guide has been provided to a group of 108 teachers for using inside their classrooms. The results showed that the teachers approved the guide, and gave a good response.

Keywords: kindergarten, child, learning resources, QR code, smart phone, mobile

Procedia PDF Downloads 252
8082 Use Cloud-Based Watson Deep Learning Platform to Train Models Faster and More Accurate

Authors: Susan Diamond

Abstract:

Machine Learning workloads have traditionally been run in high-performance computing (HPC) environments, where users log in to dedicated machines and utilize the attached GPUs to run training jobs on huge datasets. Training of large neural network models is very resource intensive, and even after exploiting parallelism and accelerators such as GPUs, a single training job can still take days. Consequently, the cost of hardware is a barrier to entry. Even when upfront cost is not a concern, the lead time to set up such an HPC environment takes months from acquiring hardware to set up the hardware with the right set of firmware, software installed and configured. Furthermore, scalability is hard to achieve in a rigid traditional lab environment. Therefore, it is slow to react to the dynamic change in the artificial intelligent industry. Watson Deep Learning as a service, a cloud-based deep learning platform that mitigates the long lead time and high upfront investment in hardware. It enables robust and scalable sharing of resources among the teams in an organization. It is designed for on-demand cloud environments. Providing a similar user experience in a multi-tenant cloud environment comes with its own unique challenges regarding fault tolerance, performance, and security. Watson Deep Learning as a service tackles these challenges and present a deep learning stack for the cloud environments in a secure, scalable and fault-tolerant manner. It supports a wide range of deep-learning frameworks such as Tensorflow, PyTorch, Caffe, Torch, Theano, and MXNet etc. These frameworks reduce the effort and skillset required to design, train, and use deep learning models. Deep Learning as a service is used at IBM by AI researchers in areas including machine translation, computer vision, and healthcare. 

Keywords: deep learning, machine learning, cognitive computing, model training

Procedia PDF Downloads 171
8081 A Machine Learning-Based Approach to Capture Extreme Rainfall Events

Authors: Willy Mbenza, Sho Kenjiro

Abstract:

Increasing efforts are directed towards a better understanding and foreknowledge of extreme precipitation likelihood, given the adverse effects associated with their occurrence. This knowledge plays a crucial role in long-term planning and the formulation of effective emergency response. However, predicting extreme events reliably presents a challenge to conventional empirical/statistics due to the involvement of numerous variables spanning different time and space scales. In the recent time, Machine Learning has emerged as a promising tool for predicting the dynamics of extreme precipitation. ML techniques enables the consideration of both local and regional physical variables that have a strong influence on the likelihood of extreme precipitation. These variables encompasses factors such as air temperature, soil moisture, specific humidity, aerosol concentration, among others. In this study, we develop an ML model that incorporates both local and regional variables while establishing a robust relationship between physical variables and precipitation during the downscaling process. Furthermore, the model provides valuable information on the frequency and duration of a given intensity of precipitation.

Keywords: machine learning (ML), predictions, rainfall events, regional variables

Procedia PDF Downloads 43
8080 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari

Abstract:

The Arabic language is one of the most important languages. Learning it is so important for many people around the world because of its religious and economic importance and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting the syntactic mistakes of Arabic syntax according to their position in the sentence and focused on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using Stanford CoreNLP morphological analyzer and machine-learning approach in order to detect the syntactical mistakes and then correct it. A prototype of the proposed system was implemented and evaluated. It uses support vector machine (SVM) algorithm to detect Arabic grammatical errors and correct them using the rule-based approach. The prototype system has a far accuracy 81%. In general, it shows a set of useful grammatical suggestions that the user may forget about while writing due to lack of familiarity with grammar or as a result of the speed of writing such as alerting the user when using a plural term to indicate one person.

Keywords: Arabic language acquisition and learning, natural language processing, morphological analyzer, part-of-speech

Procedia PDF Downloads 111
8079 A Convolutional Neural Network Based Vehicle Theft Detection, Location, and Reporting System

Authors: Michael Moeti, Khuliso Sigama, Thapelo Samuel Matlala

Abstract:

One of the principal challenges that the world is confronted with is insecurity. The crime rate is increasing exponentially, and protecting our physical assets especially in the motorist industry, is becoming impossible when applying our own strength. The need to develop technological solutions that detect and report theft without any human interference is inevitable. This is critical, especially for vehicle owners, to ensure theft detection and speedy identification towards recovery efforts in cases where a vehicle is missing or attempted theft is taking place. The vehicle theft detection system uses Convolutional Neural Network (CNN) to recognize the driver's face captured using an installed mobile phone device. The location identification function uses a Global Positioning System (GPS) to determine the real-time location of the vehicle. Upon identification of the location, Global System for Mobile Communications (GSM) technology is used to report or notify the vehicle owner about the whereabouts of the vehicle. The installed mobile app was implemented by making use of python as it is undoubtedly the best choice in machine learning. It allows easy access to machine learning algorithms through its widely developed library ecosystem. The graphical user interface was developed by making use of JAVA as it is better suited for mobile development. Google's online database (Firebase) was used as a means of storage for the application. The system integration test was performed using a simple percentage analysis. Sixty (60) vehicle owners participated in this study as a sample, and questionnaires were used in order to establish the acceptability of the system developed. The result indicates the efficiency of the proposed system, and consequently, the paper proposes the use of the system can effectively monitor the vehicle at any given place, even if it is driven outside its normal jurisdiction. More so, the system can be used as a database to detect, locate and report missing vehicles to different security agencies.

Keywords: CNN, location identification, tracking, GPS, GSM

Procedia PDF Downloads 116
8078 Machine Learning Driven Analysis of Kepler Objects of Interest to Identify Exoplanets

Authors: Akshat Kumar, Vidushi

Abstract:

This paper identifies 27 KOIs, 26 of which are currently classified as candidates and one as false positives that have a high probability of being confirmed. For this purpose, 11 machine learning algorithms were implemented on the cumulative kepler dataset sourced from the NASA exoplanet archive; it was observed that the best-performing model was HistGradientBoosting and XGBoost with a test accuracy of 93.5%, and the lowest-performing model was Gaussian NB with a test accuracy of 54%, to test model performance F1, cross-validation score and RUC curve was calculated. Based on the learned models, the significant characteristics for confirm exoplanets were identified, putting emphasis on the object’s transit and stellar properties; these characteristics were namely koi_count, koi_prad, koi_period, koi_dor, koi_ror, and koi_smass, which were later considered to filter out the potential KOIs. The paper also calculates the Earth similarity index based on the planetary radius and equilibrium temperature for each KOI identified to aid in their classification.

Keywords: Kepler objects of interest, exoplanets, space exploration, machine learning, earth similarity index, transit photometry

Procedia PDF Downloads 3
8077 Risk Factors of Becoming NEET Youth in Iran: A Machine Learning Approach

Authors: Hamed Rahmani, Wim Groot

Abstract:

The term "youth not in employment, education or training (NEET)" refers to a combination of youth unemployment and school dropout. This study investigates the variables that increase the risk of becoming NEET in Iran. A selection bias-adjusted Probit model was employed using machine learning to identify these risk factors. We used cross-sectional data obtained from the Statistical Centre of Iran and the Ministry of Cooperatives Labour and Social Welfare that was taken from the labour force survey conducted in the spring of 2021. We look at years of education, work experience, housework, the number of children under the age of six in the home, family education, birthplace, and the amount of land owned by households. Results show that hours spent performing domestic chores enhance the likelihood of youth becoming NEET, and years of education and years of potential work experience decrease the chance of being NEET. The findings also show that female youth born in cities were less likely than those born in rural regions to become NEET.

Keywords: NEET youth, probit, CART, machine learning, unemployment

Procedia PDF Downloads 64
8076 Development of Computational Approach for Calculation of Hydrogen Solubility in Hydrocarbons for Treatment of Petroleum

Authors: Abdulrahman Sumayli, Saad M. AlShahrani

Abstract:

For the hydrogenation process, knowing the solubility of hydrogen (H2) in hydrocarbons is critical to improve the efficiency of the process. We investigated the H2 solubility computation in four heavy crude oil feedstocks using machine learning techniques. Temperature, pressure, and feedstock type were considered as the inputs to the models, while the hydrogen solubility was the sole response. Specifically, we employed three different models: Support Vector Regression (SVR), Gaussian process regression (GPR), and Bayesian ridge regression (BRR). To achieve the best performance, the hyper-parameters of these models are optimized using the whale optimization algorithm (WOA). We evaluated the models using a dataset of solubility measurements in various feedstocks, and we compared their performance based on several metrics. Our results show that the WOA-SVR model tuned with WOA achieves the best performance overall, with an RMSE of 1.38 × 10− 2 and an R-squared of 0.991. These findings suggest that machine learning techniques can provide accurate predictions of hydrogen solubility in different feedstocks, which could be useful in the development of hydrogen-related technologies. Besides, the solubility of hydrogen in the four heavy oil fractions is estimated in different ranges of temperatures and pressures of 150 ◦C–350 ◦C and 1.2 MPa–10.8 MPa, respectively

Keywords: temperature, pressure variations, machine learning, oil treatment

Procedia PDF Downloads 31
8075 Automatic Fluid-Structure Interaction Modeling and Analysis of Butterfly Valve Using Python Script

Authors: N. Guru Prasath, Sangjin Ma, Chang-Wan Kim

Abstract:

A butterfly valve is a quarter turn valve which is used to control the flow of a fluid through a section of pipe. Generally, butterfly valve is used in wide range of applications such as water distribution, sewage, oil and gas plants. In particular, butterfly valve with larger diameter finds its immense applications in hydro power plants to control the fluid flow. In-lieu with the constraints in cost and size to run laboratory setup, analysis of large diameter values will be mostly studied by computational method which is the best and inexpensive solution. For fluid and structural analysis, CFD and FEM software is used to perform large scale valve analyses, respectively. In order to perform above analysis in butterfly valve, the CAD model has to recreate and perform mesh in conventional software’s for various dimensions of valve. Therefore, its limitation is time consuming process. In-order to overcome that issue, python code was created to outcome complete pre-processing setup automatically in Salome software. Applying dimensions of the model clearly in the python code makes the running time comparatively lower and easier way to perform analysis of the valve. Hence, in this paper, an attempt was made to study the fluid-structure interaction (FSI) of butterfly valves by varying the valve angles and dimensions using python code in pre-processing software, and results are produced.

Keywords: butterfly valve, flow coefficient, automatic CFD analysis, FSI analysis

Procedia PDF Downloads 200
8074 Development of pm2.5 Forecasting System in Seoul, South Korea Using Chemical Transport Modeling and ConvLSTM-DNN

Authors: Ji-Seok Koo, Hee‑Yong Kwon, Hui-Young Yun, Kyung-Hui Wang, Youn-Seo Koo

Abstract:

This paper presents a forecasting system for PM2.5 levels in Seoul, South Korea, leveraging a combination of chemical transport modeling and ConvLSTM-DNN machine learning technology. Exposure to PM2.5 has known detrimental impacts on public health, making its prediction crucial for establishing preventive measures. Existing forecasting models, like the Community Multiscale Air Quality (CMAQ) and Weather Research and Forecasting (WRF), are hindered by their reliance on uncertain input data, such as anthropogenic emissions and meteorological patterns, as well as certain intrinsic model limitations. The system we've developed specifically addresses these issues by integrating machine learning and using carefully selected input features that account for local and distant sources of PM2.5. In South Korea, the PM2.5 concentration is greatly influenced by both local emissions and long-range transport from China, and our model effectively captures these spatial and temporal dynamics. Our PM2.5 prediction system combines the strengths of advanced hybrid machine learning algorithms, convLSTM and DNN, to improve upon the limitations of the traditional CMAQ model. Data used in the system include forecasted information from CMAQ and WRF models, along with actual PM2.5 concentration and weather variable data from monitoring stations in China and South Korea. The system was implemented specifically for Seoul's PM2.5 forecasting.

Keywords: PM2.5 forecast, machine learning, convLSTM, DNN

Procedia PDF Downloads 21
8073 Mental Health Diagnosis through Machine Learning Approaches

Authors: Md Rafiqul Islam, Ashir Ahmed, Anwaar Ulhaq, Abu Raihan M. Kamal, Yuan Miao, Hua Wang

Abstract:

Mental health of people is equally important as of their physical health. Mental health and well-being are influenced not only by individual attributes but also by the social circumstances in which people find themselves and the environment in which they live. Like physical health, there is a number of internal and external factors such as biological, social and occupational factors that could influence the mental health of people. People living in poverty, suffering from chronic health conditions, minority groups, and those who exposed to/or displaced by war or conflict are generally more likely to develop mental health conditions. However, to authors’ best knowledge, there is dearth of knowledge on the impact of workplace (especially the highly stressed IT/Tech workplace) on the mental health of its workers. This study attempts to examine the factors influencing the mental health of tech workers. A publicly available dataset containing more than 65,000 cells and 100 attributes is examined for this purpose. Number of machine learning techniques such as ‘Decision Tree’, ‘K nearest neighbor’ ‘Support Vector Machine’ and ‘Ensemble’, are then applied to the selected dataset to draw the findings. It is anticipated that the analysis reported in this study would contribute in presenting useful insights on the attributes contributing in the mental health of tech workers using relevant machine learning techniques.

Keywords: mental disorder, diagnosis, occupational stress, IT workplace

Procedia PDF Downloads 246
8072 Deep Learning to Enhance Mathematics Education for Secondary Students in Sri Lanka

Authors: Selvavinayagan Babiharan

Abstract:

This research aims to develop a deep learning platform to enhance mathematics education for secondary students in Sri Lanka. The platform will be designed to incorporate interactive and user-friendly features to engage students in active learning and promote their mathematical skills. The proposed platform will be developed using TensorFlow and Keras, two widely used deep learning frameworks. The system will be trained on a large dataset of math problems, which will be collected from Sri Lankan school curricula. The results of this research will contribute to the improvement of mathematics education in Sri Lanka and provide a valuable tool for teachers to enhance the learning experience of their students.

Keywords: information technology, education, machine learning, mathematics

Procedia PDF Downloads 44
8071 Machine Learning Based Gender Identification of Authors of Entry Programs

Authors: Go Woon Kwak, Siyoung Jun, Soyun Maeng, Haeyoung Lee

Abstract:

Entry is an education platform used in South Korea, created to help students learn to program, in which they can learn to code while playing. Using the online version of the entry, teachers can easily assign programming homework to the student and the students can make programs simply by linking programming blocks. However, the programs may be made by others, so that the authors of the programs should be identified. In this paper, as the first step toward author identification of entry programs, we present an artificial neural network based classification approach to identify genders of authors of a program written in an entry. A neural network has been trained from labeled training data that we have collected. Our result in progress, although preliminary, shows that the proposed approach could be feasible to be applied to the online version of entry for gender identification of authors. As future work, we will first use a machine learning technique for age identification of entry programs, which would be the second step toward the author identification.

Keywords: artificial intelligence, author identification, deep neural network, gender identification, machine learning

Procedia PDF Downloads 284
8070 Predicting Radioactive Waste Glass Viscosity, Density and Dissolution with Machine Learning

Authors: Joseph Lillington, Tom Gout, Mike Harrison, Ian Farnan

Abstract:

The vitrification of high-level nuclear waste within borosilicate glass and its incorporation within a multi-barrier repository deep underground is widely accepted as the preferred disposal method. However, for this to happen, any safety case will require validation that the initially localized radionuclides will not be considerably released into the near/far-field. Therefore, accurate mechanistic models are necessary to predict glass dissolution, and these should be robust to a variety of incorporated waste species and leaching test conditions, particularly given substantial variations across international waste-streams. Here, machine learning is used to predict glass material properties (viscosity, density) and glass leaching model parameters from large-scale industrial data. A variety of different machine learning algorithms have been compared to assess performance. Density was predicted solely from composition, whereas viscosity additionally considered temperature. To predict suitable glass leaching model parameters, a large simulated dataset was created by coupling MATLAB and the chemical reactive-transport code HYTEC, considering the state-of-the-art GRAAL model (glass reactivity in allowance of the alteration layer). The trained models were then subsequently applied to the large-scale industrial, experimental data to identify potentially appropriate model parameters. Results indicate that ensemble methods can accurately predict viscosity as a function of temperature and composition across all three industrial datasets. Glass density prediction shows reliable learning performance with predictions primarily being within the experimental uncertainty of the test data. Furthermore, machine learning can predict glass dissolution model parameters behavior, demonstrating potential value in GRAAL model development and in assessing suitable model parameters for large-scale industrial glass dissolution data.

Keywords: machine learning, predictive modelling, pattern recognition, radioactive waste glass

Procedia PDF Downloads 79