Search results for: python machine learning libraries
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8609

Search results for: python machine learning libraries

8339 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms

Authors: Rikson Gultom

Abstract:

Hate Speech and Abusive language on social media is difficult to detect, usually, it is detected after it becomes viral in cyberspace, of course, it is too late for prevention. An early detection system that has a fairly good accuracy is needed so that it can reduce conflicts that occur in society caused by postings on social media that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find an early detection model on Twitter social media using machine learning that has high accuracy from several machine learning methods studied. In this study, the support vector machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the Support Vector machine with genetic algorithm (SVM-GA), Nave Bayes with genetic algorithm (NB-GA), and Random Forest Decision Tree with Genetic Algorithm (RFDT-GA). The study produced a comparison table for the accuracy of the hate speech and abusive language detection model, and presented it in the form of a graph of the accuracy of the six algorithms developed based on the Indonesian-language Twitter dataset, and concluded the best model with the highest accuracy.

Keywords: abusive language, hate speech, machine learning, optimization, social media

Procedia PDF Downloads 117
8338 Machine Learning for Targeting of Conditional Cash Transfers: Improving the Effectiveness of Proxy Means Tests to Identify Future School Dropouts and the Poor

Authors: Cristian Crespo

Abstract:

Conditional cash transfers (CCTs) have been targeted towards the poor. Thus, their targeting assessments check whether these schemes have been allocated to low-income households or individuals. However, CCTs have more than one goal and target group. An additional goal of CCTs is to increase school enrolment. Hence, students at risk of dropping out of school also are a target group. This paper analyses whether one of the most common targeting mechanisms of CCTs, a proxy means test (PMT), is suitable to identify the poor and future school dropouts. The PMT is compared with alternative approaches that use the outputs of a predictive model of school dropout. This model was built using machine learning algorithms and rich administrative datasets from Chile. The paper shows that using machine learning outputs in conjunction with the PMT increases targeting effectiveness by identifying more students who are either poor or future dropouts. This joint targeting approach increases effectiveness in different scenarios except when the social valuation of the two target groups largely differs. In these cases, the most likely optimal approach is to solely adopt the targeting mechanism designed to find the highly valued group.

Keywords: conditional cash transfers, machine learning, poverty, proxy means tests, school dropout prediction, targeting

Procedia PDF Downloads 189
8337 Convolutional Neural Networks versus Radiomic Analysis for Classification of Breast Mammogram

Authors: Mehwish Asghar

Abstract:

Breast Cancer (BC) is a common type of cancer among women. Its screening is usually performed using different imaging modalities such as magnetic resonance imaging, mammogram, X-ray, CT, etc. Among these modalities’ mammogram is considered a powerful tool for diagnosis and screening of breast cancer. Sophisticated machine learning approaches have shown promising results in complementing human diagnosis. Generally, machine learning methods can be divided into two major classes: one is Radiomics analysis (RA), where image features are extracted manually; and the other one is the concept of convolutional neural networks (CNN), in which the computer learns to recognize image features on its own. This research aims to improve the incidence of early detection, thus reducing the mortality rate caused by breast cancer through the latest advancements in computer science, in general, and machine learning, in particular. It has also been aimed to ease the burden of doctors by improving and automating the process of breast cancer detection. This research is related to a relative analysis of different techniques for the implementation of different models for detecting and classifying breast cancer. The main goal of this research is to provide a detailed view of results and performances between different techniques. The purpose of this paper is to explore the potential of a convolutional neural network (CNN) w.r.t feature extractor and as a classifier. Also, in this research, it has been aimed to add the module of Radiomics for comparison of its results with deep learning techniques.

Keywords: breast cancer (BC), machine learning (ML), convolutional neural network (CNN), radionics, magnetic resonance imaging, artificial intelligence

Procedia PDF Downloads 208
8336 Computational Intelligence and Machine Learning for Urban Drainage Infrastructure Asset Management

Authors: Thewodros K. Geberemariam

Abstract:

The rapid physical expansion of urbanization coupled with aging infrastructure presents a unique decision and management challenges for many big city municipalities. Cities must therefore upgrade and maintain the existing aging urban drainage infrastructure systems to keep up with the demands. Given the overall contribution of assets to municipal revenue and the importance of infrastructure to the success of a livable city, many municipalities are currently looking for a robust and smart urban drainage infrastructure asset management solution that combines management, financial, engineering and technical practices. This robust decision-making shall rely on sound, complete, current and relevant data that enables asset valuation, impairment testing, lifecycle modeling, and forecasting across the multiple asset portfolios. On this paper, predictive computational intelligence (CI) and multi-class machine learning (ML) coupled with online, offline, and historical record data that are collected from an array of multi-parameter sensors are used for the extraction of different operational and non-conforming patterns hidden in structured and unstructured data to determine and produce actionable insight on the current and future states of the network. This paper aims to improve the strategic decision-making process by identifying all possible alternatives; evaluate the risk of each alternative, and choose the alternative most likely to attain the required goal in a cost-effective manner using historical and near real-time urban drainage infrastructure data for urban drainage infrastructures assets that have previously not benefited from computational intelligence and machine learning advancements.

Keywords: computational intelligence, machine learning, urban drainage infrastructure, machine learning, classification, prediction, asset management space

Procedia PDF Downloads 144
8335 Data Modeling and Calibration of In-Line Pultrusion and Laser Ablation Machine Processes

Authors: David F. Nettleton, Christian Wasiak, Jonas Dorissen, David Gillen, Alexandr Tretyak, Elodie Bugnicourt, Alejandro Rosales

Abstract:

In this work, preliminary results are given for the modeling and calibration of two inline processes, pultrusion, and laser ablation, using machine learning techniques. The end product of the processes is the core of a medical guidewire, manufactured to comply with a user specification of diameter and flexibility. An ensemble approach is followed which requires training several models. Two state of the art machine learning algorithms are benchmarked: Kernel Recursive Least Squares (KRLS) and Support Vector Regression (SVR). The final objective is to build a precise digital model of the pultrusion and laser ablation process in order to calibrate the resulting diameter and flexibility of a medical guidewire, which is the end product while taking into account the friction on the forming die. The result is an ensemble of models, whose output is within a strict required tolerance and which covers the required range of diameter and flexibility of the guidewire end product. The modeling and automatic calibration of complex in-line industrial processes is a key aspect of the Industry 4.0 movement for cyber-physical systems.

Keywords: calibration, data modeling, industrial processes, machine learning

Procedia PDF Downloads 279
8334 Digital Preservation: A Need of Tomorrow

Authors: Gaurav Kumar

Abstract:

Digital libraries have been established all over the world to create, maintain and to preserve the digital materials. This paper exhibits the importance and objectives of digital preservation. The necessities of preservation are hardware and software technology to interpret the digital documents and discuss various aspects of digital preservation.

Keywords: preservation, digital preservation, conservation, archive, repository, document, information technology, hardware, software, organization, machine readable format

Procedia PDF Downloads 574
8333 Genetic Algorithms for Feature Generation in the Context of Audio Classification

Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes

Abstract:

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

Keywords: feature generation, feature learning, genetic algorithm, music information retrieval

Procedia PDF Downloads 420
8332 Machine Learning in Patent Law: How Genetic Breeding Algorithms Challenge Modern Patent Law Regimes

Authors: Stefan Papastefanou

Abstract:

Artificial intelligence (AI) is an interdisciplinary field of computer science with the aim of creating intelligent machine behavior. Early approaches to AI have been configured to operate in very constrained environments where the behavior of the AI system was previously determined by formal rules. Knowledge was presented as a set of rules that allowed the AI system to determine the results for specific problems; as a structure of if-else rules that could be traversed to find a solution to a particular problem or question. However, such rule-based systems typically have not been able to generalize beyond the knowledge provided. All over the world and especially in IT-heavy industries such as the United States, the European Union, Singapore, and China, machine learning has developed to be an immense asset, and its applications are becoming more and more significant. It has to be examined how such products of machine learning models can and should be protected by IP law and for the purpose of this paper patent law specifically, since it is the IP law regime closest to technical inventions and computing methods in technical applications. Genetic breeding models are currently less popular than recursive neural network method and deep learning, but this approach can be more easily described by referring to the evolution of natural organisms, and with increasing computational power; the genetic breeding method as a subset of the evolutionary algorithms models is expected to be regaining popularity. The research method focuses on patentability (according to the world’s most significant patent law regimes such as China, Singapore, the European Union, and the United States) of AI inventions and machine learning. Questions of the technical nature of the problem to be solved, the inventive step as such, and the question of the state of the art and the associated obviousness of the solution arise in the current patenting processes. Most importantly, and the key focus of this paper is the problem of patenting inventions that themselves are developed through machine learning. The inventor of a patent application must be a natural person or a group of persons according to the current legal situation in most patent law regimes. In order to be considered an 'inventor', a person must actually have developed part of the inventive concept. The mere application of machine learning or an AI algorithm to a particular problem should not be construed as the algorithm that contributes to a part of the inventive concept. However, when machine learning or the AI algorithm has contributed to a part of the inventive concept, there is currently a lack of clarity regarding the ownership of artificially created inventions. Since not only all European patent law regimes but also the Chinese and Singaporean patent law approaches include identical terms, this paper ultimately offers a comparative analysis of the most relevant patent law regimes.

Keywords: algorithms, inventor, genetic breeding models, machine learning, patentability

Procedia PDF Downloads 100
8331 Bridging the Gaping Levels of Information Entree for Visually Impaired Students in the Sri Lankan University Libraries

Authors: Wilfred Jeyatheese Jeyaraj

Abstract:

Education is a key determinant of future success, and every person deserves non-discriminant access to information for educational inevitabilities in any case. Analysing and understanding complex information is a crucial learning tool, especially for students. In order to compete equally with sighted students, visually impaired students require the unhinged access to access to all the available information resources. When the education of visually impaired students comes to a focal point, it can be stated that visually impaired students encounter several obstacles and barriers before they enter the university and during their time there as students. These obstacles and barriers are spread across technical, organizational and social arenas. This study reveals the possible approaches to absorb and benefit from the information provided by the Sri Lankan University Libraries for visually impaired students. Purposive sampling technique was used to select sample visually impaired students attached to the Sri Lankan National universities. There are 07 National universities which accommodate the visually impaired students and with the identified data, they were selected for this study and 80 visually impaired students were selected as the sample group. Descriptive type survey method was used to collect data. Structured questionnaires, interviews and direct observation were used as research instruments. As far as the Sri Lankan context spread is concerned, visually impaired students are able to finish their courses through their own determination to overcome the barriers they encounter on their way to graduation, through moral and practical support from their own friends and very often through a high level of creativity. According to the findings there are no specially trained university librarians to serve visually impaired users and less number of assistive technology equipment are available at present. This paper enables all university libraries in Sri Lanka to be informed about the social isolation of visually compromised students at the Sri Lankan universities and focuses on the rectification issues by considering their distinct case for interaction.

Keywords: information access, Sri Lanka, university libraries, visual impairment

Procedia PDF Downloads 223
8330 Visualization-Based Feature Extraction for Classification in Real-Time Interaction

Authors: Ágoston Nagy

Abstract:

This paper introduces a method of using unsupervised machine learning to visualize the feature space of a dataset in 2D, in order to find most characteristic segments in the set. After dimension reduction, users can select clusters by manual drawing. Selected clusters are recorded into a data model that is used for later predictions, based on realtime data. Predictions are made with supervised learning, using Gesture Recognition Toolkit. The paper introduces two example applications: a semantic audio organizer for analyzing incoming sounds, and a gesture database organizer where gestural data (recorded by a Leap motion) is visualized for further manipulation.

Keywords: gesture recognition, machine learning, real-time interaction, visualization

Procedia PDF Downloads 335
8329 Methods for Distinction of Cattle Using Supervised Learning

Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl

Abstract:

Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.

Keywords: genetic data, Pinzgau cattle, supervised learning, machine learning

Procedia PDF Downloads 535
8328 Enhancing Fault Detection in Rotating Machinery Using Wiener-CNN Method

Authors: Mohamad R. Moshtagh, Ahmad Bagheri

Abstract:

Accurate fault detection in rotating machinery is of utmost importance to ensure optimal performance and prevent costly downtime in industrial applications. This study presents a robust fault detection system based on vibration data collected from rotating gears under various operating conditions. The considered scenarios include: (1) both gears being healthy, (2) one healthy gear and one faulty gear, and (3) introducing an imbalanced condition to a healthy gear. Vibration data was acquired using a Hentek 1008 device and stored in a CSV file. Python code implemented in the Spider environment was used for data preprocessing and analysis. Winner features were extracted using the Wiener feature selection method. These features were then employed in multiple machine learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), and Random Forest, to evaluate their performance in detecting and classifying faults in both the training and validation datasets. The comparative analysis of the methods revealed the superior performance of the Wiener-CNN approach. The Wiener-CNN method achieved a remarkable accuracy of 100% for both the two-class (healthy gear and faulty gear) and three-class (healthy gear, faulty gear, and imbalanced) scenarios in the training and validation datasets. In contrast, the other methods exhibited varying levels of accuracy. The Wiener-MLP method attained 100% accuracy for the two-class training dataset and 100% for the validation dataset. For the three-class scenario, the Wiener-MLP method demonstrated 100% accuracy in the training dataset and 95.3% accuracy in the validation dataset. The Wiener-KNN method yielded 96.3% accuracy for the two-class training dataset and 94.5% for the validation dataset. In the three-class scenario, it achieved 85.3% accuracy in the training dataset and 77.2% in the validation dataset. The Wiener-Random Forest method achieved 100% accuracy for the two-class training dataset and 85% for the validation dataset, while in the three-class training dataset, it attained 100% accuracy and 90.8% accuracy for the validation dataset. The exceptional accuracy demonstrated by the Wiener-CNN method underscores its effectiveness in accurately identifying and classifying fault conditions in rotating machinery. The proposed fault detection system utilizes vibration data analysis and advanced machine learning techniques to improve operational reliability and productivity. By adopting the Wiener-CNN method, industrial systems can benefit from enhanced fault detection capabilities, facilitating proactive maintenance and reducing equipment downtime.

Keywords: fault detection, gearbox, machine learning, wiener method

Procedia PDF Downloads 64
8327 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 408
8326 Determination of Water Pollution and Water Quality with Decision Trees

Authors: Çiğdem Bakır, Mecit Yüzkat

Abstract:

With the increasing emphasis on water quality worldwide, the search for and expanding the market for new and intelligent monitoring systems has increased. The current method is the laboratory process, where samples are taken from bodies of water, and tests are carried out in laboratories. This method is time-consuming, a waste of manpower, and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software we used in our study and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed by taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen with machine learning methods. The proposed approach consists of three stages: preprocessing of the data used, feature detection, and classification. We tried to determine the success of our study with different accuracy metrics and the results. We presented it comparatively. In addition, we achieved approximately 98% success with the decision tree.

Keywords: decision tree, water quality, water pollution, machine learning

Procedia PDF Downloads 72
8325 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue

Authors: Rachel Y. Zhang, Christopher K. Anderson

Abstract:

A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically hotels have used simple time series models (regression and/or pick-up based models) owing to the complexities of trying to build casual models of demands. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand – focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support vector machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performances of above approaches to forecasting hotel demand are illustrated by using a proprietary sample of the market level (24 properties) transactional data for Las Vegas NV. Causal predictive models can be built and evaluated owing to the availability of market level (versus firm level) data. This research also compares and contrast model accuracy of firm-level models (i.e. predictive models for hotel A only using hotel A’s data) to models using market level data (prices, review scores, location, chain scale, etc… for all hotels within the market). The prospected models will be valuable for hotel revenue prediction given the basic characters of a hotel property or can be applied in performance evaluation for an existed hotel. The findings will unveil the features that play key roles in a hotel’s revenue performance, which would have considerable potential usefulness in both revenue prediction and evaluation.

Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine

Procedia PDF Downloads 119
8324 Image Segmentation with Deep Learning of Prostate Cancer Bone Metastases on Computed Tomography

Authors: Joseph M. Rich, Vinay A. Duddalwar, Assad A. Oberai

Abstract:

Prostate adenocarcinoma is the most common cancer in males, with osseous metastases as the commonest site of metastatic prostate carcinoma (mPC). Treatment monitoring is based on the evaluation and characterization of lesions on multiple imaging studies, including Computed Tomography (CT). Monitoring of the osseous disease burden, including follow-up of lesions and identification and characterization of new lesions, is a laborious task for radiologists. Deep learning algorithms are increasingly used to perform tasks such as identification and segmentation for osseous metastatic disease and provide accurate information regarding metastatic burden. Here, nnUNet was used to produce a model which can segment CT scan images of prostate adenocarcinoma vertebral bone metastatic lesions. nnUNet is an open-source Python package that adds optimizations to deep learning-based UNet architecture but has not been extensively combined with transfer learning techniques due to the absence of a readily available functionality of this method. The IRB-approved study data set includes imaging studies from patients with mPC who were enrolled in clinical trials at the University of Southern California (USC) Health Science Campus and Los Angeles County (LAC)/USC medical center. Manual segmentation of metastatic lesions was completed by an expert radiologist Dr. Vinay Duddalwar (20+ years in radiology and oncologic imaging), to serve as ground truths for the automated segmentation. Despite nnUNet’s success on some medical segmentation tasks, it only produced an average Dice Similarity Coefficient (DSC) of 0.31 on the USC dataset. DSC results fell in a bimodal distribution, with most scores falling either over 0.66 (reasonably accurate) or at 0 (no lesion detected). Applying more aggressive data augmentation techniques dropped the DSC to 0.15, and reducing the number of epochs reduced the DSC to below 0.1. Datasets have been identified for transfer learning, which involve balancing between size and similarity of the dataset. Identified datasets include the Pancreas data from the Medical Segmentation Decathlon, Pelvic Reference Data, and CT volumes with multiple organ segmentations (CT-ORG). Some of the challenges of producing an accurate model from the USC dataset include small dataset size (115 images), 2D data (as nnUNet generally performs better on 3D data), and the limited amount of public data capturing annotated CT images of bone lesions. Optimizations and improvements will be made by applying transfer learning and generative methods, including incorporating generative adversarial networks and diffusion models in order to augment the dataset. Performance with different libraries, including MONAI and custom architectures with Pytorch, will be compared. In the future, molecular correlations will be tracked with radiologic features for the purpose of multimodal composite biomarker identification. Once validated, these models will be incorporated into evaluation workflows to optimize radiologist evaluation. Our work demonstrates the challenges of applying automated image segmentation to small medical datasets and lays a foundation for techniques to improve performance. As machine learning models become increasingly incorporated into the workflow of radiologists, these findings will help improve the speed and accuracy of vertebral metastatic lesions detection.

Keywords: deep learning, image segmentation, medicine, nnUNet, prostate carcinoma, radiomics

Procedia PDF Downloads 83
8323 Accelerating Quantum Chemistry Calculations: Machine Learning for Efficient Evaluation of Electron-Repulsion Integrals

Authors: Nishant Rodrigues, Nicole Spanedda, Chilukuri K. Mohan, Arindam Chakraborty

Abstract:

A crucial objective in quantum chemistry is the computation of the energy levels of chemical systems. This task requires electron-repulsion integrals as inputs, and the steep computational cost of evaluating these integrals poses a major numerical challenge in efficient implementation of quantum chemical software. This work presents a moment-based machine-learning approach for the efficient evaluation of electron-repulsion integrals. These integrals were approximated using linear combinations of a small number of moments. Machine learning algorithms were applied to estimate the coefficients in the linear combination. A random forest approach was used to identify promising features using a recursive feature elimination approach, which performed best for learning the sign of each coefficient but not the magnitude. A neural network with two hidden layers were then used to learn the coefficient magnitudes along with an iterative feature masking approach to perform input vector compression, identifying a small subset of orbitals whose coefficients are sufficient for the quantum state energy computation. Finally, a small ensemble of neural networks (with a median rule for decision fusion) was shown to improve results when compared to a single network.

Keywords: quantum energy calculations, atomic orbitals, electron-repulsion integrals, ensemble machine learning, random forests, neural networks, feature extraction

Procedia PDF Downloads 95
8322 Advancing Urban Sustainability through Data-Driven Machine Learning Solutions

Authors: Nasim Eslamirad, Mahdi Rasoulinezhad, Francesco De Luca, Sadok Ben Yahia, Kimmo Sakari Lylykangas, Francesco Pilla

Abstract:

With the ongoing urbanization, cities face increasing environmental challenges impacting human well-being. To tackle these issues, data-driven approaches in urban analysis have gained prominence, leveraging urban data to promote sustainability. Integrating Machine Learning techniques enables researchers to analyze and predict complex environmental phenomena like Urban Heat Island occurrences in urban areas. This paper demonstrates the implementation of data-driven approach and interpretable Machine Learning algorithms with interpretability techniques to conduct comprehensive data analyses for sustainable urban design. The developed framework and algorithms are demonstrated for Tallinn, Estonia to develop sustainable urban strategies to mitigate urban heat waves. Geospatial data, preprocessed and labeled with UHI levels, are used to train various ML models, with Logistic Regression emerging as the best-performing model based on evaluation metrics to derive a mathematical equation representing the area with UHI or without UHI effects, providing insights into UHI occurrences based on buildings and urban features. The derived formula highlights the importance of building volume, height, area, and shape length to create an urban environment with UHI impact. The data-driven approach and derived equation inform mitigation strategies and sustainable urban development in Tallinn and offer valuable guidance for other locations with varying climates.

Keywords: data-driven approach, machine learning transparent models, interpretable machine learning models, urban heat island effect

Procedia PDF Downloads 18
8321 Automatic Lead Qualification with Opinion Mining in Customer Relationship Management Projects

Authors: Victor Radich, Tania Basso, Regina Moraes

Abstract:

Lead qualification is one of the main procedures in Customer Relationship Management (CRM) projects. Its main goal is to identify potential consumers who have the ideal characteristics to establish a profitable and long-term relationship with a certain organization. Social networks can be an important source of data for identifying and qualifying leads since interest in specific products or services can be identified from the users’ expressed feelings of (dis)satisfaction. In this context, this work proposes the use of machine learning techniques and sentiment analysis as an extra step in the lead qualification process in order to improve it. In addition to machine learning models, sentiment analysis or opinion mining can be used to understand the evaluation that the user makes of a particular service, product, or brand. The results obtained so far have shown that it is possible to extract data from social networks and combine the techniques for a more complete classification.

Keywords: lead qualification, sentiment analysis, opinion mining, machine learning, CRM, lead scoring

Procedia PDF Downloads 65
8320 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia

Authors: The Danh Phan

Abstract:

House price forecasting is a main topic in the real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between the house price in the most expensive suburbs and the most affordable suburbs in the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.

Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise

Procedia PDF Downloads 210
8319 Copyright Clearance for Artificial Intelligence Training Data: Challenges and Solutions

Authors: Erva Akin

Abstract:

– The use of copyrighted material for machine learning purposes is a challenging issue in the field of artificial intelligence (AI). While machine learning algorithms require large amounts of data to train and improve their accuracy and creativity, the use of copyrighted material without permission from the authors may infringe on their intellectual property rights. In order to overcome copyright legal hurdle against the data sharing, access and re-use of data, the use of copyrighted material for machine learning purposes may be considered permissible under certain circumstances. For example, if the copyright holder has given permission to use the data through a licensing agreement, then the use for machine learning purposes may be lawful. It is also argued that copying for non-expressive purposes that do not involve conveying expressive elements to the public, such as automated data extraction, should not be seen as infringing. The focus of such ‘copy-reliant technologies’ is on understanding language rules, styles, and syntax and no creative ideas are being used. However, the non-expressive use defense is within the framework of the fair use doctrine, which allows the use of copyrighted material for research or educational purposes. The questions arise because the fair use doctrine is not available in EU law, instead, the InfoSoc Directive provides for a rigid system of exclusive rights with a list of exceptions and limitations. One could only argue that non-expressive uses of copyrighted material for machine learning purposes do not constitute a ‘reproduction’ in the first place. Nevertheless, the use of machine learning with copyrighted material is difficult because EU copyright law applies to the mere use of the works. Two solutions can be proposed to address the problem of copyright clearance for AI training data. The first is to introduce a broad exception for text and data mining, either mandatorily or for commercial and scientific purposes, or to permit the reproduction of works for non-expressive purposes. The second is that copyright laws should permit the reproduction of works for non-expressive purposes, which opens the door to discussions regarding the transposition of the fair use principle from the US into EU law. Both solutions aim to provide more space for AI developers to operate and encourage greater freedom, which could lead to more rapid innovation in the field. The Data Governance Act presents a significant opportunity to advance these debates. Finally, issues concerning the balance of general public interests and legitimate private interests in machine learning training data must be addressed. In my opinion, it is crucial that robot-creation output should fall into the public domain. Machines depend on human creativity, innovation, and expression. To encourage technological advancement and innovation, freedom of expression and business operation must be prioritised.

Keywords: artificial intelligence, copyright, data governance, machine learning

Procedia PDF Downloads 73
8318 New Machine Learning Optimization Approach Based on Input Variables Disposition Applied for Time Series Prediction

Authors: Hervice Roméo Fogno Fotsoa, Germaine Djuidje Kenmoe, Claude Vidal Aloyem Kazé

Abstract:

One of the main applications of machine learning is the prediction of time series. But a more accurate prediction requires a more optimal model of machine learning. Several optimization techniques have been developed, but without considering the input variables disposition of the system. Thus, this work aims to present a new machine learning architecture optimization technique based on their optimal input variables disposition. The validations are done on the prediction of wind time series, using data collected in Cameroon. The number of possible dispositions with four input variables is determined, i.e., twenty-four. Each of the dispositions is used to perform the prediction, with the main criteria being the training and prediction performances. The results obtained from a static architecture and a dynamic architecture of neural networks have shown that these performances are a function of the input variable's disposition, and this is in a different way from the architectures. This analysis revealed that it is necessary to take into account the input variable's disposition for the development of a more optimal neural network model. Thus, a new neural network training algorithm is proposed by introducing the search for the optimal input variables disposition in the traditional back-propagation algorithm. The results of the application of this new optimization approach on the two single neural network architectures are compared with the previously obtained results step by step. Moreover, this proposed approach is validated in a collaborative optimization method with a single objective optimization technique, i.e., genetic algorithm back-propagation neural networks. From these comparisons, it is concluded that each proposed model outperforms its traditional model in terms of training and prediction performance of time series. Thus the proposed optimization approach can be useful in improving the accuracy of time series forecasts. This proves that the proposed optimization approach can be useful in improving the accuracy of time series prediction based on machine learning.

Keywords: input variable disposition, machine learning, optimization, performance, time series prediction

Procedia PDF Downloads 89
8317 Comparison of Different Machine Learning Models for Time-Series Based Load Forecasting of Electric Vehicle Charging Stations

Authors: H. J. Joshi, Satyajeet Patil, Parth Dandavate, Mihir Kulkarni, Harshita Agrawal

Abstract:

As the world looks towards a sustainable future, electric vehicles have become increasingly popular. Millions worldwide are looking to switch to Electric cars over the previously favored combustion engine-powered cars. This demand has seen an increase in Electric Vehicle Charging Stations. The big challenge is that the randomness of electrical energy makes it tough for these charging stations to provide an adequate amount of energy over a specific amount of time. Thus, it has become increasingly crucial to model these patterns and forecast the energy needs of power stations. This paper aims to analyze how different machine learning models perform on Electric Vehicle charging time-series data. The data set consists of authentic Electric Vehicle Data from the Netherlands. It has an overview of ten thousand transactions from public stations operated by EVnetNL.

Keywords: forecasting, smart grid, electric vehicle load forecasting, machine learning, time series forecasting

Procedia PDF Downloads 90
8316 An Overview of Domain Models of Urban Quantitative Analysis

Authors: Mohan Li

Abstract:

Nowadays, intelligent research technology is more and more important than traditional research methods in urban research work, and this proportion will greatly increase in the next few decades. Frequently such analyzing work cannot be carried without some software engineering knowledge. And here, domain models of urban research will be necessary when applying software engineering knowledge to urban work. In many urban plan practice projects, making rational models, feeding reliable data, and providing enough computation all make indispensable assistance in producing good urban planning. During the whole work process, domain models can optimize workflow design. At present, human beings have entered the era of big data. The amount of digital data generated by cities every day will increase at an exponential rate, and new data forms are constantly emerging. How to select a suitable data set from the massive amount of data, manage and process it has become an ability that more and more planners and urban researchers need to possess. This paper summarizes and makes predictions of the emergence of technologies and technological iterations that may affect urban research in the future, discover urban problems, and implement targeted sustainable urban strategies. They are summarized into seven major domain models. They are urban and rural regional domain model, urban ecological domain model, urban industry domain model, development dynamic domain model, urban social and cultural domain model, urban traffic domain model, and urban space domain model. These seven domain models can be used to guide the construction of systematic urban research topics and help researchers organize a series of intelligent analytical tools, such as Python, R, GIS, etc. These seven models make full use of quantitative spatial analysis, machine learning, and other technologies to achieve higher efficiency and accuracy in urban research, assisting people in making reasonable decisions.

Keywords: big data, domain model, urban planning, urban quantitative analysis, machine learning, workflow design

Procedia PDF Downloads 164
8315 ANOVA-Based Feature Selection and Machine Learning System for IoT Anomaly Detection

Authors: Muhammad Ali

Abstract:

Cyber-attacks and anomaly detection on the Internet of Things (IoT) infrastructure is emerging concern in the domain of data-driven intrusion. Rapidly increasing IoT risk is now making headlines around the world. denial of service, malicious control, data type probing, malicious operation, DDos, scan, spying, and wrong setup are attacks and anomalies that can affect an IoT system failure. Everyone talks about cyber security, connectivity, smart devices, and real-time data extraction. IoT devices expose a wide variety of new cyber security attack vectors in network traffic. For further than IoT development, and mainly for smart and IoT applications, there is a necessity for intelligent processing and analysis of data. So, our approach is too secure. We train several machine learning models that have been compared to accurately predicting attacks and anomalies on IoT systems, considering IoT applications, with ANOVA-based feature selection with fewer prediction models to evaluate network traffic to help prevent IoT devices. The machine learning (ML) algorithms that have been used here are KNN, SVM, NB, D.T., and R.F., with the most satisfactory test accuracy with fast detection. The evaluation of ML metrics includes precision, recall, F1 score, FPR, NPV, G.M., MCC, and AUC & ROC. The Random Forest algorithm achieved the best results with less prediction time, with an accuracy of 99.98%.

Keywords: machine learning, analysis of variance, Internet of Thing, network security, intrusion detection

Procedia PDF Downloads 105
8314 Employing QR Code as an Effective Educational Tool for Quick Access to Sources of Kindergarten Concepts

Authors: Ahmed Amin Mousa, M. Abd El-Salam

Abstract:

This study discusses a simple solution for the problem of shortage in learning resources for kindergarten teachers. Occasionally, kindergarten teachers cannot access proper resources by usual search methods as libraries or search engines. Furthermore, these methods require a long time and efforts for preparing. The study is expected to facilitate accessing learning resources. Moreover, it suggests a potential direction for using QR code inside the classroom. The present work proposes that QR code can be used for digitizing kindergarten curriculums and accessing various learning resources. It investigates using QR code for saving information related to the concepts which kindergarten teachers use in the current educational situation. The researchers have established a guide for kindergarten teachers based on the Egyptian official curriculum. The guide provides different learning resources for each scientific and mathematical concept in the curriculum, and each learning resource is represented as a QR code image that contains its URL. Therefore, kindergarten teachers can use smartphone applications for reading QR codes and displaying the related learning resources for students immediately. The guide has been provided to a group of 108 teachers for using inside their classrooms. The results showed that the teachers approved the guide, and gave a good response.

Keywords: kindergarten, child, learning resources, QR code, smart phone, mobile

Procedia PDF Downloads 276
8313 FlexPoints: Efficient Algorithm for Detection of Electrocardiogram Characteristic Points

Authors: Daniel Bulanda, Janusz A. Starzyk, Adrian Horzyk

Abstract:

The electrocardiogram (ECG) is one of the most commonly used medical tests, essential for correct diagnosis and treatment of the patient. While ECG devices generate a huge amount of data, only a small part of them carries valuable medical information. To deal with this problem, many compression algorithms and filters have been developed over the past years. However, the rapid development of new machine learning techniques poses new challenges. To address this class of problems, we created the FlexPoints algorithm that searches for characteristic points on the ECG signal and ignores all other points that do not carry relevant medical information. The conducted experiments proved that the presented algorithm can significantly reduce the number of data points which represents ECG signal without losing valuable medical information. These sparse but essential characteristic points (flex points) can be a perfect input for some modern machine learning models, which works much better using flex points as an input instead of raw data or data compressed by many popular algorithms.

Keywords: characteristic points, electrocardiogram, ECG, machine learning, signal compression

Procedia PDF Downloads 151
8312 WebAppShield: An Approach Exploiting Machine Learning to Detect SQLi Attacks in an Application Layer in Run-time

Authors: Ahmed Abdulla Ashlam, Atta Badii, Frederic Stahl

Abstract:

In recent years, SQL injection attacks have been identified as being prevalent against web applications. They affect network security and user data, which leads to a considerable loss of money and data every year. This paper presents the use of classification algorithms in machine learning using a method to classify the login data filtering inputs into "SQLi" or "Non-SQLi,” thus increasing the reliability and accuracy of results in terms of deciding whether an operation is an attack or a valid operation. A method Web-App auto-generated twin data structure replication. Shielding against SQLi attacks (WebAppShield) that verifies all users and prevents attackers (SQLi attacks) from entering and or accessing the database, which the machine learning module predicts as "Non-SQLi" has been developed. A special login form has been developed with a special instance of data validation; this verification process secures the web application from its early stages. The system has been tested and validated, up to 99% of SQLi attacks have been prevented.

Keywords: SQL injection, attacks, web application, accuracy, database

Procedia PDF Downloads 132
8311 A Convolutional Neural Network Based Vehicle Theft Detection, Location, and Reporting System

Authors: Michael Moeti, Khuliso Sigama, Thapelo Samuel Matlala

Abstract:

One of the principal challenges that the world is confronted with is insecurity. The crime rate is increasing exponentially, and protecting our physical assets especially in the motorist industry, is becoming impossible when applying our own strength. The need to develop technological solutions that detect and report theft without any human interference is inevitable. This is critical, especially for vehicle owners, to ensure theft detection and speedy identification towards recovery efforts in cases where a vehicle is missing or attempted theft is taking place. The vehicle theft detection system uses Convolutional Neural Network (CNN) to recognize the driver's face captured using an installed mobile phone device. The location identification function uses a Global Positioning System (GPS) to determine the real-time location of the vehicle. Upon identification of the location, Global System for Mobile Communications (GSM) technology is used to report or notify the vehicle owner about the whereabouts of the vehicle. The installed mobile app was implemented by making use of python as it is undoubtedly the best choice in machine learning. It allows easy access to machine learning algorithms through its widely developed library ecosystem. The graphical user interface was developed by making use of JAVA as it is better suited for mobile development. Google's online database (Firebase) was used as a means of storage for the application. The system integration test was performed using a simple percentage analysis. Sixty (60) vehicle owners participated in this study as a sample, and questionnaires were used in order to establish the acceptability of the system developed. The result indicates the efficiency of the proposed system, and consequently, the paper proposes the use of the system can effectively monitor the vehicle at any given place, even if it is driven outside its normal jurisdiction. More so, the system can be used as a database to detect, locate and report missing vehicles to different security agencies.

Keywords: CNN, location identification, tracking, GPS, GSM

Procedia PDF Downloads 145
8310 Automatic Fluid-Structure Interaction Modeling and Analysis of Butterfly Valve Using Python Script

Authors: N. Guru Prasath, Sangjin Ma, Chang-Wan Kim

Abstract:

A butterfly valve is a quarter turn valve which is used to control the flow of a fluid through a section of pipe. Generally, butterfly valve is used in wide range of applications such as water distribution, sewage, oil and gas plants. In particular, butterfly valve with larger diameter finds its immense applications in hydro power plants to control the fluid flow. In-lieu with the constraints in cost and size to run laboratory setup, analysis of large diameter values will be mostly studied by computational method which is the best and inexpensive solution. For fluid and structural analysis, CFD and FEM software is used to perform large scale valve analyses, respectively. In order to perform above analysis in butterfly valve, the CAD model has to recreate and perform mesh in conventional software’s for various dimensions of valve. Therefore, its limitation is time consuming process. In-order to overcome that issue, python code was created to outcome complete pre-processing setup automatically in Salome software. Applying dimensions of the model clearly in the python code makes the running time comparatively lower and easier way to perform analysis of the valve. Hence, in this paper, an attempt was made to study the fluid-structure interaction (FSI) of butterfly valves by varying the valve angles and dimensions using python code in pre-processing software, and results are produced.

Keywords: butterfly valve, flow coefficient, automatic CFD analysis, FSI analysis

Procedia PDF Downloads 230