Search results for: explainable machine learning
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8600

8390 Stock Market Prediction Using Convolutional Neural Network That Learns from a Graph

Authors: Mo-Se Lee, Cheol-Hwi Ahn, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, the CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images, has been widely applied to classification and prediction problems in various fields. In this study, we apply a CNN to stock market prediction, one of the most challenging tasks in machine learning research. Specifically, we propose to apply a CNN as a binary classifier that predicts stock market direction (up or down) by using a graph as its input. That is, our proposal is to build a machine learning algorithm that mimics a person who looks at the graph and predicts whether the trend will go up or down. Our proposed model consists of four steps. In the first step, it divides the dataset into intervals of 5 days, 10 days, 15 days, and 20 days. In step 2, it creates graphs for each interval. In the next step, CNN classifiers are trained using the graphs generated in the previous step. In step 4, it optimizes the hyperparameters of the trained model by using the validation dataset. To validate our model, we will apply it to the prediction of KOSPI200 for 1,986 days in eight years (from 2009 to 2016). The experimental dataset will include 14 technical indicators such as CCI, Momentum, and ROC, and the daily closing price of KOSPI200 of the Korean stock market.
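
The first step of the four-step procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the up/down label is defined or how the graphs are rendered, so the labeling rule (next close compared with the last close in the window) and the name `make_windows` are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[float], int]


def make_windows(closes: List[float], window: int) -> List[Sample]:
    """Slice a closing-price series into fixed-length windows.

    Assumed labeling rule (not from the abstract): label 1 if the close
    right after the window is higher than the last close in the window,
    otherwise 0.
    """
    samples: List[Sample] = []
    for start in range(len(closes) - window):
        chunk = closes[start:start + window]
        next_close = closes[start + window]
        label = 1 if next_close > chunk[-1] else 0
        samples.append((chunk, label))
    return samples


# Toy closing prices; the study uses KOSPI200 data instead.
closes = [100.0, 101.0, 99.5, 102.0, 103.0, 102.5, 104.0]

# One dataset per interval length named in the abstract.
datasets: Dict[int, List[Sample]] = {w: make_windows(closes, w) for w in (5, 10, 15, 20)}
```

Each window would then be rendered as a chart image and passed to the CNN; that rendering step is omitted here because the abstract gives no details about it.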

Keywords: convolutional neural network, deep learning, Korean stock market, stock market prediction

Procedia PDF Downloads 426
8389 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder

Authors: Dua Hişam, Serhat İkizoğlu

Abstract:

Identifying the problem behind balance disorder is one of the most interesting topics in the medical literature. This study advances the development of artificial intelligence (AI) algorithms by applying multiple machine learning (ML) models to sensory gait data collected from humans in order to distinguish healthy individuals from those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three ML models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance of the algorithms has been compared using accuracy, recall, precision, and F1-score. With an accuracy of 95.28%, the Random Forest Classifier (RF) was the most accurate model.
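
The four comparison metrics named above follow directly from the confusion-matrix counts of a binary classifier. The sketch below uses made-up counts for illustration; they are not results from the study, and the function name is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```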

Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting

Procedia PDF Downloads 72
8388 Enhancing Sell-In and Sell-Out Forecasting Using Ensemble Machine Learning Method

Authors: Vishal Das, Tianyi Mao, Zhicheng Geng, Carmen Flores, Diego Pelloso, Fang Wang

Abstract:

Accurate sell-in and sell-out forecasting is a ubiquitous problem in the retail industry. It is an important element of any demand planning activity. As a global food and beverage company, Nestlé has hundreds of products in each geographical location in which it operates. Each product has its sell-in and sell-out time series data, which are forecasted on a weekly and monthly scale for demand and financial planning. To address this challenge, Nestlé Chile, in collaboration with Amazon Machine Learning Solutions Lab, has developed an in-house solution that uses machine learning models for forecasting. Similar products are combined such that there is one model for each product category. In this way, the models learn from a larger set of data, and there are fewer models to maintain. The solution is scalable to all product categories and is developed to be flexible enough to include any new product or eliminate any existing product in a product category based on requirements. We show how we can use the machine learning development environment on Amazon Web Services (AWS) to explore a set of forecasting models and create business intelligence dashboards that can be used with the existing demand planning tools in Nestlé. We explored recent deep learning networks (DNN), which show promising results for a variety of time series forecasting problems. Specifically, we used a DeepAR autoregressive model that can group similar time series together and provide robust predictions. To further enhance the accuracy of the predictions and include domain-specific knowledge, we designed an ensemble approach using DeepAR and an XGBoost regression model. As part of the ensemble approach, we interlinked the sell-out and sell-in information to ensure that a future sell-out influences the current sell-in predictions. Our approach outperforms the benchmark statistical models by more than 50%.
The machine learning (ML) pipeline implemented in the cloud is currently being extended for other product categories and is getting adopted by other geomarkets.

Keywords: sell-in and sell-out forecasting, demand planning, DeepAR, retail, ensemble machine learning, time-series

Procedia PDF Downloads 276
8387 Automated Detection of Women Dehumanization in English Text

Authors: Maha Wiss, Wael Khreich

Abstract:

Animals, objects, foods, plants, and other non-human terms are commonly used as a source of metaphors to describe females in formal and slang language. Comparing women to non-human items not only reflects cultural views that might conceptualize women as subordinates or in a lower position than humans, but also conveys this degradation to the listeners. Moreover, the dehumanizing representation of females in the language normalizes the derogation and even encourages sexism and aggressiveness against women. Although dehumanization has been a popular research topic for decades, to our knowledge, no studies have linked women's dehumanizing language to the machine learning field. Therefore, we introduce our research work as one of the first attempts to create a tool for the automated detection of the dehumanizing depiction of females in English texts. We also present the first labeled dataset on the charted topic, which is used for training supervised machine learning algorithms to build an accurate classification model. The importance of this work is that it accomplishes the first step toward mitigating dehumanizing language against females.

Keywords: gender bias, machine learning, NLP, women dehumanization

Procedia PDF Downloads 81
8386 Analyzing the Performance of Machine Learning Models to Predict Alzheimer's Disease and its Stages Addressing Missing Value Problem

Authors: Carlos Theran, Yohn Parra Bautista, Victor Adankai, Richard Alo, Jimwi Liu, Clement G. Yedjou

Abstract:

Alzheimer's disease (AD) is a neurodegenerative disorder primarily characterized by deteriorating cognitive functions. AD has gained considerable attention in the last decade. An estimated 24 million people worldwide suffered from this disease by 2011. In 2016, an estimated 40 million were diagnosed with AD, and by 2050 the number of people affected by AD is expected to reach 131 million. Therefore, detecting and confirming AD at its different stages is a priority for medical practices to provide adequate and accurate treatments. Recently, Machine Learning (ML) models have been used to study AD's stages while handling missing values in a multiclass setting, focusing on the delineation of Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI), and cognitively normal (CN). However, to the best of our knowledge, robust performance information on these models and the missing data analysis has not been presented in the literature. In this paper, we propose studying the performance of five different machine learning models for multiclass prediction of AD's stages in terms of accuracy, precision, and F1-score. Also, an analysis of three imputation methods to handle the missing value problem is presented. A framework that integrates an ML model for multiclass prediction of AD's stages is proposed, achieving an average accuracy of 84%.

Keywords: alzheimer's disease, missing value, machine learning, performance evaluation

Procedia PDF Downloads 255
8385 Classification of IoT Traffic Security Attacks Using Deep Learning

Authors: Anum Ali, Kashaf ad Dooja, Asif Saleem

Abstract:

Smart cities are trending towards the Internet of Things (IoT), which creates dynamic connections in a ubiquitous manner. Smart cities offer ease and flexibility in daily life. As small IoT devices connected to cloud servers proliferate, network traffic between these devices is growing exponentially, and its security is a concern, since the growing rate of cyber attacks may leave the network traffic vulnerable. This paper discusses the latest machine learning approaches in related work. Further, to tackle the increasing rate of cyber attacks, a machine learning algorithm is applied to IoT-based network traffic data. The proposed algorithm trains itself on the data and identifies different sections of device interaction by using supervised learning, which is treated as a classifier related to a specific IoT device class. The simulation results clearly identify the attacks and produce fewer false detections.

Keywords: IoT, traffic security, deep learning, classification

Procedia PDF Downloads 154
8384 Forecasting the Future Implications of ChatGPT Usage in Education Based on AI Algorithms

Authors: Yakubu Bala Mohammed, Nadire Chavus, Mohammed Bulama

Abstract:

Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence (AI) tool capable of swiftly generating comprehensive responses to prompts and follow-up inquiries. This emerging AI tool was introduced in November 2022 by OpenAI, an American AI research laboratory, and is built on large language models. The present study aims to delve into the potential future consequences of ChatGPT usage in education using AI-based algorithms. The paper will bring forth the likely potential risks of ChatGPT utilization, such as academic integrity concerns, unfair learning assessments, excessive reliance on AI, and dissemination of inaccurate information. Four machine learning algorithms, eXtreme-Gradient Boosting (XGBoost), Support vector machine (SVM), Emotional artificial neural network (EANN), and Random forest (RF), will be used to analyze the collected data due to their robustness. Finally, the findings of the study will assist education stakeholders in understanding the future implications of ChatGPT usage in education and propose solutions and directions for upcoming studies.

Keywords: machine learning, ChatGPT, education, learning, implications

Procedia PDF Downloads 236
8383 Regression Model Evaluation on Depth Camera Data for Gaze Estimation

Authors: James Purnama, Riri Fitri Sari

Abstract:

We investigate the machine learning algorithm selection problem in the context of depth-image-based eye gaze estimation, with respect to its essential difficulty in reducing the number of required training samples and the duration of training. Statistics-based prediction accuracy measures are increasingly used to assess and evaluate predictions in gaze estimation. This article uses Root Mean Squared Error (RMSE) and R-Squared statistical analysis to assess machine learning methods on depth camera data for gaze estimation. Four machine learning methods have been evaluated: Random Forest Regression, Regression Tree, Support Vector Machine (SVM), and Linear Regression. The experimental results show that Random Forest Regression has the lowest RMSE and the highest R-Squared, which means that it is the best among the evaluated methods.
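
The two evaluation statistics named above can be written out directly. The sketch below uses the standard definitions, RMSE as the square root of the mean squared residual and R-Squared as one minus the ratio of residual to total sum of squares; the sample values are illustrative and not taken from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative gaze-angle targets and predictions (made-up numbers).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]
error = rmse(y_true, y_pred)
fit = r_squared(y_true, y_pred)
```

A lower RMSE and a higher R-Squared both indicate a better fit, which is the criterion the abstract uses to rank the four regressors.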

Keywords: gaze estimation, gaze tracking, eye tracking, kinect, regression model, orange python

Procedia PDF Downloads 539
8382 Innovative Predictive Modeling and Characterization of Composite Material Properties Using Machine Learning and Genetic Algorithms

Authors: Hamdi Beji, Toufik Kanit, Tanguy Messager

Abstract:

This study aims to construct a predictive model proficient in foreseeing the linear elastic and thermal characteristics of composite materials, drawing on a multitude of influencing parameters. These parameters encompass the shape of inclusions (circular, elliptical, square, triangle), their spatial coordinates within the matrix, orientation, volume fraction (ranging from 0.05 to 0.4), and variations in contrast (spanning from 10 to 200). A variety of machine learning techniques are deployed, including decision trees, random forests, support vector machines, k-nearest neighbors, and an artificial neural network (ANN), to facilitate this predictive model. Moreover, this research goes beyond the predictive aspect by delving into an inverse analysis using genetic algorithms. The intent is to unveil the intrinsic characteristics of composite materials by evaluating their thermomechanical responses. The foundation of this research lies in the establishment of a comprehensive database that accounts for the array of input parameters mentioned earlier. This database, enriched with this diversity of input variables, serves as a bedrock for the creation of machine learning and genetic algorithm-based models. These models are meticulously trained to not only predict but also elucidate the mechanical and thermal conduct of composite materials. Remarkably, the coupling of machine learning and genetic algorithms has proven highly effective, yielding predictions with remarkable accuracy, boasting scores ranging between 0.97 and 0.99. This achievement marks a significant breakthrough, demonstrating the potential of this innovative approach in the field of materials engineering.
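
The inverse analysis described above can be illustrated with a very small genetic algorithm that searches for inputs whose predicted response matches a target value. Everything specific here is an assumption: `toy_response` is a hypothetical stand-in for a trained surrogate model, only two of the listed parameters are used, and the selection, crossover, and mutation settings are arbitrary choices rather than the study's configuration.

```python
import random
from typing import Tuple

Individual = Tuple[float, float]  # (volume_fraction, contrast)

# Parameter ranges quoted in the abstract.
BOUNDS = {"volume_fraction": (0.05, 0.4), "contrast": (10.0, 200.0)}


def toy_response(volume_fraction: float, contrast: float) -> float:
    # Hypothetical surrogate; NOT the model trained in the study.
    return 1.0 + volume_fraction * (contrast - 1.0)


def error(ind: Individual, target: float) -> float:
    return abs(toy_response(*ind) - target)


def random_individual(rng: random.Random) -> Individual:
    return tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS.values())


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    # Uniform crossover: each gene comes from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(ind: Individual, rng: random.Random) -> Individual:
    child = []
    for value, (lo, hi) in zip(ind, BOUNDS.values()):
        value += rng.gauss(0.0, 0.05 * (hi - lo))
        child.append(min(hi, max(lo, value)))  # clip to the allowed range
    return tuple(child)


def inverse_search(target: float, pop_size: int = 40,
                   generations: int = 60, seed: int = 0) -> Individual:
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: error(ind, target))
        parents = population[: pop_size // 2]  # keep the better half (elitism)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda ind: error(ind, target))


best = inverse_search(target=20.0)
```

Because one response value does not determine two inputs uniquely, the search returns one of many parameter pairs that reproduce the target; a real inverse analysis would likely constrain the problem with several measured responses.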

Keywords: machine learning, composite materials, genetic algorithms, mechanical and thermal properties

Procedia PDF Downloads 55
8381 A Machine Learning Based Method to Detect System Failure in Resource Constrained Environment

Authors: Payel Datta, Abhishek Das, Abhishek Roychoudhury, Dhiman Chattopadhyay, Tanushyam Chattopadhyay

Abstract:

Machine learning (ML) and deep learning (DL) are predominantly used in image/video processing, natural language processing (NLP), and audio and speech recognition, but are much less used in system performance evaluation. In this paper, the authors describe the architecture of an abstraction layer constructed using ML/DL to detect system failure. The proposed system detects system failure by evaluating the performance metrics of an IoT service deployment in a constrained infrastructure environment. The system has been tested on a manually annotated data set containing different metrics of the system, such as number of threads, throughput, average response time, CPU usage, memory usage, and network input/output, captured in different hardware environments such as edge (Atom-based gateway) and cloud (AWS EC2). The main challenge in developing such a system is that the classification accuracy should be 100%, as errors in the system degrade the service performance and consequently affect the reliability and high availability that are mandatory for an IoT system. The proposed ML/DL classifiers work with 100% accuracy on the data set of nearly 4,000 samples captured within the organization.

Keywords: machine learning, system performance, performance metrics, IoT, edge

Procedia PDF Downloads 196
8380 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms

Authors: Rikson Gultom

Abstract:

Hate speech and abusive language on social media are difficult to detect; usually they are detected only after going viral in cyberspace, by which point it is too late for prevention. An early detection system with fairly good accuracy is needed so that it can reduce conflicts in society caused by postings on social media that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find, among several machine learning methods studied, a highly accurate early detection model for Twitter. In this study, the Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the Support Vector Machine with Genetic Algorithm (SVM-GA), Naïve Bayes with Genetic Algorithm (NB-GA), and Random Forest Decision Tree with Genetic Algorithm (RFDT-GA). The study produced a comparison table for the accuracy of the hate speech and abusive language detection models, presented it in the form of a graph of the accuracy of the six algorithms developed on the Indonesian-language Twitter dataset, and identified the best model as the one with the highest accuracy.

Keywords: abusive language, hate speech, machine learning, optimization, social media

Procedia PDF Downloads 130
8379 Machine Learning for Targeting of Conditional Cash Transfers: Improving the Effectiveness of Proxy Means Tests to Identify Future School Dropouts and the Poor

Authors: Cristian Crespo

Abstract:

Conditional cash transfers (CCTs) have been targeted towards the poor. Thus, their targeting assessments check whether these schemes have been allocated to low-income households or individuals. However, CCTs have more than one goal and target group. An additional goal of CCTs is to increase school enrolment. Hence, students at risk of dropping out of school also are a target group. This paper analyses whether one of the most common targeting mechanisms of CCTs, a proxy means test (PMT), is suitable to identify the poor and future school dropouts. The PMT is compared with alternative approaches that use the outputs of a predictive model of school dropout. This model was built using machine learning algorithms and rich administrative datasets from Chile. The paper shows that using machine learning outputs in conjunction with the PMT increases targeting effectiveness by identifying more students who are either poor or future dropouts. This joint targeting approach increases effectiveness in different scenarios except when the social valuation of the two target groups largely differs. In these cases, the most likely optimal approach is to solely adopt the targeting mechanism designed to find the highly valued group.
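
One simple way to picture the joint targeting idea above is a union rule: a student is selected if the proxy means test flags the household as poor or if the dropout model flags the student as at risk. The abstract does not specify how the two signals are combined, so the rule, the field names, and the cutoffs below are all hypothetical.

```python
from typing import Dict, List, Union

Student = Dict[str, Union[str, float]]


def joint_targeting(students: List[Student], pmt_cutoff: float,
                    dropout_cutoff: float) -> List[str]:
    """Select students flagged by either mechanism (assumed union rule).

    A lower PMT score is assumed to mean a poorer household; a higher
    dropout_risk is assumed to be the model's predicted probability.
    """
    return [
        s["id"]
        for s in students
        if s["pmt_score"] <= pmt_cutoff or s["dropout_risk"] >= dropout_cutoff
    ]


# Made-up records for illustration only.
students = [
    {"id": "a", "pmt_score": 30.0, "dropout_risk": 0.10},
    {"id": "b", "pmt_score": 80.0, "dropout_risk": 0.75},
    {"id": "c", "pmt_score": 85.0, "dropout_risk": 0.05},
]

pmt_only = [s["id"] for s in students if s["pmt_score"] <= 40.0]
joint = joint_targeting(students, pmt_cutoff=40.0, dropout_cutoff=0.5)
```

In this toy data the PMT alone misses student "b", who is not poor but is predicted to drop out, while the joint rule reaches both target groups.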

Keywords: conditional cash transfers, machine learning, poverty, proxy means tests, school dropout prediction, targeting

Procedia PDF Downloads 206
8378 Convolutional Neural Networks versus Radiomic Analysis for Classification of Breast Mammogram

Authors: Mehwish Asghar

Abstract:

Breast Cancer (BC) is a common type of cancer among women. Its screening is usually performed using different imaging modalities such as magnetic resonance imaging, mammogram, X-ray, CT, etc. Among these modalities, the mammogram is considered a powerful tool for diagnosis and screening of breast cancer. Sophisticated machine learning approaches have shown promising results in complementing human diagnosis. Generally, machine learning methods can be divided into two major classes: one is Radiomics analysis (RA), where image features are extracted manually; the other is convolutional neural networks (CNN), in which the computer learns to recognize image features on its own. This research aims to increase the rate of early detection, thus reducing the mortality rate caused by breast cancer, through the latest advancements in computer science in general and machine learning in particular. It also aims to ease the burden on doctors by improving and automating the process of breast cancer detection. This research presents a comparative analysis of different techniques for implementing models that detect and classify breast cancer. The main goal of this research is to provide a detailed view of the results and performance of the different techniques. The purpose of this paper is to explore the potential of a convolutional neural network (CNN) as a feature extractor and as a classifier. This research also aims to add a Radiomics module in order to compare its results with deep learning techniques.

Keywords: breast cancer (BC), machine learning (ML), convolutional neural network (CNN), radiomics, magnetic resonance imaging, artificial intelligence

Procedia PDF Downloads 228
8377 Computational Intelligence and Machine Learning for Urban Drainage Infrastructure Asset Management

Authors: Thewodros K. Geberemariam

Abstract:

The rapid physical expansion of urbanization coupled with aging infrastructure presents unique decision and management challenges for many big city municipalities. Cities must therefore upgrade and maintain the existing aging urban drainage infrastructure systems to keep up with the demands. Given the overall contribution of assets to municipal revenue and the importance of infrastructure to the success of a livable city, many municipalities are currently looking for a robust and smart urban drainage infrastructure asset management solution that combines management, financial, engineering, and technical practices. This robust decision-making shall rely on sound, complete, current, and relevant data that enables asset valuation, impairment testing, lifecycle modeling, and forecasting across the multiple asset portfolios. In this paper, predictive computational intelligence (CI) and multi-class machine learning (ML), coupled with online, offline, and historical record data collected from an array of multi-parameter sensors, are used to extract different operational and non-conforming patterns hidden in structured and unstructured data and to produce actionable insight on the current and future states of the network. This paper aims to improve the strategic decision-making process by identifying all possible alternatives, evaluating the risk of each alternative, and choosing the alternative most likely to attain the required goal in a cost-effective manner, using historical and near real-time data for urban drainage infrastructure assets that have previously not benefited from computational intelligence and machine learning advancements.

Keywords: computational intelligence, machine learning, urban drainage infrastructure, classification, prediction, asset management space

Procedia PDF Downloads 153
8376 Data Modeling and Calibration of In-Line Pultrusion and Laser Ablation Machine Processes

Authors: David F. Nettleton, Christian Wasiak, Jonas Dorissen, David Gillen, Alexandr Tretyak, Elodie Bugnicourt, Alejandro Rosales

Abstract:

In this work, preliminary results are given for the modeling and calibration of two in-line processes, pultrusion and laser ablation, using machine learning techniques. The end product of the processes is the core of a medical guidewire, manufactured to comply with a user specification of diameter and flexibility. An ensemble approach is followed, which requires training several models. Two state-of-the-art machine learning algorithms are benchmarked: Kernel Recursive Least Squares (KRLS) and Support Vector Regression (SVR). The final objective is to build a precise digital model of the pultrusion and laser ablation process in order to calibrate the resulting diameter and flexibility of the medical guidewire, while taking into account the friction on the forming die. The result is an ensemble of models whose output is within a strict required tolerance and which covers the required range of diameter and flexibility of the guidewire end product. The modeling and automatic calibration of complex in-line industrial processes is a key aspect of the Industry 4.0 movement for cyber-physical systems.

Keywords: calibration, data modeling, industrial processes, machine learning

Procedia PDF Downloads 302
8375 Genetic Algorithms for Feature Generation in the Context of Audio Classification

Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes

Abstract:

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation for machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have been used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

Keywords: feature generation, feature learning, genetic algorithm, music information retrieval

Procedia PDF Downloads 437
8374 Machine Learning in Patent Law: How Genetic Breeding Algorithms Challenge Modern Patent Law Regimes

Authors: Stefan Papastefanou

Abstract:

Artificial intelligence (AI) is an interdisciplinary field of computer science with the aim of creating intelligent machine behavior. Early approaches to AI were configured to operate in very constrained environments where the behavior of the AI system was determined in advance by formal rules. Knowledge was presented as a set of rules that allowed the AI system to determine the results for specific problems, as a structure of if-else rules that could be traversed to find a solution to a particular problem or question. However, such rule-based systems typically have not been able to generalize beyond the knowledge provided. All over the world, and especially in IT-heavy economies such as the United States, the European Union, Singapore, and China, machine learning has developed into an immense asset, and its applications are becoming more and more significant. It has to be examined how such products of machine learning models can and should be protected by IP law and, for the purpose of this paper, patent law specifically, since it is the IP law regime closest to technical inventions and computing methods in technical applications. Genetic breeding models are currently less popular than recursive neural network methods and deep learning, but this approach can be more easily described by referring to the evolution of natural organisms, and with increasing computational power, the genetic breeding method, as a subset of evolutionary algorithm models, is expected to regain popularity. The research method focuses on the patentability (according to the world’s most significant patent law regimes such as China, Singapore, the European Union, and the United States) of AI inventions and machine learning. Questions of the technical nature of the problem to be solved, the inventive step as such, and the state of the art and the associated obviousness of the solution arise in the current patenting processes.
Most importantly, the key focus of this paper is the problem of patenting inventions that are themselves developed through machine learning. According to the current legal situation in most patent law regimes, the inventor named in a patent application must be a natural person or a group of persons. In order to be considered an 'inventor', a person must actually have developed part of the inventive concept. The mere application of machine learning or an AI algorithm to a particular problem should not be construed as the algorithm contributing to a part of the inventive concept. However, when machine learning or the AI algorithm has contributed to a part of the inventive concept, there is currently a lack of clarity regarding the ownership of artificially created inventions. Since not only all European patent law regimes but also the Chinese and Singaporean patent law approaches include identical terms, this paper ultimately offers a comparative analysis of the most relevant patent law regimes.

Keywords: algorithms, inventor, genetic breeding models, machine learning, patentability

Procedia PDF Downloads 110
8373 Intelligent Fault Diagnosis for the Connection Elements of Modular Offshore Platforms

Authors: Jixiang Lei, Alexander Fuchs, Franz Pernkopf, Katrin Ellermann

Abstract:

Within the Space@Sea project, funded by the Horizon 2020 program, an island consisting of multiple platforms was designed. The platforms are connected by ropes and fenders. The connection is critical with respect to the safety of the whole system. Therefore, fault detection systems are investigated, which could detect early warning signs for a possible failure in the connection elements. Previously, a model-based method using an Extended Kalman Filter was developed to detect the reduction of rope stiffness. This method detected several types of faults reliably, but some types of faults were much more difficult to detect. Furthermore, the model-based method is sensitive to environmental noise. When the wave height is low, a long time is needed to detect a fault and the accuracy is not always satisfactory. In this sense, it is necessary to develop a more accurate and robust technique that can detect all rope faults under a wide range of operational conditions. Inspired by this work on the Space at Sea design, we introduce a fault diagnosis method based on deep neural networks. Our method can not only detect rope degradation by using the acceleration data from each platform but also estimate the contributions of the specific acceleration sensors using methods from explainable AI. In order to adapt to different operational conditions, the domain adaptation technique DANN is applied. The proposed model can accurately estimate rope degradation under a wide range of environmental conditions and help users understand the relationship between the output and the contributions of each acceleration sensor.
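
The abstract does not name the explainable-AI method used to estimate sensor contributions. As one possible illustration, the sketch below uses occlusion-based attribution: each input is replaced by a baseline value in turn, and the change in the model output is taken as that input's contribution. The linear `toy_model` and its weights are hypothetical placeholders for the trained deep network.

```python
from typing import Callable, List, Sequence


def occlusion_attribution(model: Callable[[Sequence[float]], float],
                          inputs: Sequence[float],
                          baseline: float = 0.0) -> List[float]:
    """Score each input by how much the output drops when it is occluded."""
    full_output = model(inputs)
    scores = []
    for i in range(len(inputs)):
        occluded = list(inputs)
        occluded[i] = baseline  # hide one sensor reading
        scores.append(full_output - model(occluded))
    return scores


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a degradation estimator over three sensors.
    return 0.5 * x[0] + 2.0 * x[1] - 1.0 * x[2]


scores = occlusion_attribution(toy_model, [1.0, 1.0, 1.0])
```

For a linear model the scores simply recover each weight times the input, which makes the result easy to verify by hand; for a deep network the same procedure gives only a local, baseline-dependent estimate.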

Keywords: fault diagnosis, deep learning, domain adaptation, explainable AI

Procedia PDF Downloads 183
8372 Visualization-Based Feature Extraction for Classification in Real-Time Interaction

Authors: Ágoston Nagy

Abstract:

This paper introduces a method of using unsupervised machine learning to visualize the feature space of a dataset in 2D, in order to find most characteristic segments in the set. After dimension reduction, users can select clusters by manual drawing. Selected clusters are recorded into a data model that is used for later predictions, based on realtime data. Predictions are made with supervised learning, using Gesture Recognition Toolkit. The paper introduces two example applications: a semantic audio organizer for analyzing incoming sounds, and a gesture database organizer where gestural data (recorded by a Leap motion) is visualized for further manipulation.

Keywords: gesture recognition, machine learning, real-time interaction, visualization

Procedia PDF Downloads 355
8371 Optimizing Machine Learning Through Python Based Image Processing Techniques

Authors: Srinidhi. A, Naveed Ahmed, Twinkle Hareendran, Vriksha Prakash

Abstract:

This work reviews some of the advanced image processing techniques for deep learning applications. Object detection by template matching, image denoising, edge detection, and super-resolution modelling are but a few of the tasks. The paper examines these in detail, given that such tasks are crucial preprocessing steps that increase the quality and usability of image datasets in subsequent deep learning tasks. We review some of the methods for the assessment of image quality, more specifically sharpness, which is crucial to ensure robust model performance. Further, we discuss the development of deep learning models specific to facial emotion detection, age classification, and gender classification, including how the preprocessing techniques relate to model performance. Conclusions from this study pinpoint the best practices in the preparation of image datasets, targeting the best trade-off between computational efficiency and retaining important image features critical for effective training of deep learning models.
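
The abstract does not say which sharpness measure is reviewed. A commonly used one is the variance of the Laplacian response, sketched below in plain Python on a grayscale image stored as a list of rows; treating it as the paper's method would be an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def laplacian_variance(image: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values indicate more edge content, which is often read as a
    sharper image. Border pixels are skipped for simplicity.
    """
    height, width = len(image), len(image[0])
    responses: List[float] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (image[y - 1][x] + image[y + 1][x]
                   + image[y][x - 1] + image[y][x + 1]
                   - 4 * image[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A flat image has no edges; a checkerboard has edges everywhere.
flat = [[10.0] * 4 for _ in range(4)]
checker = [[255.0 * ((x + y) % 2) for x in range(4)] for y in range(4)]

flat_score = laplacian_variance(flat)
checker_score = laplacian_variance(checker)
```

In a preprocessing pipeline, images whose score falls below a dataset-specific threshold could be flagged as blurry before training.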

Keywords: image processing, machine learning applications, template matching, emotion detection

Procedia PDF Downloads 20
8370 Methods for Distinction of Cattle Using Supervised Learning

Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl

Abstract:

Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is a pattern which can be used for identification of a data set without the need to obtain the input data used for creation of this pattern. An important requirement in this process is careful data preparation, validation of the model used, and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of view of genetic diversity. In case of missing pedigree information, other methods can be used for tracing an animal's origin. Genetic diversity recorded in genetic data holds relatively useful information for identifying animals that originated from individual countries. We can conclude that the application of data mining to molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.

Keywords: genetic data, Pinzgau cattle, supervised learning, machine learning

Procedia PDF Downloads 552
8369 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of software quality data. Data with very few minority-outcome instances lead to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics when dealing with imbalanced data. In order to empirically validate the different alternatives, the study uses change data from three application packages of an open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling methods and robust performance measures.
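
As an illustration of the resampling idea mentioned in this abstract, the following is a minimal sketch of random oversampling, one common sampling method. The abstract does not say which sampling methods the authors used, so the function `random_oversample` and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: label 1 = change prone, label 0 = non-change prone.
rows = [[10, 2], [12, 3], [8, 1], [40, 9], [11, 2], [9, 2]]
labels = [0, 0, 0, 1, 0, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling of this kind is applied to the training split only, so that evaluation still reflects the original class distribution.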

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 418
8368 Determination of Water Pollution and Water Quality with Decision Trees

Authors: Çiğdem Bakır, Mecit Yüzkat

Abstract:

With the increasing emphasis on water quality worldwide, both the search for and the market for new and intelligent monitoring systems have grown. The current method is the laboratory process, where samples are taken from bodies of water and tests are carried out in laboratories. This method is time-consuming, a waste of manpower, and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed with machine learning methods, taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen. The proposed approach consists of three stages: preprocessing of the data used, feature detection, and classification. We assessed the success of our study with different accuracy metrics and presented the results comparatively. In addition, we achieved approximately 98% success with the decision tree.

Keywords: decision tree, water quality, water pollution, machine learning

Procedia PDF Downloads 85
8367 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue

Authors: Rachel Y. Zhang, Christopher K. Anderson

Abstract:

A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically, hotels have used simple time series models (regression and/or pick-up based models) owing to the complexities of trying to build causal models of demand. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand, focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support Vector Machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performances of the above approaches to forecasting hotel demand are illustrated using a proprietary sample of market-level (24 properties) transactional data for Las Vegas, NV. Causal predictive models can be built and evaluated owing to the availability of market-level (versus firm-level) data. This research also compares and contrasts the accuracy of firm-level models (i.e., predictive models for hotel A using only hotel A’s data) with models using market-level data (prices, review scores, location, chain scale, etc. for all hotels within the market). The prospective models will be valuable for hotel revenue prediction given the basic characteristics of a hotel property, or can be applied in performance evaluation for an existing hotel. The findings will unveil the features that play key roles in a hotel’s revenue performance, which would have considerable potential usefulness in both revenue prediction and evaluation.

Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine

Procedia PDF Downloads 135
8366 Advancing Urban Sustainability through Data-Driven Machine Learning Solutions

Authors: Nasim Eslamirad, Mahdi Rasoulinezhad, Francesco De Luca, Sadok Ben Yahia, Kimmo Sakari Lylykangas, Francesco Pilla

Abstract:

With ongoing urbanization, cities face increasing environmental challenges impacting human well-being. To tackle these issues, data-driven approaches in urban analysis have gained prominence, leveraging urban data to promote sustainability. Integrating Machine Learning techniques enables researchers to analyze and predict complex environmental phenomena like Urban Heat Island (UHI) occurrences in urban areas. This paper demonstrates the implementation of a data-driven approach and interpretable Machine Learning algorithms, together with interpretability techniques, to conduct comprehensive data analyses for sustainable urban design. The developed framework and algorithms are demonstrated for Tallinn, Estonia to develop sustainable urban strategies to mitigate urban heat waves. Geospatial data, preprocessed and labeled with UHI levels, are used to train various ML models. Logistic Regression emerges as the best-performing model based on evaluation metrics and is used to derive a mathematical equation representing areas with or without UHI effects, providing insights into UHI occurrences based on buildings and urban features. The derived formula highlights the importance of building volume, height, area, and shape length in creating an urban environment with UHI impact. The data-driven approach and derived equation inform mitigation strategies and sustainable urban development in Tallinn and offer valuable guidance for other locations with varying climates.
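
For orientation, a logistic regression model over the four features named in this abstract would take the generic form below. The abstract does not report the fitted equation, so the coefficients $\beta_0, \dots, \beta_4$ and the feature symbols are placeholders, not the authors' values.

```latex
P(\mathrm{UHI} = 1 \mid \mathbf{x})
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0
      + \beta_1 x_{\mathrm{volume}}
      + \beta_2 x_{\mathrm{height}}
      + \beta_3 x_{\mathrm{area}}
      + \beta_4 x_{\mathrm{shape\,length}})\bigr)}
```

In a model of this form, the sign and size of each coefficient indicate how the corresponding building feature shifts the predicted probability of a UHI effect, which is what makes the model directly interpretable.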

Keywords: data-driven approach, machine learning transparent models, interpretable machine learning models, urban heat island effect

Procedia PDF Downloads 41
8365 Accelerating Quantum Chemistry Calculations: Machine Learning for Efficient Evaluation of Electron-Repulsion Integrals

Authors: Nishant Rodrigues, Nicole Spanedda, Chilukuri K. Mohan, Arindam Chakraborty

Abstract:

A crucial objective in quantum chemistry is the computation of the energy levels of chemical systems. This task requires electron-repulsion integrals as inputs, and the steep computational cost of evaluating these integrals poses a major numerical challenge in the efficient implementation of quantum chemical software. This work presents a moment-based machine-learning approach for the efficient evaluation of electron-repulsion integrals. These integrals were approximated using linear combinations of a small number of moments. Machine learning algorithms were applied to estimate the coefficients in the linear combination. A random forest approach was used to identify promising features using a recursive feature elimination approach, which performed best for learning the sign of each coefficient but not the magnitude. A neural network with two hidden layers was then used to learn the coefficient magnitudes, along with an iterative feature masking approach to perform input vector compression, identifying a small subset of orbitals whose coefficients are sufficient for the quantum state energy computation. Finally, a small ensemble of neural networks (with a median rule for decision fusion) was shown to improve results when compared to a single network.
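
The median rule for decision fusion mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumed inputs: the function `median_fusion` and the numbers are hypothetical and do not come from the paper.

```python
from statistics import median


def median_fusion(predictions):
    """Combine per-network predictions by taking the median per target.

    predictions[i][j] is network i's estimate for coefficient j.
    """
    return [median(column) for column in zip(*predictions)]


# Hypothetical outputs of three networks for two coefficients.
ensemble_outputs = [
    [0.50, 1.20],
    [0.55, 1.10],
    [2.00, 1.15],  # this network is an outlier on the first coefficient
]

fused = median_fusion(ensemble_outputs)
print(fused)
```

Compared with averaging, the median is less sensitive to a single network that produces an extreme estimate, which is one plausible reason to prefer it for small ensembles.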

Keywords: quantum energy calculations, atomic orbitals, electron-repulsion integrals, ensemble machine learning, random forests, neural networks, feature extraction

Procedia PDF Downloads 117
8364 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia

Authors: The Danh Phan

Abstract:

House price forecasting is a major topic in real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between house prices in the most expensive suburbs and in the most affordable suburbs of the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.

Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise

Procedia PDF Downloads 233
8363 Automatic Lead Qualification with Opinion Mining in Customer Relationship Management Projects

Authors: Victor Radich, Tania Basso, Regina Moraes

Abstract:

Lead qualification is one of the main procedures in Customer Relationship Management (CRM) projects. Its main goal is to identify potential consumers who have the ideal characteristics to establish a profitable and long-term relationship with a certain organization. Social networks can be an important source of data for identifying and qualifying leads since interest in specific products or services can be identified from the users’ expressed feelings of (dis)satisfaction. In this context, this work proposes the use of machine learning techniques and sentiment analysis as an extra step in the lead qualification process in order to improve it. In addition to machine learning models, sentiment analysis or opinion mining can be used to understand the evaluation that the user makes of a particular service, product, or brand. The results obtained so far have shown that it is possible to extract data from social networks and combine the techniques for a more complete classification.

Keywords: lead qualification, sentiment analysis, opinion mining, machine learning, CRM, lead scoring

Procedia PDF Downloads 89
8362 Copyright Clearance for Artificial Intelligence Training Data: Challenges and Solutions

Authors: Erva Akin

Abstract:

The use of copyrighted material for machine learning purposes is a challenging issue in the field of artificial intelligence (AI). While machine learning algorithms require large amounts of data to train and improve their accuracy and creativity, the use of copyrighted material without permission from the authors may infringe on their intellectual property rights. In order to overcome the copyright hurdle to data sharing, access, and re-use, the use of copyrighted material for machine learning purposes may be considered permissible under certain circumstances. For example, if the copyright holder has given permission to use the data through a licensing agreement, then the use for machine learning purposes may be lawful. It is also argued that copying for non-expressive purposes that do not involve conveying expressive elements to the public, such as automated data extraction, should not be seen as infringing. The focus of such ‘copy-reliant technologies’ is on understanding language rules, styles, and syntax, and no creative ideas are being used. However, the non-expressive use defense sits within the framework of the fair use doctrine, which allows the use of copyrighted material for research or educational purposes. Questions arise because the fair use doctrine is not available in EU law; instead, the InfoSoc Directive provides for a rigid system of exclusive rights with a list of exceptions and limitations. One could only argue that non-expressive uses of copyrighted material for machine learning purposes do not constitute a ‘reproduction’ in the first place. Nevertheless, the use of machine learning with copyrighted material is difficult because EU copyright law applies to the mere use of the works. Two solutions can be proposed to address the problem of copyright clearance for AI training data.
The first is to introduce a broad exception for text and data mining, either mandatorily or for commercial and scientific purposes, or to permit the reproduction of works for non-expressive purposes. The second is that copyright laws should permit the reproduction of works for non-expressive purposes, which opens the door to discussions regarding the transposition of the fair use principle from the US into EU law. Both solutions aim to provide more space for AI developers to operate and encourage greater freedom, which could lead to more rapid innovation in the field. The Data Governance Act presents a significant opportunity to advance these debates. Finally, issues concerning the balance of general public interests and legitimate private interests in machine learning training data must be addressed. In my opinion, it is crucial that robot-creation output should fall into the public domain. Machines depend on human creativity, innovation, and expression. To encourage technological advancement and innovation, freedom of expression and business operation must be prioritised.

Keywords: artificial intelligence, copyright, data governance, machine learning

Procedia PDF Downloads 85
8361 New Machine Learning Optimization Approach Based on Input Variables Disposition Applied for Time Series Prediction

Authors: Hervice Roméo Fogno Fotsoa, Germaine Djuidje Kenmoe, Claude Vidal Aloyem Kazé

Abstract:

One of the main applications of machine learning is the prediction of time series. However, more accurate prediction requires a better-optimized machine learning model. Several optimization techniques have been developed, but without considering the disposition of the system's input variables. Thus, this work aims to present a new technique for optimizing machine learning architectures based on their optimal input variable disposition. The validations are done on the prediction of wind time series, using data collected in Cameroon. The number of possible dispositions with four input variables is determined, i.e., twenty-four. Each of the dispositions is used to perform the prediction, with the main criteria being the training and prediction performances. The results obtained from a static architecture and a dynamic architecture of neural networks have shown that these performances are a function of the input variable disposition, and that this dependence differs between the architectures. This analysis revealed that it is necessary to take the input variable disposition into account when developing a more optimal neural network model. Thus, a new neural network training algorithm is proposed by introducing the search for the optimal input variable disposition into the traditional back-propagation algorithm. The results of applying this new optimization approach to the two single neural network architectures are compared step by step with the previously obtained results. Moreover, the proposed approach is validated in a collaborative optimization method with a single-objective optimization technique, i.e., genetic algorithm back-propagation neural networks. From these comparisons, it is concluded that each proposed model outperforms its traditional model in terms of training and prediction performance on time series. Thus, the proposed optimization approach can be useful in improving the accuracy of time series prediction based on machine learning.
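
The count of twenty-four dispositions in this abstract corresponds to the 4! orderings of four input variables. The sketch below enumerates them and picks the ordering with the lowest error under a caller-supplied scoring function. The variable names and the helper `best_disposition` are hypothetical, and the exhaustive search is a simplification, not the authors' training algorithm.

```python
from itertools import permutations

# Hypothetical names for the four input variables; the abstract does not list them.
inputs = ["x1", "x2", "x3", "x4"]

# Every ordering (disposition) of the four inputs: 4! = 24.
dispositions = list(permutations(inputs))
print(len(dispositions))


def best_disposition(variables, error_of):
    """Return the ordering with the lowest error.

    error_of is a caller-supplied function that trains and evaluates a model
    on one ordering and returns its error (assumed here, not given in the source).
    """
    return min(permutations(variables), key=error_of)
```

Because the number of orderings grows factorially with the number of inputs, exhaustive enumeration of this kind is only practical for a handful of variables.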

Keywords: input variable disposition, machine learning, optimization, performance, time series prediction

Procedia PDF Downloads 111