Search results for: supervised machine learning
2824 Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach
Authors: Lin Yang, Zhian Mi, Jiacheng Xiao, Rong Li
Abstract:
Swimming with the tide of deep learning, the field of music information retrieval (MIR) has developed in parallel, and a wide variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, deep convolutional neural networks (CNNs) have been widely used, with better performance than the traditional approach, especially in music genre classification and prediction. However, regarding music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge this gap, we construct an automatic music aesthetic annotation model using the MIDI format for better comparison and measurement of the similarity between music pieces by means of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, a support vector machine (SVM) and a Decision Tree (DT). Experimental results on a tag prediction task show that both learning algorithms are capable of extracting high-level properties in an end-to-end manner from music information. The proposed model helps to learn audience taste, so the resulting recommendations are likely to appeal to a niche consumer.
Keywords: Harmonic analysis, machine learning, music classification and tagging, MIDI.
Downloads 758
2823 A Machine Learning Approach for Anomaly Detection in Environmental IoT-Driven Wastewater Purification Systems
Authors: Giovanni Cicceri, Roberta Maisano, Nathalie Morey, Salvatore Distefano
Abstract:
The main goal of this paper is to present a solution for a water purification system based on an Environmental Internet of Things (EIoT) platform to monitor and control water quality, and on machine learning (ML) models to support decision making and speed up the water purification processes. A real case study has been implemented by deploying an EIoT platform and a network of devices, called Gramb meters and belonging to the Gramb project, on wastewater purification systems located in Calabria, in southern Italy. The data thus collected are used to control the wastewater quality, detect anomalies and predict the behaviour of the purification system. To this end, three different statistical and machine learning models have been adopted and compared: Autoregressive Integrated Moving Average (ARIMA), Long Short Term Memory (LSTM) autoencoder, and Facebook Prophet (FP). The results demonstrated that the ML solution (LSTM) outperforms the classical statistical approaches (ARIMA, FP) in terms of accuracy, efficiency and effectiveness in monitoring and controlling the wastewater purification processes.
Keywords: EIoT, machine learning, anomaly detection, environment monitoring.
Downloads 1027
2822 Performance Analysis of Traffic Classification with Machine Learning
Authors: Htay Htay Yi, Zin May Aye
Abstract:
Network security plays an essential role in the ICT environment because the number of malicious users is continually growing in the realms of education, business, and other ICT-related fields. Network security violations are typically described and examined centrally using a security event management system. Firewalls, Intrusion Detection Systems (IDS), and Intrusion Prevention Systems are becoming essential to monitor or prevent potential violations, attack incidents, and imminent threats. In this system, firewall rules are set only where the system policies require them. The datasets used in this system are derived from the testbed environment. DoS and PortScan traffic is applied in the testbed with the firewall and IDS implementation. Network traffic is classified as normal or attack in the existing testbed environment using six machine learning classification methods. Tests are required to obtain the datasets for DoS and PortScan. The dataset is based on CICIDS2017, and some features have been added. This system tested 26 features from the applied dataset. The system aims to reduce false positive rates and to improve accuracy in the implemented testbed design. The system also demonstrates good performance by selecting important features and comparing against an existing dataset using machine learning classifiers.
Keywords: False negative rate, intrusion detection system, machine learning methods, performance.
Downloads 1071
2821 Training in Psychology in Brazil – Reflections on the Role of Early Supervised Internships in Undergraduate Courses
Authors: Ana Paula Melchiors Stahlschmidt, Cristina Py de Pinto Gomes Mairesse
Abstract:
This paper presents observations on the early supervised internships in Psychology, currently called basic internships in Brazil, and its importance in professional training. The work is an experience report and focuses on the Professional training, illustrated by the reality of a Brazilian institution, used as a case study. It was developed from the authors' experience as academic supervisors of this kind of practice throughout this undergraduate course, combined with aspects investigated in the post-doctoral research of one of them. Theoretical references on the subject and related national legislation are analyzed, as well as reports of students who experienced at least one semester of this type of practice, articulated to the observations of the authors. The results demonstrate the importance of the early supervised internships as a way of creating opportunities for the students of a first contact with the professional reality and the practice of psychologists in different fields of insertion, preparing them for further experiments that require more involvement in activities of training and practices in Psychology.
Keywords: Training of psychologists, Internships in Psychology, Supervised internships, Combination of theory and practice.
Downloads 1512
2820 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel
Authors: Tatjana Eitrich, Bruno Lang
Abstract:
This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.
Keywords: Support Vector Machine Training, Multi-Parameter Kernels, Shared Memory Parallel Computing, Large Data
Downloads 1443
2819 Voltage Problem Location Classification Using Performance of Least Squares Support Vector Machine LS-SVM and Learning Vector Quantization LVQ
Authors: Khaled Abduesslam. M, Mohammed Ali, Basher H Alsdai, Muhammad Nizam, Inayati
Abstract:
This paper presents voltage problem location classification using the performance of Least Squares Support Vector Machine (LS-SVM) and Learning Vector Quantization (LVQ) in an electrical power system, implemented on the IEEE 39 bus New England system. The data was collected from time domain simulation using the Power System Analysis Toolbox (PSAT). Outputs from the simulation data, such as voltage, phase angle, real power and reactive power, were taken as input to estimate voltage stability at particular buses based on the Power Transfer Stability Index (PTSI). The simulation was carried out on the IEEE 39 bus test system by considering increased load at the load buses. To verify the proposed LS-SVM, its performance was compared to Learning Vector Quantization (LVQ). The results showed that LS-SVM is faster and better than LVQ. The results also demonstrated that the LS-SVM achieved 0% misclassification whereas LVQ had 7.69% misclassification.
Keywords: IEEE 39 bus, Least Squares Support Vector Machine, Learning Vector Quantization, Voltage Collapse.
Downloads 2405
2818 Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values
Authors: S. Kotsiantis, A. Kostoulas, S. Lykoudis, A. Argiriou, K. Menagias
Abstract:
Estimates of temperature values at a specific time of day, from daytime and daily profiles, are needed for a number of environmental, ecological, agricultural and technical applications, ranging from natural hazards assessments, crop growth forecasting to design of solar energy systems. The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. For this reason, a number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators, such as Correlation Coefficient, Root Mean Squared Error, etc.
Keywords: regression algorithms, supervised machine learning.
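The two indicators named in this abstract can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the "Correlation Coefficient" is the Pearson coefficient (the abstract does not say which one is used); the function names are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(actual, predicted):
    """Pearson correlation coefficient (assumed variant, see lead-in)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)
```

RMSE is expressed in the units of the target (here, degrees), while the correlation coefficient is unitless, which is one reason the two are usually reported together.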
Downloads 3418
2817 Crude Oil Price Prediction Using LSTM Networks
Authors: Varun Gupta, Ankit Pandey
Abstract:
The crude oil market is an immensely complex and dynamic environment, and thus predicting changes in such an environment accurately is challenging. A number of approaches have been adopted to take on that challenge, and machine learning has been at the core of many of them. There are plenty of examples of algorithms based on machine learning yielding satisfactory results for this type of prediction. In this paper, we have tried to predict crude oil prices using Long Short-Term Memory (LSTM) based recurrent neural networks. We have experimented with different types of models using different epochs, lookbacks and other tuning methods. The results obtained are promising and present a reasonably accurate prediction of the price of crude oil in the near future.
Keywords: Crude oil price prediction, deep learning, LSTM, recurrent neural networks.
Downloads 3713
2816 Modeling Language for Constructing Solvers in Machine Learning: Reductionist Perspectives
Authors: Tsuyoshi Okita
Abstract:
For a given specific problem, finding an efficient algorithm has long been the matter of study. However, an alternative approach, orthogonal to this one, has emerged, which is called a reduction. In general, for a given specific problem, the reduction approach studies how to convert the original problem into subproblems. This paper proposes a formal modeling language to support the reduction approach so that a solver can be built quickly. We show three examples from the wide area of learning problems. The benefit is fast prototyping of algorithms for a given new problem. Note that our formal modeling language is not intended to provide an efficient notation for data mining applications, but to assist a designer who develops solvers in machine learning.
Keywords: Formal language, statistical inference problem, reduction.
Downloads 1328
2815 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. Understanding the needs of customers who are looking to switch service providers is especially important. Churn prediction is now a mandatory requirement for retaining customers in the telecommunications industry, and machine learning can be used to accomplish it. Churn prediction has become a very important machine learning classification topic in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important for building an effective churn prediction model. This paper aims to predict churn and identify the factors of customer churn based on customers’ past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compares the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. Gradient Boosting with the feature selection technique performed best in this study, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.
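For readers unfamiliar with the two evaluation measures named in this abstract, the following is a minimal sketch in plain Python. It assumes binary labels with the churn class encoded as 1 (an illustrative choice, not stated in the abstract), and it computes ROC-AUC through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise AUC computation is quadratic in the number of samples, so it suits small illustrations; sorting-based implementations are the usual choice for large datasets.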
Downloads 408
2814 Genetic Algorithms for Feature Generation in the Context of Audio Classification
Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes
Abstract:
Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a representation that is useful for machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have been used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.
Keywords: Feature generation, feature learning, genetic algorithm, music information retrieval.
Downloads 1078
2813 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
Authors: Ruchika Malhotra, Megha Khanna
Abstract:
The development of change prediction models can help software practitioners plan testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. Data with very few instances of the minority outcome categories lead to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics when dealing with imbalanced data. In order to empirically validate the different alternatives, the study uses change data from three application packages of an open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using a resampling method and robust performance measures.
Keywords: Change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics.
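The abstract does not name the sampling methods it evaluates. As one illustration of the general idea, the sketch below shows random oversampling, a common resampling technique that duplicates minority-class rows until every class matches the largest class. It is an assumed example in plain Python with a hypothetical function name, not the study's procedure.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.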
Downloads 1520
2812 Determination of Water Pollution and Water Quality with Decision Trees
Authors: Çiğdem Bakır, Mecit Yüzkat
Abstract:
With the increasing emphasis on water quality worldwide, both the search for and the market for new and intelligent monitoring systems have expanded. The current method is the laboratory process, where samples are taken from bodies of water and tests are carried out in laboratories. This method is time-consuming, a waste of manpower and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed with machine learning methods, taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen. The proposed approach consists of three stages: preprocessing of the data used, feature detection and classification. We measured the success of our study with different accuracy metrics and presented the results comparatively. In addition, we achieved approximately 98% success with the decision tree.
Keywords: Decision tree, water quality, water pollution, machine learning.
Downloads 260
2811 Accelerating Quantum Chemistry Calculations: Machine Learning for Efficient Evaluation of Electron-Repulsion Integrals
Authors: Nishant Rodrigues, Nicole Spanedda, Chilukuri K. Mohan, Arindam Chakraborty
Abstract:
A crucial objective in quantum chemistry is the computation of the energy levels of chemical systems. This task requires electron-repulsion integrals as inputs and the steep computational cost of evaluating these integrals poses a major numerical challenge in efficient implementation of quantum chemical software. This work presents a moment-based machine learning approach for the efficient evaluation of electron-repulsion integrals. These integrals were approximated using linear combinations of a small number of moments. Machine learning algorithms were applied to estimate the coefficients in the linear combination. A random forest approach was used to identify promising features using a recursive feature elimination approach, which performed best for learning the sign of each coefficient, but not the magnitude. A neural network with two hidden layers was then used to learn the coefficient magnitudes, along with an iterative feature masking approach to perform input vector compression, identifying a small subset of orbitals whose coefficients are sufficient for the quantum state energy computation. Finally, a small ensemble of neural networks (with a median rule for decision fusion) was shown to improve results when compared to a single network.
Keywords: Quantum energy calculations, atomic orbitals, electron-repulsion integrals, ensemble machine learning, random forests, neural networks, feature extraction.
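The "median rule for decision fusion" mentioned in this abstract can be illustrated with a short sketch: each ensemble member produces one prediction per sample, and the fused output is the per-sample median. The code below is an assumed, minimal version in plain Python; the function name and the input layout (one list of predictions per model) are hypothetical, and the paper's networks are not reproduced here.

```python
from statistics import median


def fuse_by_median(predictions):
    """Fuse per-model predictions into one value per sample using the median.

    `predictions` is a list with one entry per model; each entry is a list of
    that model's outputs, aligned by sample index.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*predictions) regroups the values sample by sample
    return [median(sample) for sample in zip(*predictions)]
```

Compared with averaging, the median is less sensitive to a single network that produces an outlying estimate, which is a common motivation for this fusion rule.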
Downloads 188
2810 Blind Image Deconvolution by Neural Recursive Function Approximation
Authors: Jiann-Ming Wu, Hsiao-Chang Chen, Chun-Chang Wu, Pei-Hsun Hsu
Abstract:
This work explores blind image deconvolution by recursive function approximation based on supervised learning of neural networks, under the assumption that a degraded image is the linear convolution of an original source image with a linear shift-invariant (LSI) blurring matrix. Supervised learning of neural networks of radial basis functions (RBF) is employed to construct an embedded recursive function within a blurred image, to extract the non-deterministic component of the original source image, and to use it to estimate the hyperparameters of a linear image degradation model. Based on the estimated blurring matrix, reconstruction of the original source image from a blurred image is then resolved by an annealed Hopfield neural network. Numerical simulations show that the proposed method is effective for faithful estimation of an unknown blurring matrix and restoration of an original source image.
Keywords: Blind image deconvolution, linear shift-invariant (LSI), linear image degradation model, radial basis functions (RBF), recursive function, annealed Hopfield neural networks.
Downloads 2061
2809 Discussing Embedded versus Central Machine Learning in Wireless Sensor Networks
Authors: Anne-Lena Kampen, Øivind Kure
Abstract:
Machine learning (ML) can be implemented in Wireless Sensor Networks (WSNs) as a central solution or as a distributed solution where the ML is embedded in the nodes. Embedding improves privacy and may reduce prediction delay. In addition, the number of transmissions is reduced. However, quality factors such as prediction accuracy, fault detection efficiency and coordinated control of the overall system suffer. Here, we discuss and highlight the trade-offs that should be considered when choosing between embedded and centralized ML, especially for multihop networks. In addition, we present estimations that demonstrate the energy trade-offs between embedded and centralized ML. Although the total network energy consumption is lower with central prediction, it makes the network more prone to partitioning due to the high forwarding load on the one-hop nodes. Moreover, the continuous improvements in the number of operations per joule for embedded devices will move the energy balance toward embedded prediction.
Keywords: Central ML, embedded machine learning, energy consumption, local ML, Wireless Sensor Networks, WSN.
Downloads 828
2808 A Cognitive Model of Character Recognition Using Support Vector Machines
Authors: K. Freedman
Abstract:
In the present study, a support vector machine (SVM) learning approach to character recognition is proposed. Simple feature detectors, similar to those found in the human visual system, were used in the SVM classifier. Alphabetic characters were rotated to 8 different angles and using the proposed cognitive model, all characters were recognized with 100% accuracy and specificity. These same results were found in psychiatric studies of human character recognition.
Keywords: Character recognition, cognitive model, support vector machine learning.
Downloads 1878
2807 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes
Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari
Abstract:
The Arabic language is one of the most important languages. Learning it matters to many people around the world because of its religious and economic importance, and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting syntactic mistakes in Arabic according to their position in the sentence, and on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using the Stanford CoreNLP morphological analyzer and a machine-learning approach, in order to detect the syntactical mistakes and then correct them. A prototype of the proposed system was implemented and evaluated. It uses the support vector machine (SVM) algorithm to detect Arabic grammatical errors and corrects them using a rule-based approach. The prototype system has a fair accuracy of 81%. In general, it shows a set of useful grammatical suggestions that the user may overlook while writing, due to lack of familiarity with grammar or as a result of the speed of writing, such as alerting the user when a plural term is used to indicate one person.
Keywords: Arabic Language acquisition and learning, natural language processing, morphological analyzer, part-of-speech.
Downloads 1047
2806 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling
Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal
Abstract:
Datasets or collections are becoming important assets in themselves and can now be accepted as a primary intellectual output of research. The quality and usage of datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection has been produced are described. The collection has been cleansed and preprocessed, some features have been selected, and preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared to the other multi-label classification methods tested. The Classifier Chains method showed the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.
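Three of the measurements listed in this abstract (Hamming Loss, Micro-F and Macro-F) have standard multi-label definitions, sketched below in plain Python. The representation is an assumption made for illustration: each sample's labels are given as a Python set, all drawn from a known label list. The function names are hypothetical.

```python
def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of (sample, label) decisions that are wrong."""
    # The symmetric difference counts labels that are missing or spurious.
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(labels))


def _f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(true_sets, pred_sets, labels):
    """Return (micro-F1, macro-F1) over the given label list."""
    counts = []
    for label in labels:
        tp = fp = fn = 0
        for t, p in zip(true_sets, pred_sets):
            if label in t and label in p:
                tp += 1
            elif label in p:
                fp += 1
            elif label in t:
                fn += 1
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = _f1(sum(c[0] for c in counts),
                sum(c[1] for c in counts),
                sum(c[2] for c in counts))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(_f1(*c) for c in counts) / len(labels)
    return micro, macro
```

Micro-averaging weights every label decision equally and so is dominated by frequent labels, whereas macro-averaging gives each label the same weight regardless of how often it occurs.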
Downloads 837
2805 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles
Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi
Abstract:
Fuel consumption (FC) is one of the key factors in determining the expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is infeasible to measure the FC, since this would first require the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses vehicle-specific and operational environmental condition information, such as road slopes and driver profiles, for around 40,000 vehicles. All vehicles have diesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of the machine learning algorithms Linear regression (LR), K-nearest neighbor (KNN) and Artificial neural networks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.
Keywords: Artificial neural networks, fuel consumption, machine learning, regression, statistical tests.
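The abstract reports a "mean relative prediction error" without giving its formula. One common reading is the mean of |predicted - actual| / |actual| over all samples; the sketch below implements that assumed definition in plain Python, with a hypothetical function name. Multiplying the result by 100 expresses it as a percentage.

```python
def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| (assumed definition, see lead-in)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is 0")
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)
```

Because each error is scaled by the true value, this measure treats a 1 litre miss on a low-consumption trip as more serious than the same miss on a high-consumption trip, unlike an absolute error measure.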
Downloads 831
2804 Risk Factors of Becoming NEET Youth in Iran: A Machine Learning Approach
Authors: Hamed Rahmani, Wim Groot
Abstract:
The term "youth not in employment, education or training (NEET)" refers to a combination of youth unemployment and school dropout. This study investigates the variables that increase the risk of becoming NEET in Iran. A selection bias-adjusted Probit model was employed using machine learning to identify these risk factors. We used cross-sectional data obtained from the Statistical Center of Iran and the Ministry of Cooperatives Labor and Social Welfare that are taken from the labor force survey conducted in the spring of 2021. We look at years of education, work experience, housework, the number of children under the age of 6 years in the home, family education, birthplace, and the amount of land owned by households. Results show that hours spent performing domestic chores enhance the likelihood of youth becoming NEET, and years of education, years of potential work experience decrease the chance of being NEET. The findings also show that female youth born in cities were less likely than those born in rural regions to become NEET.
Keywords: NEET youth, probit, CART, machine learning, unemployment.
Downloads 350
2803 Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R
Authors: Jaya Mathew
Abstract:
Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.
Keywords: Predictive maintenance, machine learning, big data, cloud, on premise SQL, R.
Downloads 1920
2802 Bidirectional Discriminant Supervised Locality Preserving Projection for Face Recognition
Abstract:
Dimensionality reduction and feature extraction are of crucial importance for achieving high efficiency in manipulating high-dimensional data. Two-dimensional discriminant locality preserving projection (2D-DLPP) and two-dimensional discriminant supervised LPP (2D-DSLPP) are two effective two-dimensional projection methods for dimensionality reduction and feature extraction of face image matrices. Since 2D-DLPP and 2D-DSLPP preserve the local structure information of the original data and exploit the discriminant information, they usually have good recognition performance. However, 2D-DLPP and 2D-DSLPP only employ single-sided projection, and thus the generated low-dimensional data matrices still have many features. In this paper, by combining the discriminant supervised LPP with the bidirectional projection, we propose the bidirectional discriminant supervised LPP (BDSLPP). The left and right projection matrices for BDSLPP can be computed iteratively. Experimental results show that the proposed BDSLPP achieves higher recognition accuracy than 2D-DLPP, 2D-DSLPP, and bidirectional discriminant LPP (BDLPP).
Keywords: Face recognition, dimension reduction, locality preserving projection, discriminant information, bidirectional projection.
Downloads 690
2801 Time Organization for Urban Mobility Decongestion: A Methodology for People’s Profile Identification
Authors: Yassamina Berkane, Leïla Kloul, Yoann Demoli
Abstract:
Quality of life, environmental impact, congestion of mobility means, and infrastructures remain significant challenges for urban mobility. Solutions like car sharing, spatial redesign, eCommerce, and autonomous vehicles will likely increase the unit veh-km and the density of cars in urban traffic, thus reducing congestion. However, the impact of such solutions is not clear to researchers. Congestion arises from growing populations that must travel greater distances to arrive at similar locations (e.g., workplaces, schools) during the same time frame (e.g., rush hours). This paper first reviews the research and application cases of urban congestion methods in recent years. Rethinking the question of time, it then investigates people’s willingness and flexibility to adapt their arrival and departure times from workplaces. We use neural networks and supervised learning methods to predict people’s intentions from their responses to a questionnaire. We created and distributed a questionnaire to more than 50 companies in the Paris suburbs. The results show that our methodology can predict people’s intentions to reschedule their activities (work, study, commerce, etc.).
Keywords: Urban mobility, decongestion, machine learning, neural network.
Downloads 481
2800 WebAppShield: An Approach Exploiting Machine Learning to Detect SQLi Attacks in an Application Layer in Run-Time
Authors: Ahmed Abdulla Ashlam, Atta Badii, Frederic Stahl
Abstract:
In recent years, SQL injection attacks have been identified as being prevalent against web applications. They affect network security and user data, which leads to a considerable loss of money and data every year. This paper presents the use of classification algorithms in machine learning using a method to classify the login data filtering inputs into "SQLi" or "Non-SQLi", thus increasing the reliability and accuracy of results in terms of deciding whether an operation is an attack or a valid operation. A method as a Web-App is developed for auto-generated data replication to provide a twin of the targeted data structure. Shielding against SQLi attacks (WebAppShield) that verifies all users and prevents attackers (SQLi attacks) from entering and/or accessing the database, which the machine learning module predicts as "Non-SQLi", has been developed. A special login form has been developed with a special instance of the data validation; this verification process secures the web application from its early stages. The system has been tested and validated, and up to 99% of SQLi attacks have been prevented.
Keywords: SQL injection, attacks, web application, accuracy, database, WebAppShield.
Downloads 444
2799 Solution Approaches for Some Scheduling Problems with Learning Effect and Job Dependent Delivery Times
Authors: M. Duran Toksarı, B. Uçarkuş
Abstract:
In this paper, we propose two algorithms to optimally solve makespan and total completion time scheduling problems with learning effect and job dependent delivery times in a single machine environment. The delivery time is the extra time to eliminate adverse effect between the main processing and delivery to the customer. In this paper, we introduce the job dependent delivery times for some single machine scheduling problems with position dependent learning effect, which are makespan and total completion time. The results of the two algorithms proposed for solving each problem are compared with LINGO solutions for 50-job, 100-job and 150-job problems. The proposed algorithms can find the same results in shorter time.
Keywords: Delivery times, learning effect, makespan, scheduling, total completion time.
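To make the two objectives in this abstract concrete, here is a minimal sketch that evaluates a given job sequence under a position-dependent learning effect. It assumes the common form in which the job in position r takes `base_time * r**a` with a learning index `a <= 0`, and it assumes that each job's delivery time is simply added to its completion time. Both are assumptions for illustration; the abstract does not state the paper's exact model, and this is not the authors' algorithm.

```python
def evaluate_sequence(sequence, a):
    """Evaluate a fixed job order on a single machine.

    sequence: list of (base_time, delivery_time) pairs in processing order.
    a: learning index (assumed <= 0); position r scales time by r ** a.
    Returns (makespan, total_completion_time), where each job's completion
    time includes its own delivery time (an assumption of this sketch).
    """
    machine_clock = 0.0
    makespan = 0.0
    total_completion = 0.0
    for position, (base_time, delivery_time) in enumerate(sequence, start=1):
        # Later positions are processed faster when a < 0.
        machine_clock += base_time * (position ** a)
        delivered_at = machine_clock + delivery_time
        makespan = max(makespan, delivered_at)
        total_completion += delivered_at
    return makespan, total_completion


# Two hypothetical jobs: (base processing time, delivery time).
jobs = [(2, 1), (3, 0)]
no_learning = evaluate_sequence(jobs, a=0)
with_learning = evaluate_sequence(jobs, a=-1)
```

Comparing `no_learning` with `with_learning` shows how the learning effect shortens the job in the second position and therefore both objective values.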
Downloads 1551
2798 Resilient Machine Learning in the Nuclear Industry: Crack Detection as a Case Study
Authors: Anita Khadka, Gregory Epiphaniou, Carsten Maple
Abstract:
There is a dramatic surge in the adoption of Machine Learning (ML) techniques in many areas, including the nuclear industry (such as fault diagnosis and fuel management in nuclear power plants), autonomous systems (including self-driving vehicles), space systems (space debris recovery, for example), medical surgery, network intrusion detection, malware detection, to name a few. Artificial Intelligence (AI) has become a part of everyday modern human life. To date, the predominant focus has been developing underpinning ML algorithms that can improve accuracy, while factors such as resiliency and robustness of algorithms have been largely overlooked. If an adversarial attack is able to compromise the learning method or data, the consequences can be fatal, especially but not exclusively in safety-critical applications. In this paper, we present an in-depth analysis of five adversarial attacks and two defence methods on a crack detection ML model. Our analysis shows that it can be dangerous to adopt ML techniques without rigorous testing, since they may be vulnerable to adversarial attacks, especially in security-critical areas such as the nuclear industry. We observed that while the adopted defence methods can effectively defend against different attacks, none of them could protect against all five adversarial attacks entirely.
Keywords: Resilient Machine Learning, attacks, defences, nuclear industry, crack detection.
Downloads 501
2797 Comparison of Deep Convolutional Neural Networks Models for Plant Disease Identification
Authors: Megha Gupta, Nupur Prakash
Abstract:
Identification of plant diseases has been performed using machine learning and deep learning models on the datasets containing images of healthy and diseased plant leaves. The current study carries out an evaluation of some of the deep learning models based on convolutional neural network architectures for identification of plant diseases. For this purpose, the publicly available New Plant Diseases Dataset, an augmented version of PlantVillage dataset, available on Kaggle platform, containing 87,900 images has been used. The dataset contained images of 26 diseases of 14 different plants and images of 12 healthy plants. The CNN models selected for the study presented in this paper are AlexNet, ZFNet, VGGNet (four models), GoogLeNet, and ResNet (three models). The selected models are trained using PyTorch, an open-source machine learning library, on Google Colaboratory. A comparative study has been carried out to analyze the high degree of accuracy achieved using these models. The highest test accuracy and F1-score of 99.59% and 0.996, respectively, were achieved by using GoogLeNet with Mini-batch momentum based gradient descent learning algorithm.
Keywords: comparative analysis, convolutional neural networks, deep learning, plant disease identification
Downloads 638
2796 Machine Learning Methods for Network Intrusion Detection
Authors: Mouhammad Alkasassbeh, Mohammad Almseidin
Abstract:
Network security engineers work to keep services available all the time by handling intruder attacks. An Intrusion Detection System (IDS) is one of the available mechanisms used to sense and classify any abnormal actions. Therefore, the IDS must always be up to date with the latest intruder attack signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue, as is learning new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocessing step in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.
Keywords: IDS, DDoS, MLP, KDD.
Downloads 727
2795 Semi-Supervised Outlier Detection Using a Generative and Adversary Framework
Authors: Jindong Gu, Matthias Schubert, Volker Tresp
Abstract:
In many outlier detection tasks, only training data belonging to one class, i.e., the positive class, is available. The task is then to predict a new data point as belonging either to the positive class or to the negative class, in which case the data point is considered an outlier. For this task, we propose a novel corrupted Generative Adversarial Network (CorGAN). In the adversarial process of training CorGAN, the Generator generates outlier samples for the negative class, and the Discriminator is trained to distinguish the positive training data from the generated negative data. The proposed framework is evaluated using an image dataset and a real-world network intrusion dataset. Our outlier-detection method achieves state-of-the-art performance on both tasks.
Keywords: Outlier detection, generative adversary networks, semi-supervised learning.
Downloads 1074