Search results for: supervised machine learning.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2916

2766 Extraction of Significant Phrases from Text

Authors: Yuan J. Lui

Abstract:

Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain-independent keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000's AutoSummarize feature. The domain independence of this algorithm has also been confirmed in our experiments.

Keywords: classification, keyphrase extraction, machine learning, summarization

Downloads: 2051
2765 A Constrained Clustering Algorithm for the Classification of Industrial Ores

Authors: Luciano Nieddu, Giuseppe Manfredi

Abstract:

In this paper, a pattern recognition algorithm based on a constrained version of the k-means clustering algorithm is presented. The proposed algorithm is a nonparametric supervised statistical pattern recognition algorithm, i.e., it works under very mild assumptions on the dataset. The performance of the algorithm is tested, together with a feature extraction technique that captures the information on the closed two-dimensional contour of an image, on images of industrial mineral ores.

Keywords: K-means, Industrial ores classification, Invariant Features, Supervised Classification.

Downloads: 1381
2764 A Hybrid Machine Learning System for Stock Market Forecasting

Authors: Rohit Choudhry, Kumkum Garg

Abstract:

In this paper, we propose a hybrid machine learning system based on Genetic Algorithm (GA) and Support Vector Machines (SVM) for stock market prediction. A variety of indicators from the technical analysis field of study are used as input features. We also make use of the correlation between stock prices of different companies to forecast the price of a stock, making use of technical indicators of highly correlated stocks, not only the stock to be predicted. The genetic algorithm is used to select the set of most informative input features from among all the technical indicators. The results show that the hybrid GA-SVM system outperforms the stand alone SVM system.

Keywords: Genetic Algorithms, Support Vector Machines, Stock Market Forecasting.
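
As a rough illustration of the feature-selection step described in the abstract above, the sketch below runs a small genetic algorithm over 0/1 masks of candidate features. It is not the authors' implementation: the function name `genetic_feature_selection`, its parameters, and the toy fitness function are all assumptions made here for illustration. In a GA-SVM setup, the fitness would presumably be the validation score of an SVM trained on the selected indicators.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20,
                              generations=30, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`.

    Hypothetical helper for illustration only.
    Requires n_features >= 2 and pop_size >= 3.
    """
    rng = random.Random(seed)
    # Each individual is a list of bits: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Elitism: carry the best mask forward unchanged.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n_features - 1)
            child = parent_a[:cut] + parent_b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


def toy_fitness(mask):
    """Stand-in fitness (an assumption): features 0, 2 and 5 are informative,
    every other selected feature costs 0.5."""
    informative = {0, 2, 5}
    return sum(1.0 if i in informative else -0.5
               for i, bit in enumerate(mask) if bit)


if __name__ == "__main__":
    print(genetic_feature_selection(8, toy_fitness))
```

Because the best mask is always copied into the next generation, the best fitness never decreases from one generation to the next; replacing `toy_fitness` with a cross-validated classifier score is the part that would make this resemble the hybrid system the abstract describes.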

Downloads: 9319
2763 An Intelligent Baby Care System Based on IoT and Deep Learning Techniques

Authors: Chinlun Lai, Lunjyh Jiang

Abstract:

Due to the heavy burden and pressure of caring for infants, an integrated automatic baby watching system based on IoT smart sensing and deep learning machine vision techniques is proposed in this paper. By monitoring infant body conditions such as heartbeat, breathing, body temperature, and sleeping posture, as well as surrounding conditions such as dangerous/sharp objects, light, noise, humidity and temperature, the proposed system can analyze and predict obvious or potential dangerous conditions according to the observed data and then adopt suitable actions in real time to protect the infant from harm, thus reducing the burden on the caregiver and improving the safety and efficiency of the caring work. The experimental results show that the proposed system works successfully for infant care and thus can be implemented practically in various fields of life.

Keywords: Baby care system, internet of things, deep learning, machine vision.

Downloads: 1904
2762 Estimation of Functional Response Model by Supervised Functional Principal Component Analysis

Authors: Hyon I. Paek, Sang Rim Kim, Hyon A. Ryu

Abstract:

In functional linear regression, one typical problem is dimension reduction. Compared with multivariate linear regression, functional linear regression is regarded as an infinite-dimensional case, and the main task is to reduce the dimensions of the functional response and the functional predictors. One common approach is to apply functional principal component analysis (FPCA) to the functional predictors and then use a few leading functional principal components (FPCs) to predict the functional model. The leading FPCs estimated by the typical FPCA explain a major part of the variation of the functional predictor, but these leading FPCs may not be highly correlated with the functional response, so they may not be significant in predicting the response. In this paper, we propose a supervised FPCA method for a functional response model with FPCs obtained by considering the correlation with the functional response. Our method would have a better prediction accuracy than the typical FPCA method.

Keywords: Supervised, functional principal component analysis, functional response, functional linear regression.

Downloads: 13
2761 Towards Developing a Self-Explanatory Scheduling System Based on a Hybrid Approach

Authors: Jian Zheng, Yoshiyasu Takahashi, Yuichi Kobayashi, Tatsuhiro Sato

Abstract:

In this study, we present a conceptual framework for developing a scheduling system that can generate self-explanatory and easy-to-understand schedules. To this end, a user interface is conceived to help planners record factors that are considered crucial in scheduling, as well as internal and external sources relating to such factors. A hybrid approach combining machine learning and constraint programming is developed to generate schedules and the corresponding factors, and accordingly display them on the user interface. Effects of the proposed system on scheduling are discussed, and it is expected that scheduling efficiency and system understandability will be improved, compared with previous scheduling systems.

Keywords: Constraint programming, Factors considered in scheduling, machine learning, scheduling system.

Downloads: 1435
2760 Data Analysis Techniques for Predictive Maintenance on Fleet of Heavy-Duty Vehicles

Authors: Antonis Sideris, Elias Chlis Kalogeropoulos, Konstantia Moirogiorgou

Abstract:

The present study proposes a methodology for the efficient daily management of fleet vehicles and construction machinery. The application covers the area of remote monitoring of heavy-duty vehicles operation parameters, where specific sensor data are stored and examined in order to provide information about the vehicle’s health. The vehicle diagnostics allow the user to inspect whether maintenance tasks need to be performed before a fault occurs. A properly designed machine learning model is proposed for the detection of two different types of faults through classification. Cross validation is used and the accuracy of the trained model is checked with the confusion matrix.

Keywords: Fault detection, feature selection, machine learning, predictive maintenance.
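
The abstract above mentions cross validation and a confusion matrix for checking a two-fault classifier. The sketch below shows one plain way these two pieces could be computed; the function names, the contiguous fold layout, and the label names `ok`, `fault_a` and `fault_b` are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    labels = ["ok", "fault_a", "fault_b"]  # hypothetical class names
    y_true = ["ok", "ok", "fault_a", "fault_b", "fault_a", "ok"]
    y_pred = ["ok", "fault_a", "fault_a", "fault_b", "ok", "ok"]
    matrix = confusion_matrix(y_true, y_pred, labels)
    print(matrix, accuracy(matrix))
```

In practice each fold would train a model on the `train` indices and fill the confusion matrix from predictions on the `test` indices; the off-diagonal cells then show which fault type is confused with which.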

Downloads: 781
2759 Gas Detection via Machine Learning

Authors: Walaa Khalaf, Calogero Pace, Manlio Gaudioso

Abstract:

We present an Electronic Nose (ENose), which is aimed at identifying the presence of one out of two gases, possibly detecting the presence of a mixture of the two. Estimation of the concentrations of the components is also performed for a volatile organic compound (VOC) constituted by methanol and acetone, for the ranges 40-400 and 22-220 ppm (parts-per-million), respectively. Our system contains 8 sensors, 5 of them being gas sensors (of the class TGS from FIGARO USA, INC., whose sensing element is a tin dioxide (SnO2) semiconductor), the remaining being a temperature sensor (LM35 from National Semiconductor Corporation), a humidity sensor (HIH–3610 from Honeywell), and a pressure sensor (XFAM from Fujikura Ltd.). Our integrated hardware–software system uses machine learning principles and the least square regression principle to first identify a new gas sample, or a mixture, and then to estimate the concentrations. In particular, we adopt a training model using the Support Vector Machine (SVM) approach with a linear kernel to teach the system how to discriminate among different gases. Then we apply another training model using least square regression to predict the concentrations. The experimental results demonstrate that the proposed multiclassification and regression scheme is effective in the identification of the tested VOCs of methanol and acetone with 96.61% correctness. The concentration prediction is obtained with correlation coefficients of 0.979 and 0.964 for the predicted versus real concentrations of methanol and acetone, respectively.

Keywords: Electronic nose, Least square regression, Mixture of gases, Support Vector Machine.
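
The regression stage and the reported correlation coefficients in the abstract above can be illustrated with a minimal sketch. It fits a one-variable least-squares line and computes the Pearson correlation between predicted and real values. The single input feature, the function names, and the numbers in the usage example are assumptions made here; the system described in the abstract uses eight sensors and its exact regression setup is not given.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Made-up sensor readings and concentrations (ppm), for illustration only.
    readings = [1.0, 2.0, 3.0, 4.0]
    concentrations = [45.0, 150.0, 290.0, 395.0]
    slope, intercept = fit_line(readings, concentrations)
    predicted = [slope * r + intercept for r in readings]
    print(pearson(predicted, concentrations))
```

A coefficient close to 1 means the predicted concentrations rise and fall with the real ones, which is how figures such as 0.979 and 0.964 are usually read.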

Downloads: 2539
2758 Reducing the Imbalance Penalty through Artificial Intelligence Methods Geothermal Production Forecasting: A Case Study for Turkey

Authors: H. Anıl, G. Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries that show promise in geothermal energy production, with its high installed power, low cost, and sustainability. Increasing imbalance penalties become an economic burden for organizations, since geothermal generation plants cannot maintain the balance of supply and demand due to the inadequacy of the production forecasts given in the day-ahead market. A better production forecast reduces the imbalance penalties of market participants and provides a better balance in the day-ahead market. In this study, using machine learning, deep learning and time series methods, the total generation of the power plants belonging to Zorlu Doğal Electricity Generation, which has a high installed geothermal capacity, was predicted for the first week and the first two weeks of March; the imbalance penalties were then calculated with these estimates and compared with the real values. These modeling operations were carried out on two datasets: the basic dataset and a dataset created by extracting new features from it through feature engineering. According to the results, Support Vector Regression, one of the traditional machine learning models, outperformed the other models. In addition, the estimation results on the feature engineering dataset showed lower error rates than those on the basic dataset. It has been concluded that the estimated imbalance penalty calculated for the selected organization is lower than the actual imbalance penalty, and is therefore the more profitable outcome.

Keywords: Machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting.

Downloads: 204
2757 Multi-Factor Optimization Method through Machine Learning in Building Envelope Design: Focusing on Perforated Metal Façade

Authors: Jinwooung Kim, Jae-Hwan Jung, Seong-Jun Kim, Sung-Ah Kim

Abstract:

Because the building envelope has a significant impact on the operation and maintenance stage of the building, designing the facade considering the performance can improve the performance of the building and lower the maintenance cost of the building. In general, however, optimizing two or more performance factors confronts the limits of time and computational tools. The optimization phase typically repeats infinitely until a series of processes that generate alternatives and analyze the generated alternatives achieve the desired performance. In particular, as complex geometry or precision increases, computational resources and time are prohibitive to find the required performance, so an optimization methodology is needed to deal with this. Instead of directly analyzing all the alternatives in the optimization process, applying experimental techniques (heuristic method) learned through experimentation and experience can reduce resource waste. This study proposes and verifies a method to optimize the double envelope of a building composed of a perforated panel using machine learning to the design geometry and quantitative performance. The proposed method is to achieve the required performance with fewer resources by supplementing the existing method which cannot calculate the complex shape of the perforated panel.

Keywords: Building envelope, machine learning, perforated metal, multi-factor optimization, façade.

Downloads: 1226
2756 An Adaptive Hand-Talking System for the Hearing Impaired

Authors: Zhou Yu, Jiang Feng

Abstract:

An adaptive Chinese hand-talking system is presented in this paper. By analyzing the 3 data collecting strategies for new users, an adaptation framework including supervised and unsupervised adaptation methods is proposed. For supervised adaptation, affinity propagation (AP) is used to extract exemplar subsets, and enhanced maximum a posteriori / vector field smoothing (eMAP/VFS) is proposed to pool the adaptation data among different models. For unsupervised adaptation, polynomial segment models (PSMs) are used to help hidden Markov models (HMMs) accurately label the unlabeled data; the "labeled" data, together with signer-independent models, are then input to the MAP algorithm to generate signer-adapted models. Experimental results show that the proposed framework can execute both supervised adaptation with a small amount of labeled data and unsupervised adaptation with a large amount of unlabeled data to tailor the original models, and both achieve improvements in recognition rate.

Keywords: sign language recognition, signer adaptation, eMAP/VFS, polynomial segment model.

Downloads: 1759
2755 A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs based on Machine Learning Algorithms

Authors: Kottaridi Klimentia, Demopoulos Vasilis, Sidiropoulos Anastasios, Ihara Diego, Nikolaidis Vasileios, Antonopoulos Dimitrios

Abstract:

Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus spp. that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as population, pathogenicity and aflatoxinogenic capacity of the strains, topography, soil and climate parameters of the fig orchards are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high-performance liquid chromatography (HPLC) and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a risk assessment tool for the contamination of aflatoxin on dried figs, based on the location and altitude of the fig orchards, the population of the fungus Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P and trace elements (B, Fe, Mn, Zn and Cu), by employing machine learning methods. In particular, our proposed method integrates three machine learning techniques, i.e., dimensionality reduction on the original dataset (Principal Component Analysis), metric learning (Mahalanobis Metric for Clustering) and the K-nearest Neighbors learning algorithm (KNN), into an enhanced model, with mean performance equal to 85% in terms of the Pearson Correlation Coefficient (PCC) between observed and predicted values.

Keywords: aflatoxins, Aspergillus spp., dried figs, k-nearest neighbors, machine learning, prediction

Downloads: 649
2754 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite

Authors: F. Lazzeri, I. Reiter

Abstract:

Energy production optimization has traditionally been very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true in microgrids, where many elements have to adjust their performance depending on the future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that enables users to easily build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful to predict hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.

Keywords: Time-series, features engineering methods for forecasting, energy demand forecasting, Azure machine learning.

Downloads: 1290
2753 An Experimental Comparison of Unsupervised Learning Techniques for Face Recognition

Authors: Dinesh Kumar, C.S. Rai, Shakti Kumar

Abstract:

Face Recognition has always been a fascinating research area. It has drawn the attention of many researchers because of its various potential applications such as security systems, entertainment, criminal identification, etc. Many supervised and unsupervised learning techniques have been reported so far. Principal Component Analysis (PCA), Self Organizing Maps (SOM) and Independent Component Analysis (ICA) are three of the unsupervised techniques, among many others, proposed by different researchers for Face Recognition. This paper proposes integration of two of these techniques, SOM and PCA, for dimensionality reduction and feature selection. Simulation results show that, although the individual techniques SOM and PCA themselves give excellent performance, the combination of the two can also be utilized for face recognition. Experimental results also indicate that for the given face database and the classifier used, SOM performs better than the other unsupervised learning techniques. A comparison of two proposed methodologies of SOM, Local and Global processing, shows the superiority of the latter, but at the cost of more computational time.

Keywords: Face Recognition, Principal Component Analysis, Self Organizing Maps, Independent Component Analysis

Downloads: 1880
2752 Validating Condition-Based Maintenance Algorithms Through Simulation

Authors: Marcel Chevalier, Léo Dupont, Sylvain Marié, Frédérique Roffet, Elena Stolyarova, William Templier, Costin Vasile

Abstract:

Industrial end users are currently facing an increasing need to reduce the risk of unexpected failures and optimize their maintenance. This calls for both short-term analysis and long-term ageing anticipation. At Schneider Electric, we tackle those two issues using both Machine Learning and First Principles models. Machine learning models are incrementally trained from normal data to predict expected values and detect statistically significant short-term deviations. Ageing models are constructed from breaking down physical systems into sub-assemblies, then determining relevant degradation modes and associating each one to the right kinetic law. Validating such anomaly detection and maintenance models is challenging, both because actual incident and ageing data are rare and distorted by human interventions, and incremental learning depends on human feedback. To overcome these difficulties, we propose to simulate physics, systems and humans – including asset maintenance operations – in order to validate the overall approaches in accelerated time and possibly choose between algorithmic alternatives.

Keywords: Degradation models, ageing, anomaly detection, soft sensor, incremental learning.

Downloads: 330
2751 Correlation Analysis to Quantify Learning Outcomes for Different Teaching Pedagogies

Authors: Kanika Sood, Sijie Shang

Abstract:

A fundamental goal of education includes preparing students to become a part of the global workforce by making beneficial contributions to society. In this paper, we analyze student performance for multiple courses that involve different teaching pedagogies: a cooperative learning technique and an inquiry-based learning strategy. Student performance includes student engagement, grades, and attendance records. We perform this study in the Computer Science department for online and in-person courses for 450 students. We will perform correlation analysis to study the relationship between student scores and other parameters such as gender, mode of learning. We use natural language processing and machine learning to analyze student feedback data and performance data. We assess the learning outcomes of two teaching pedagogies for undergraduate and graduate courses to showcase the impact of pedagogical adoption and learning outcome as determinants of academic achievement. Early findings suggest that when using the specified pedagogies, students become experts on their topics and illustrate enhanced engagement with peers.

Keywords: Bag-of-words, cooperative learning, education, inquiry-based learning, in-person learning, Natural Language Processing, online learning, sentiment analysis, teaching pedagogy.

Downloads: 86
2750 On the Learning of Causal Relationships between Banks in Saudi Equities Market Using Ensemble Feature Selection Methods

Authors: Adel Aloraini

Abstract:

Financial forecasting using machine learning techniques has received considerable effort in the last decade. In this ongoing work, we show how machine learning of graphical models is able to infer visualized causal interactions between different banks in the Saudi equities market. One important discovery from such learned causal graphs is how companies influence each other and to what extent. In this work, a set of graphical models named Gaussian graphical models, with developed ensemble penalized feature selection methods that combine a filtering method, a wrapper method and a regularizer, will be shown. A comparison between these different developed ensemble combinations will also be shown. The best ensemble method will be used to infer the causal relationships between banks in the Saudi equities market.

Keywords: Causal interactions, banks, feature selection, regularizer.

Downloads: 1748
2749 Cognition of Driving Context for Driving Assistance

Authors: Manolo Dulva Hina, Clement Thierry, Assia Soukane, Amar Ramdane-Cherif

Abstract:

In this paper, we present our innovative way of determining the driving context for a driving assistance system. We invoke the fusion of all parameters that describe the context of the environment, the vehicle and the driver to obtain the driving context. We created a training set that stores driving situation patterns and which the system consults to determine the driving situation. A machine-learning algorithm predicts the driving situation. The driving situation is an input to the fission process that yields the action that must be implemented when the driver needs to be informed or assisted given the driving situation. The action may be directed towards the driver, the vehicle or both. This is an ongoing work whose goal is to offer an alternative driving assistance system for safe driving, green driving and comfortable driving. Here, ontologies are used for knowledge representation.

Keywords: Cognitive driving, intelligent transportation system, multimodal system, ontology, machine learning.

Downloads: 1459
2748 Normal and Peaberry Coffee Beans Classification from Green Coffee Bean Images Using Convolutional Neural Networks and Support Vector Machine

Authors: Hira Lal Gope, Hidekazu Fukai

Abstract:

The aim of this study is to develop a system which can identify and sort peaberries automatically at low cost for coffee producers in developing countries. In this paper, the focus is on the classification of peaberries and normal coffee beans using image processing and machine learning techniques. The peaberry is neither a defective bean nor a normal bean. It is a single, relatively round seed that develops in a coffee cherry instead of the usual flat-sided pair of beans, and it has its own value and flavor. To improve the taste of the coffee, it is necessary to separate peaberries from normal beans before the green coffee beans are roasted; otherwise, the flavors are mixed and the taste suffers. During roasting, the shape, size, and weight of all the beans must be uniform; otherwise, larger beans take more time to roast through. The peaberry has a different size and shape even when it has the same weight as a normal bean, and it roasts more slowly than normal beans. Therefore, neither technique provides a good option to select the peaberries. Defective beans, e.g., sour, broken, black, and faded beans, are easy to check and pick out by hand. On the other hand, picking out peaberries is very difficult even for trained specialists because the shape and color of the peaberry are similar to those of normal beans. In this study, we use image processing and machine learning techniques to discriminate between normal and peaberry beans as part of the sorting system. As the first step, we applied Deep Convolutional Neural Networks (CNN) and Support Vector Machine (SVM) as machine learning techniques to discriminate between peaberries and normal beans. As a result, better performance was obtained with CNN than with SVM for the discrimination of the peaberry. The artificial neural network trained in this work on a high-performance CPU and GPU will simply be installed on an inexpensive, low-compute Raspberry Pi system. We assume that this system will be used in underdeveloped countries. The study evaluates and compares the feasibility of the methods in terms of accuracy of classification and processing speed.

Keywords: Convolutional neural networks, coffee bean, peaberry, sorting, support vector machine.

Downloads: 1554
2747 Using Interval Trees for Approximate Indexing of Instances

Authors: Khalil el Hindi

Abstract:

This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.

Keywords: Instance based learning, interval trees, the knn algorithm, machine learning.

Downloads: 1511
2746 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis means classifying a given review document as a positive or negative polar document. Sentiment analysis research has increased tremendously in recent times due to its large number of applications in industry and academia. Sentiment analysis models can be used to determine the opinion of the user towards any entity or product. E-commerce companies can use sentiment analysis models to improve their products on the basis of users’ opinions. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we initially extract features from one class of documents, and then use the one-class SVM model to test whether a given new document lies within the model or is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: Feature selection methods, Machine learning, NB, One-class SVM, Sentiment Analysis, Support Vector Machine.

Downloads: 3303
2745 Fine-Grained Sentiment Analysis: Recent Progress

Authors: Jie Liu, Xudong Luo, Pingping Lin, Yifan Fan

Abstract:

Facebook, Twitter, Weibo, and other social media and significant e-commerce sites generate a massive amount of online texts, which can be used to analyse people’s opinions or sentiments for better decision-making. So, sentiment analysis, especially the fine-grained sentiment analysis, is a very active research topic. In this paper, we survey various methods for fine-grained sentiment analysis, including traditional sentiment lexicon-based methods, machine learning-based methods, and deep learning-based methods in aspect/target/attribute-based sentiment analysis tasks. Besides, we discuss their advantages and problems worthy of careful studies in the future.

Keywords: sentiment analysis, fine-grained, machine learning, deep learning

Downloads: 2403
2744 Machine Morphisms and Simulation

Authors: Janis Buls

Abstract:

This paper examines the concept of simulation from a modelling viewpoint. How can one Mealy machine simulate the other one? We create formalism for simulation of Mealy machines. The injective s–morphism of the machine semigroups induces the simulation of machines [1]. We present the example of s–morphism such that it is not a homomorphism of semigroups. The story for the surjective s–morphisms is quite different. These are homomorphisms of semigroups but there exists the surjective s–morphism such that it does not induce the simulation.

Keywords: Mealy machine, simulation, machine semigroup, injective s–morphism, surjective s–morphisms.

Downloads: 1511
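For readers unfamiliar with the object studied in the abstract above, a Mealy machine is a finite-state transducer whose output depends on both the current state and the current input. The sketch below shows only that basic definition; it does not implement the paper's s–morphisms or its notion of simulation, and the class name and the change-detector example are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class MealyMachine:
    transition: dict  # (state, input symbol) -> next state
    output: dict      # (state, input symbol) -> output symbol

    def run(self, state, word):
        """Feed an input word from a start state and collect the outputs."""
        outputs = []
        for symbol in word:
            outputs.append(self.output[(state, symbol)])
            state = self.transition[(state, symbol)]
        return outputs

# Hypothetical example: the state remembers the previous bit, and the
# machine outputs 1 whenever the current bit differs from it.
bits = (0, 1)
detector = MealyMachine(
    transition={(q, x): x for q in bits for x in bits},
    output={(q, x): int(q != x) for q in bits for x in bits},
)

print(detector.run(0, [0, 1, 1, 0, 0]))  # [0, 1, 0, 1, 0]
```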
2743 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, a common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping of instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to group similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: Machine learning, Imbalanced data, Data mining, Big data.

Downloads: 1138
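The two OSBNR steps described in the abstract above (cluster the minority class, then drop majority instances that overlap those clusters) can be sketched as follows. The abstract does not state the clustering algorithm or the overlap criterion, so this sketch assumes a small k-means with a deterministic start and treats a majority point as overlapping when it falls within a cluster's radius; the function names and data are hypothetical.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=10):
    """Tiny k-means; the first k points seed the centroids (assumption)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, assign(points, centroids)

def reduce_majority(minority, majority, k):
    """Step 1: cluster the minority class. Step 2: remove majority points
    that fall inside any minority cluster's radius (assumed criterion)."""
    centroids, clusters = kmeans(minority, k)
    radii = [max((math.dist(p, c) for p in members), default=0.0)
             for c, members in zip(centroids, clusters)]
    return [m for m in majority
            if not any(math.dist(m, c) <= r
                       for c, r in zip(centroids, radii))]

# Hypothetical 2-D data: two minority "behaviors" and four majority points.
minority = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
majority = [(1.1, 0.9), (3.0, 3.0), (5.1, 5.0), (8.0, 1.0)]

print(reduce_majority(minority, majority, k=2))
```

Only the majority class is thinned, which is what makes the reduction one-sided: the minority instances are kept intact.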
2742 Explanatory of Relationship between Learning Motivation and Learning Performance

Authors: Chih Chin Yang

Abstract:

In this paper, the relationship between learning motivation and learning performance is explored using exchange theory. It is concluded that external performance can raise learning motivation and thereby increase learning performance. Internal performance should not be completely neglected, and external performance should not be given excessive importance. Parents need to engage in self-study and must also be re-educated. Existing education must be improved so as to raise internal performance. When students obtain good learning performance in a learning environment with excessive stimulants, incorrect thinking about learning will mislead the students, parents, and educators of the next generation. Overemphasis on external performance results in abnormal thinking about learning and violates the goal of learning. Learning is not only about obtaining performance. Without learning motivation, learning quality and learning performance will be limited. The better the learning motivation, the better the learning performance. Learning for reward is not good for learning performance. Strategies for promoting life-long learning include encouraging learners, establishing a good interactive learning environment, and advertising the merits and importance of life-long learning, which can give learners the correct learning motivation.

Keywords: exchange theory, learning motivation, learning performance, learning quality

Downloads: 1621
2741 Optimizing Machine Vision System Setup Accuracy by Six-Sigma DMAIC Approach

Authors: Joseph C. Chen

Abstract:

A machine vision system provides automatic inspection to reduce manufacturing costs considerably. However, only a few principles have been found to optimize a machine vision system and help it function more accurately in industrial practice. Most of the design techniques proposed to improve the accuracy of machine vision systems have been complicated and impractical. This paper discusses implementing the Six Sigma Define, Measure, Analyze, Improve, and Control (DMAIC) approach to optimize the setup parameters of a machine vision system when it is used as a direct measurement technique. This research follows a case study showing how the Six Sigma DMAIC methodology has been put into use.

Keywords: DMAIC, machine vision system, process capability, Taguchi parameter design.

Downloads: 1254
2740 Machine Learning Based Approach for Measuring Promotion Effectiveness in Multiple Parallel Promotions’ Scenarios

Authors: Revoti Prasad Bora, Nikita Katyal

Abstract:

Promotion is a key element in the retail business. Thus, analysis of promotions to quantify their effectiveness in terms of Revenue and/or Margin is an essential activity in the retail industry. However, measuring the sales/revenue uplift is based on estimations, as the actual sales/revenue without the promotion is not present. Further, the presence of Halo and Cannibalization in a multiple parallel promotions’ scenario complicates the problem. Calculating Baseline by considering inter-brand/competitor items or using Halo and Cannibalization's impact on Revenue calculations by considering Baseline as an interpretation of items’ unit sales in neighboring nonpromotional weeks individually may not capture the overall Revenue uplift in the case of multiple parallel promotions. Hence, this paper proposes a Machine Learning based method for calculating the Revenue uplift by considering the Halo and Cannibalization impact on the Baseline and the Revenue. In the first section of the proposed methodology, the Baseline of an item is calculated by incorporating the impact of the promotions on its related items. In the second section, the Revenue of an item is calculated by considering both Halo and Cannibalization impacts. Hence, this methodology enables correct calculation of the overall Revenue uplift due to a given promotion.

Keywords: Halo, cannibalization, promotion, baseline, temporary price reduction, retail, elasticity, cross price elasticity, machine learning, random forest, linear regression.

Downloads: 1326
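The arithmetic behind revenue uplift in the abstract above can be made concrete with a small worked sketch. It is not the paper's machine learning method: it uses the simple baseline the abstract mentions (revenue in neighboring non-promotional weeks) and adds up the uplift of the promoted item together with the Halo gain and Cannibalization loss on related items. All item names and figures are hypothetical.

```python
def baseline(neighbor_weeks):
    """Baseline revenue: mean of neighboring non-promotional weeks."""
    return sum(neighbor_weeks) / len(neighbor_weeks)

def uplift_per_item(promo_week, neighbor_weeks):
    """Revenue in the promotion week minus the baseline, for every item."""
    return {item: promo_week[item] - baseline(neighbor_weeks[item])
            for item in promo_week}

# Hypothetical weekly revenues. "cola" is promoted, "chips" is a
# complementary item (Halo), and "store_cola" is a substitute
# (Cannibalization).
neighbor_weeks = {
    "cola": [100.0, 120.0],
    "chips": [50.0, 50.0],
    "store_cola": [80.0, 80.0],
}
promo_week = {"cola": 200.0, "chips": 60.0, "store_cola": 65.0}

per_item = uplift_per_item(promo_week, neighbor_weeks)
overall = sum(per_item.values())

print(per_item)  # {'cola': 90.0, 'chips': 10.0, 'store_cola': -15.0}
print(overall)   # 85.0
```

Looking at the promoted item alone would report an uplift of 90.0, whereas the overall figure after Halo (+10.0) and Cannibalization (-15.0) is 85.0. That gap is what the abstract argues must be captured.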
2739 Assessment on Communication Students’ Internship Performances from the Employers’ Perspective

Authors: Yesuselvi Manickam, Tan Soon Chin

Abstract:

An internship is a supervised and structured learning experience related to one’s field of study or career goal. It allows students to obtain work experience and the opportunity to apply skills learned during university. An internship is a valuable learning experience for students; however, literature on employer assessment of Malaysian students’ internship experience is scarce. This study focuses on employers’ perspective on students’ performance during their three months of internship. The results are based on a descriptive analysis of 45 sets of questions gathered from the on-site supervisors of the interns. Feedback from the 45 on-site supervisors was collected by postal mail. It was found that interns did not meet their on-site supervisors’ expectations in many areas. The significance of this study is that employers’ assessment of the internship can be used as feedback to improve how students are prepared for their internship and future employment.

Keywords: Employers’ perspective, internship, structured learning, students’ performances.

Downloads: 2272
2738 DIFFER: A Propositionalization approach for Learning from Structured Data

Authors: Thashmee Karunaratne, Henrik Böstrom

Abstract:

Logic based methods for learning from structured data are limited w.r.t. handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, for addressing these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system called DIFFER. The method is demonstrated to perform comparably to an existing state-of-the-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.

Keywords: Machine learning, Structure classification, Propositionalization.

Downloads: 1223
2737 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years, but most of the work is focused on data in the English language. Comprehensive research and analysis that considers multiple languages, machine translation techniques, and different classifiers is essential. This paper presents a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two groups: the first classifies text without language translation, and the second translates the testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or whether building language-specific sentiment classification systems is a better approach. The effects of language translation techniques and features, and the accuracy of various classifiers for multilingual sentiment analysis, are also discussed in this study.

Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.

Downloads: 1667