Search results for: classification problem
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4465

3985 Categorical Missing Data Imputation Using Fuzzy Neural Networks with Numerical and Categorical Inputs

Authors: Pilar Rey-del-Castillo, Jesús Cardeñosa

Abstract:

There are many situations where input feature vectors are incomplete, and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson's fuzzy min-max neural networks, where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.

Keywords: Classifier, imputation techniques, fuzzy systems, fuzzy min-max neural networks.

Downloads: 1743
3984 A Study on Linking Upward Substitution and Fuzzy Demands in the Newsboy-Type Problem

Authors: Pankaj Dutta, Debjani Chakraborty

Abstract:

This paper investigates the effect of product substitution in the single-period 'newsboy-type' problem in a fuzzy environment. It is supposed that the single-period problem operates under uncertainty in customer demand, which is described by imprecise terms and modelled by fuzzy sets. To perform this analysis, we consider a two-item fuzzy model with upward substitution. This upward substitutability is reasonable when the products can be stored according to certain attribute levels such as quality, brand or package size. We show that the explicit consideration of this substitution opportunity increases the average expected profit. A computational study is performed to observe the benefits of product substitution.

Keywords: Fuzzy demand, Newsboy, Single-period problem, Substitution.

Downloads: 1393
3983 File Format of Flow Chart Simulation Software - CFlow

Authors: Syahanim Mohd Salleh, Zaihosnita Hood, Hairulliza Mohd Judi, Marini Abu Bakar

Abstract:

CFlow is flow chart software that contains facilities to draw and evaluate a flow chart. A flow chart evaluation applies a simulation method to enable presentation of work flow in a flow chart solution. Flow chart simulation in CFlow is executed by manipulating the CFlow data file, which is saved in a graphical vector format. These text-based data are organised using a data classification technique based on a library classification scheme. This paper describes the file format for the flow chart simulation software CFlow.

Keywords: CFlow, flow chart, file format.

Downloads: 2518
3982 A Numerical Solution Based On Operational Matrix of Differentiation of Shifted Second Kind Chebyshev Wavelets for a Stefan Problem

Authors: Rajeev, N. K. Raigar

Abstract:

In this study, a one-dimensional phase change problem (a Stefan problem) is considered and a numerical solution of this problem is discussed. First, we use a similarity transformation to convert the governing equations and their boundary conditions into ordinary differential equations. The solutions of the ordinary differential equations with the associated boundary conditions and interface condition (Stefan condition) are obtained by using a numerical approach based on the operational matrix of differentiation of shifted second kind Chebyshev wavelets. The obtained results are compared with the existing exact solution and are found to be sufficiently accurate.

Keywords: Operational matrix of differentiation, Similarity transformation, Shifted second kind Chebyshev wavelets, Stefan problem.

Downloads: 1972
3981 A Flexible Flowshop Scheduling Problem with Machine Eligibility Constraint and Two Criteria Objective Function

Authors: Bita Tadayon, Nasser Salmasi

Abstract:

This research deals with a flexible flowshop scheduling problem with arrival and delivery of jobs in groups and processing them individually. Due to the special characteristics of each job, only a subset of machines in each stage is eligible to process that job. The objective function deals with minimization of the sum of the completion times of groups on one hand and minimization of the sum of the differences between the completion time of each job and the delivery time of the group containing that job (waiting period) on the other hand. The problem can be stated as FFc / rj , Mj / irreg, which has many applications in production and service industries. A mathematical model is proposed, the problem is proved to be NP-complete, and an effective heuristic method is presented to schedule the jobs efficiently. This algorithm can then be used within the body of any metaheuristic algorithm for solving the problem.

Keywords: flexible flowshop scheduling, group processing, machine eligibility constraint, mathematical modeling.

Downloads: 1805
3980 Approximating Maximum Weighted Independent Set Using Vertex Support

Authors: S. Balaji, V. Swaminathan, K. Kannan

Abstract:

The Maximum Weighted Independent Set (MWIS) problem is a classic NP-hard graph optimization problem. Given an undirected graph G = (V, E) and a weighting function defined on the vertex set, the MWIS problem is to find a vertex set S ⊆ V whose total weight is maximum such that no two vertices in S are adjacent. This paper presents a novel approach to approximate the MWIS of a graph using the minimum weighted vertex cover of the graph. Computational experiments are designed and conducted to study the performance of our proposed algorithm. Extensive simulation results show that the proposed algorithm can yield better solutions than other existing algorithms found in the literature for solving the MWIS.
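The problem definition in this abstract can be made concrete with a small sketch. The code below is not the authors' vertex-support heuristic; it is an exhaustive search, usable only on tiny graphs, that illustrates what an independent set and its total weight are. The function names and the four-vertex example graph are assumptions made for illustration.

```python
from itertools import combinations


def is_independent(vertices, edges):
    """Return True if no edge has both endpoints in `vertices`."""
    chosen = set(vertices)
    return not any(u in chosen and v in chosen for u, v in edges)


def brute_force_mwis(weight, edges):
    """Exhaustively search all vertex subsets (exponential time, illustration only)."""
    best, best_w = frozenset(), 0
    nodes = sorted(weight)
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            if is_independent(subset, edges):
                w = sum(weight[v] for v in subset)
                if w > best_w:
                    best, best_w = frozenset(subset), w
    return best, best_w


# Hypothetical example: a path a - b - c - d with vertex weights.
weight = {"a": 4, "b": 5, "c": 3, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
best_set, best_weight = brute_force_mwis(weight, edges)

# The complement of an independent set touches every edge, i.e. it is a vertex cover.
cover = set(weight) - best_set
cover_is_valid = all(u in cover or v in cover for u, v in edges)
```

The last two lines show the relationship the abstract relies on: the vertices left out of an independent set form a vertex cover, so a minimum weighted vertex cover corresponds to a maximum weighted independent set.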

Keywords: weighted independent set, vertex cover, vertex support, heuristic, NP-hard problem.

Downloads: 2007
3979 Modeling and Simulation of Flow Shop Scheduling Problem through Petri Net Tools

Authors: Joselito Medina Marin, Norberto Hernández Romero, Juan Carlos Seck Tuoh Mora, Erick S. Martinez Gomez

Abstract:

The Flow Shop Scheduling Problem (FSSP) is a typical problem faced by production planning managers in Flexible Manufacturing Systems (FMS). This problem consists of finding the optimal schedule to carry out a set of jobs, which are processed on a set of machines or shared resources. Moreover, all the jobs are processed in the same machine sequence. As in all scheduling problems, the makespan can be obtained by drawing the Gantt chart according to the order of operations, among other alternatives. In this way, an FMS presenting the FSSP can be modeled by Petri nets (PNs), which are a powerful tool that has been used to model and analyze discrete event systems. Then, the makespan can be obtained by simulating the PN through the token game animation and incidence matrix. In this work, we present an adaptive PN to obtain the makespan of the FSSP by applying PN analytical tools.
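For readers unfamiliar with the makespan, the following sketch computes it for a given job order using the standard completion-time recurrence for a permutation flow shop. It does not use Petri nets and is not the paper's method; the function name and the processing times are hypothetical.

```python
def makespan(processing_times, order):
    """Makespan of a permutation flow shop schedule.

    processing_times[j][m] is the time job j needs on machine m; every job
    visits the machines in the same sequence 0, 1, ..., M-1.
    """
    n_machines = len(processing_times[0])
    finish = [0] * n_machines  # finish[m]: completion time of the latest job on machine m
    for job in order:
        for m in range(n_machines):
            # The job can start on machine m once it has left machine m-1
            # and machine m has finished the previous job.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + processing_times[job][m]
    return finish[-1]


# Hypothetical instance: 3 jobs, 2 machines.
times = [[3, 2], [1, 4], [2, 1]]
span_a = makespan(times, [0, 1, 2])
span_b = makespan(times, [1, 0, 2])
```

On this instance the second order finishes earlier than the first, which is the kind of difference a scheduling method tries to exploit.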

Keywords: Flow-shop scheduling problem, makespan, Petri nets, state equation.

Downloads: 1709
3978 On Problem of Parameters Identification of Dynamic Object

Authors: Kamil Aida-zade, C. Ardil

Abstract:

In this paper, some problem formulations of dynamic object parameter recovery, described by a non-autonomous system of ordinary differential equations with multipoint unshared edge conditions, are investigated. Depending on the number of additional conditions, the problem is reduced to a system of algebraic equations or to a quadratic programming problem. For this purpose, the paper offers a new scheme of the edge conditions transfer method, called conditions shift. The method makes it possible to eliminate the differential links and the multipoint unshared initial-edge conditions. The advantage of the proposed approach lies in reducing a parametric identification problem to the substantially simpler problems of solving an algebraic system or a quadratic program.

Keywords: dynamic objects, ordinary differential equations, multipoint unshared edge conditions, quadratic programming, conditions shift

Downloads: 1440
3977 Deficiencies of Lung Segmentation Techniques using CT Scan Images for CAD

Authors: Nisar Ahmed Memon, Anwar Majid Mirza, S.A.M. Gilani

Abstract:

Segmentation is an important step in medical image analysis and classification for radiological evaluation or computer aided diagnosis. This paper presents the problem of inaccurate lung segmentation as observed in algorithms presented by researchers working in the area of medical image analysis. The different lung segmentation techniques have been tested using the dataset of 19 patients consisting of a total of 917 images. We obtained datasets of 11 patients from Ackron University, USA and of 8 patients from AGA Khan Medical University, Pakistan. After testing the algorithms against datasets, the deficiencies of each algorithm have been highlighted.

Keywords: Computer Aided Diagnosis (CAD), Mathematical Morphology, Medical Image Analysis, Region Growing, Segmentation, Thresholding

Downloads: 2314
3976 Two Class Motor Imagery Classification via Wave Atom Sub-Bants

Authors: Nebi Gedik

Abstract:

The goal of motor imagery brain-computer interface research is to create a link between the central nervous system and a computer or device. The most important signal for brain-computer interfaces is the electroencephalogram (EEG). The aim of this research is to explore a set of effective features from EEG signals, separated into frequency bands, using wave atom sub-bands to discriminate right and left-hand motor imagery signals. Over the transform coefficients, feature vectors are constructed for each frequency range and each transform sub-band, and their classification performances are tested. The method is validated using EEG signals from the BCI competition III dataset IIIa and classifiers such as support vector machine and k-nearest neighbors.

Keywords: motor imagery, EEG, Wave atom transform sub-bands, SVM, k-NN

Downloads: 554
3975 Amelioration of Cardiac Arrythmias Classification Performance Using Artificial Neural Network, Adaptive Neuro-Fuzzy and Fuzzy Inference Systems Classifiers

Authors: Alexandre Boum, Salomon Madinatou

Abstract:

This paper aims to contribute to cardiac arrhythmia biomedical diagnosis systems, more precisely to the study of improving cardiac arrhythmia classification performance using artificial neural network, adaptive neuro-fuzzy and fuzzy inference system classifiers. The purpose of this improvement is to enable cardiologists to make reliable diagnoses through automatic cardiac arrhythmia analyses and classifications based on high-confidence classifiers. In this study, six classes of the most commonly encountered arrhythmias are considered: the Right Bundle Branch Block, the Left Bundle Branch Block, the Ventricular Extrasystole, the Auricular Extrasystole, the Atrial Fibrillation and the Normal Cardiac rate beat. From the parameters extracted from the electrocardiogram (ECG), we constructed a matrix (360x360) serving as an input data sample for the classifiers based on neural networks and a matrix (1x6) for the classifier based on fuzzy logic. By varying three parameters (the quality of the neural network learning, the data size and the quality of the input parameters), the automatic classification yielded the following correct classification rates: 83.6% using the fuzzy logic based classifier, 99.7% using the neural network based classifier and 99.8% using the adaptive neuro-fuzzy based classifier. These results are based on signals containing at least 360 cardiac cycles. Based on the comparative analysis of the aforementioned three arrhythmia classifiers, the classifiers based on neural networks exhibit a better performance.

Keywords: Adaptive neuro-fuzzy, artificial neural network, cardiac arrhythmias, fuzzy inference systems.

Downloads: 675
3974 Supplier Selection by Considering Cost and Reliability

Authors: K. -H. Yang

Abstract:

The supplier selection problem is one of the important issues in supply chain problems. Two categories of methodologies, qualitative and quantitative approaches, can be applied to supplier selection problems. However, due to the complexities of the problem and the lack of reliable quantitative data, qualitative approaches are more common than quantitative approaches. This study considers operational cost and the supplier’s reliability factor and solves the problem by using a quantitative approach. A mixed integer programming model is the primary analytic tool. Analyses of different scenarios with variable cost and reliability structures show the effectiveness of this approach to the supplier selection problem.

Keywords: Mixed integer programming, quantitative approach, supplier’s reliability, supplier selection.

Downloads: 2520
3973 A Deterministic Dynamic Programming Approach for Optimization Problem with Quadratic Objective Function and Linear Constraints

Authors: S. Kavitha, Nirmala P. Ratchagar

Abstract:

This paper presents a novel deterministic dynamic programming approach for solving optimization problems with a quadratic objective function and linear equality and inequality constraints. The proposed method employs backward recursion, in which computation proceeds from the last stage to the first stage in a multi-stage decision problem. A generalized recursive equation which gives the exact solution of an optimization problem is derived in this paper. The method is purely analytical and avoids the need for an initial solution. The feasibility of the proposed method is demonstrated with a practical example. The numerical results show that the proposed method provides the global optimum solution with negligible computation time.

Keywords: Backward recursion, Dynamic programming, Multi-stage decision problem, Quadratic objective function.

Downloads: 3556
3972 Six-Phase Tooth-Coil Winding Starter-Generator Embedded in Aerospace Engine

Authors: Flur R. Ismagilov, Vyacheslav E. Vavilov, Denis V. Gusakov

Abstract:

This paper addresses the problem of increasing the electrification of aircraft engines by installing a synchronous generator on the high-pressure shaft. Technical solutions to this problem by various research centers are discussed. A design solution to the problem is proposed. To evaluate the effectiveness of the proposed cooling system, thermal analysis was carried out in ANSYS software.

Keywords: Flur R. Ismagilov, Vyacheslav E. Vavilov, Denis V. Gusakov

Downloads: 1241
3971 Prediction of Writer Using Tamil Handwritten Document Image Based on Pooled Features

Authors: T. Thendral, M. S. Vijaya, S. Karpagavalli

Abstract:

A Tamil handwritten document is taken as the key source of data to identify the writer. Tamil is a classical language which has 247 characters, including compound characters, consonants, vowels and a special character. Most characters of Tamil are multifaceted in nature. Handwriting is a unique feature of an individual. Writers may change their handwriting according to their frame of mind, and this poses a serious challenge in identifying the writer. A new discriminative model with pooled features of handwriting is proposed and implemented using a support vector machine. A prediction accuracy of 100% is reported for the RBF and polynomial kernel based classification models.

Keywords: Classification, Feature extraction, Support vector machine, Training, Writer.

Downloads: 2289
3970 Prediction of Writer Using Tamil Handwritten Document Image Based on Pooled Features

Authors: T. Thendral, M. S. Vijaya, S. Karpagavalli

Abstract:

A Tamil handwritten document is taken as the key source of data to identify the writer. Tamil is a classical language which has 247 characters, including compound characters, consonants, vowels and a special character. Most characters of Tamil are multifaceted in nature. Handwriting is a unique feature of an individual. Writers may change their handwriting according to their frame of mind, and this poses a serious challenge in identifying the writer. A new discriminative model with pooled features of handwriting is proposed and implemented using a support vector machine. A prediction accuracy of 100% is reported for the RBF and polynomial kernel based classification models.

Keywords: Classification, Feature extraction, Support vector machine, Training, Writer.

Downloads: 1677
3969 Determining Senses for Word Sense Disambiguation in Turkish

Authors: Zeynep Orhan, Zeynep Altan

Abstract:

Word sense disambiguation is an important intermediate stage for many natural language processing applications. The senses of an ambiguous word are the classification of usages for that specific word. This paper deals with methodologies for determining the senses of a given word when they cannot be obtained from an already available resource like WordNet. We offer a method that helps us to determine the sense boundaries gradually. In this method, we first decide on some features that are thought to affect the senses and divide the instances into two; then, according to the results of evaluations, we continue dividing the instances gradually. In a second method we use pseudo words. We devise artificial words depending on some criteria and evaluate classification algorithms on these previously classified words.

Keywords: Word sense disambiguation, sense determination, pseudo words, sense granularity.

Downloads: 1381
3968 Calcification Classification in Mammograms Using Decision Trees

Authors: S. Usha, S. Arumugam

Abstract:

Cancer affects people globally, with breast cancer being a leading killer. Breast cancer is due to the uncontrollable multiplication of cells resulting in a tumour or neoplasm. Tumours are called ‘benign’ when cancerous cells do not ravage other body tissues and ‘malignant’ if they do so. As mammography is an effective breast cancer detection tool at an early stage, which is the most treatable stage, it is the primary imaging modality for screening and diagnosis of this cancer type. This paper presents an automatic mammogram classification technique using wavelet and Gabor filters. Correlation feature selection is used to reduce the feature set, and the selected features are classified using different decision trees.

Keywords: Breast Cancer, Mammogram, Symlet Wavelets, Gabor Filters, Decision Trees

Downloads: 1721
3967 Dynamic Construction Site Layout Using Ant Colony Optimization

Authors: Y. Abdelrazig

Abstract:

Evolutionary optimization methods such as genetic algorithms have been used extensively for the construction site layout problem. More recently, ant colony optimization algorithms, which are evolutionary methods based on the foraging behavior of ants, have been successfully applied to benchmark combinatorial optimization problems. This paper proposes a formulation of the site layout problem in terms of a sequencing problem that is suitable for solution using an ant colony optimization algorithm. In the construction industry, site layout is a very important planning problem. The objective of site layout is to position temporary facilities both geographically and at the correct time such that the construction work can be performed satisfactorily with minimal costs and improved safety and working environment. During the last decade, evolutionary methods such as genetic algorithms have been used extensively for the construction site layout problem. This paper proposes an ant colony optimization model for construction site layout. A simple case study for a highway project is utilized to illustrate the application of the model.

Keywords: Construction site layout, optimization, ant colony.

Downloads: 3079
3966 EHW from Consumer Point of View: Consumer-Triggered Evolution

Authors: Yerbol Sapargaliyev, Tatiana Kalganova

Abstract:

Evolvable Hardware (EHW) has been regarded as an adaptive system with a wide application market. The consumer market for any good requires diversity to satisfy consumers' preferences. Adaptation of EHW is a key technology that could provide an individual approach to every particular user. This situation raises a question: how should the target for the evolutionary algorithm be set? The existing techniques do not allow the consumer to influence the evolutionary process. At the moment, only the designer can influence the evolution. The proposed consumer-triggered evolution overcomes this problem by introducing new features to EHW that help the adaptive system to obtain targets during the consumer stage. A classification of EHW is given according to responsiveness, imitation of human behavior and target circuit response. A home intelligent water heating system is considered as an example.

Keywords: Actuators, consumer-triggered evolution, evolvable hardware, sensors.

Downloads: 1458
3965 A Branch and Bound Algorithm for Resource Constrained Project Scheduling Problem Subject to Cumulative Resources

Authors: A. Shirzadeh Chaleshtari, Sh. Shadrokh

Abstract:

Renewable and non-renewable resource constraints have been widely studied in the theoretical literature on project scheduling problems. However, although cumulative resources are widespread in practical cases, the literature on project scheduling problems subject to these resources is scant. To study this type of resource further, in this paper we use the framework of a resource constrained project scheduling problem (RCPSP) with finish-start precedence relations between activities, subject to cumulative resources in addition to renewable resources. We develop a branch and bound algorithm for this problem by customizing the precedence tree algorithm of the RCPSP. We perform extensive experimental analysis on the algorithm to check its effectiveness and performance for solving different instances of the problem in question.

Keywords: Resource constrained project scheduling problem, cumulative resources, branch and bound algorithm, precedence tree.

Downloads: 2879
3964 Classifier Based Text Mining for Neural Network

Authors: M. Govindarajan, R. M. Chandrasekaran

Abstract:

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. In neural networks that address classification problems, the training set, testing set, and learning rate are key considerations: the training and testing sets are the collections of input/output patterns used to train the network and to assess its performance, and the learning rate sets the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for the original neural network, in order to optimize classification accuracy and reduce training time. The feasibility and benefits of the proposed approach are demonstrated by means of five data sets: contact-lenses, cpu, weather symbolic, weather, and labor-nega-data. It is shown that, compared to the existing neural network, training is more than 10 times faster when the dataset is larger than cpu or the network has many hidden units, while accuracy ('percent correct') was the same for all datasets except contact-lenses, which is the only one with missing attributes. For contact-lenses, the accuracy with the proposed neural network was on average around 0.3% lower than with the original neural network. This algorithm is independent of specific data sets, so many ideas and solutions can be transferred to other classifier paradigms.

Keywords: Back propagation, classification accuracy, text mining, time complexity.

Downloads: 4186
3963 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help software practitioners plan testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of software quality data. Data with very few instances of the minority outcome category lead to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics when dealing with imbalanced data. In order to empirically validate the different alternatives, the study uses change data from three application packages of an open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling methods and robust performance measures.

Keywords: Change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics.

Downloads: 1493
3962 Evolutionary Feature Selection for Text Documents using the SVM

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection (called SVM_FS) and Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with the GA_SVM method for a relatively small dimension of the feature vector.
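Of the three feature selection methods named in this abstract, Information Gain is simple enough to sketch. The code below uses the textbook definition (class entropy minus the expected class entropy after splitting on whether a term occurs in a document); it is an illustration under that assumption, not the authors' implementation, and the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(labels, term_present):
    """Reduction in class entropy from knowing whether a term occurs in each document."""
    with_term = [y for y, p in zip(labels, term_present) if p]
    without_term = [y for y, p in zip(labels, term_present) if not p]
    total = len(labels)
    remainder = sum(
        (len(part) / total) * entropy(part)
        for part in (with_term, without_term)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - remainder


# Hypothetical corpus of four documents in two classes.
labels = ["sport", "sport", "tech", "tech"]
ig_informative = information_gain(labels, [True, True, False, False])
ig_useless = information_gain(labels, [True, False, True, False])
```

A term that occurs only in one class gets the maximum gain of one bit here, while a term spread evenly over both classes gets zero; ranking terms by this score and keeping the top ones reduces the dimensionality of the document vector.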

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Genetic Algorithm, and Classification.

Downloads: 1675
3961 An Effective Hybrid Genetic Algorithm for Job Shop Scheduling Problem

Authors: Bin Cai, Shilong Wang, Haibo Hu

Abstract:

The job shop scheduling problem (JSSP) is well known as one of the most difficult combinatorial optimization problems. This paper presents a hybrid genetic algorithm for the JSSP with the objective of minimizing makespan. The efficiency of the genetic algorithm is enhanced by integrating it with a local search method. The chromosome representation of the problem is based on operations. Schedules are constructed using a procedure that generates full active schedules. In each generation, a local search heuristic based on Nowicki and Smutnicki's neighborhood is applied to improve the solutions. The approach is tested on a set of standard instances taken from the literature and compared with other approaches. The computation results validate the effectiveness of the proposed algorithm.

Keywords: Genetic algorithm, Job shop scheduling problem, Local search, Meta-heuristic algorithm

Downloads: 1628
3960 Predictive Analytics of Student Performance Determinants in Education

Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi

Abstract:

Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.
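The four evaluation measures named in this abstract can be computed from the confusion counts of a binary classifier. The sketch below uses the standard definitions; it is not the study's code, and the function name and the example labels are hypothetical. In a 5-fold cross-validation these values would be computed on each held-out fold and then averaged.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = above-average performance, 0 = otherwise.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```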

Keywords: Student performance, supervised machine learning, prediction, classification, cross-validation.

Downloads: 504
3959 Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with the SVM_FS method for a relatively small dimension of the feature vector. We also present a novel method to better correlate the SVM kernel's parameters (Polynomial or Gaussian kernel).

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, and Classification.

Downloads 1795
3958 A Hybridization of Constructive Beam Search with Local Search for Far From Most Strings Problem

Authors: Sayyed R Mousavi

Abstract:

The Far From Most Strings Problem (FFMSP) is to obtain a string which is far from as many as possible of a given set of strings. All the input and output strings are of the same length, and two strings are said to be far if their Hamming distance is greater than or equal to a given positive integer. FFMSP belongs to the class of sequences consensus problems, which have applications in molecular biology. The problem is NP-hard; it does not admit a constant-ratio approximation either, unless P = NP. Therefore, in addition to exact and approximate algorithms, (meta)heuristic algorithms have been proposed for the problem in recent years. Hybrid algorithms have also been proposed in recent years and used successfully for many hard problems in a variety of domains. In this paper, a new metaheuristic algorithm, called Constructive Beam and Local Search (CBLS), is investigated for the problem; it is a hybridization of constructive beam search and local search algorithms. More specifically, the proposed algorithm consists of two phases: the first phase obtains several candidate solutions via constructive beam search, and the second phase applies local search to the candidate solutions obtained by the first phase. The best solution found is returned as the final solution to the problem. The proposed algorithm is also similar to memetic algorithms in the sense that both use local search to further improve individual solutions. The CBLS algorithm is compared with the most recently published algorithm for the problem, GRASP, with significantly positive results; the improvement is by orders of magnitude in most cases.
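The FFMSP objective defined above can be written down directly. The sketch below is illustrative only: it shows the objective and a plain hill-climbing local search over single-character substitutions, which is an assumed stand-in and not the CBLS algorithm from the paper (no beam search phase is included). All function names are hypothetical.

```python
from typing import Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ffmsp_score(candidate: str, strings: Sequence[str], threshold: int) -> int:
    """Number of input strings that are far from the candidate,
    i.e. whose Hamming distance to it is at least `threshold`."""
    return sum(hamming(candidate, s) >= threshold for s in strings)


def local_search(
    candidate: str, strings: Sequence[str], threshold: int, alphabet: str
) -> Tuple[str, int]:
    """Hill climbing: accept any single-character change that raises the score."""
    best, best_score = candidate, ffmsp_score(candidate, strings, threshold)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for ch in alphabet:
                if ch == best[i]:
                    continue
                trial = best[:i] + ch + best[i + 1:]
                score = ffmsp_score(trial, strings, threshold)
                if score > best_score:
                    best, best_score, improved = trial, score, True
    return best, best_score
```

Because the score is an integer count, many neighbouring strings share the same value, so a plain hill climber like this one can stall on plateaus.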

Keywords: Bioinformatics, Far From Most Strings Problem, Hybrid metaheuristics, Matheuristics, Sequences consensus problems.

Downloads 1703
3957 Sentiment Analysis of Fake Health News Using Naive Bayes Classification Models

Authors: Danielle Shackley, Yetunde Folajimi

Abstract:

As more people turn to the internet seeking health-related information, there is more risk of finding false, inaccurate, or dangerous information. Sentiment analysis is a natural language processing technique that assigns polarity scores to text, ranging from positive through neutral to negative. In this research, we evaluate the weight of a sentiment analysis feature added to fake health news classification models. The dataset consists of existing reliably labeled health article headlines that were supplemented with health information collected about COVID-19 from social media sources. We started with data preprocessing and tested various vectorization methods such as Count and TF-IDF vectorization. We implemented three Naive Bayes classifier models: Bernoulli, Multinomial and Complement. To test the weight of the sentiment analysis feature on the dataset, we created benchmark Naive Bayes classification models without sentiment analysis, then reproduced the same models with the feature added. We evaluated the models using precision and accuracy scores. The initial Bernoulli model performed with 90% precision and 75.2% accuracy, while the model supplemented with sentiment labels performed with 90.4% precision and stayed constant at 75.2% accuracy. Our results show that the addition of sentiment analysis did not improve model precision by a wide margin; there was no evidence of improvement in accuracy, although precision improved by 1.9% with the Complement model. Future expansion of this work could include replicating the experiment and replacing the Naive Bayes models with a deep learning neural network model.
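To make the experimental setup concrete, here is a minimal from-scratch sketch of a Bernoulli Naive Bayes classifier with one extra sentiment feature appended to each headline. It is not the authors' pipeline: the class below, the `with_sentiment` helper, the `__sentiment_*__` token format, and the assumption that a polarity label is already available for each headline are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def with_sentiment(terms: Set[str], polarity: str) -> Set[str]:
    """Append a hypothetical sentiment token (e.g. 'negative') as one more feature."""
    return terms | {f"__sentiment_{polarity}__"}


class BernoulliNB:
    """Minimal Bernoulli Naive Bayes over binary term-presence features."""

    def fit(self, docs: Sequence[Set[str]], labels: Sequence[str]) -> "BernoulliNB":
        self.vocab: List[str] = sorted(set().union(*docs))
        self.classes: List[str] = sorted(set(labels))
        self.log_prior: Dict[str, float] = {}
        self.p_term: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            # Laplace smoothing: +1 on the count, +2 on the denominator,
            # so no probability is exactly 0 or 1.
            self.p_term[c] = {
                t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for t in self.vocab
            }
        return self

    def predict(self, doc: Set[str]) -> str:
        def log_posterior(c: str) -> float:
            score = self.log_prior[c]
            for t in self.vocab:
                p = self.p_term[c][t]
                # Absent terms also count as evidence in the Bernoulli model.
                score += math.log(p if t in doc else 1.0 - p)
            return score

        return max(self.classes, key=log_posterior)
```

A benchmark model would be fitted on the plain term sets and a second model on the sets returned by `with_sentiment`; the two would then be compared on precision and accuracy, as the abstract describes.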

Keywords: Sentiment analysis, Naive Bayes model, natural language processing, topic analysis, fake health news classification model.

Downloads 414
3956 Using Tabu Search to Analyze the Mauritian Economic Sectors

Authors: J. Cheeneebash, V. Beeharry, A. Gopaul

Abstract:

The aim of this paper is to express the input-output matrix as a linear ordering problem which is classified as an NP-hard problem. We then use a Tabu search algorithm to find the best permutation among sectors in the input-output matrix that will give an optimal solution. This optimal permutation can be useful in designing policies and strategies for economists and government in their goal of maximizing the gross domestic product.
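The linear ordering problem mentioned above asks for a simultaneous permutation of the rows and columns of a matrix that maximizes the sum of the entries above the diagonal. A minimal sketch of that objective follows. The exhaustive search shown here is an assumed stand-in that only works for tiny matrices; it is not the Tabu search used in the paper, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def lop_value(matrix: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Sum of the entries above the diagonal once rows and columns follow `order`."""
    n = len(order)
    return sum(matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n))


def best_order_brute_force(
    matrix: Sequence[Sequence[float]],
) -> Tuple[Tuple[int, ...], float]:
    """Try every permutation of the sectors; feasible only for very small n."""
    best = max(permutations(range(len(matrix))), key=lambda p: lop_value(matrix, p))
    return best, lop_value(matrix, best)


# Toy 3-sector flow matrix (made-up numbers, for illustration only).
flows = [
    [0, 1, 5],
    [4, 0, 2],
    [3, 6, 0],
]
order, value = best_order_brute_force(flows)
print(order, value)  # (2, 1, 0) 13
```

The number of permutations grows factorially with the number of sectors, which is why a heuristic such as Tabu search is used on realistic input-output matrices.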

Keywords: Input-Output matrix, linear ordering problem, Tabu search.

Downloads 1469