Search results for: Subset Selection.
1103 Feature Subset Selection Using Ant Colony Optimization
Authors: Ahmed Al-Ani
Abstract:
Feature selection is an important step in many pattern classification problems. It is applied to select a subset of features, from a much larger set, such that the selected subset is sufficient to perform the classification task. Due to its importance, the problem of feature selection has been investigated by many researchers. In this paper, a novel feature subset search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.Keywords: Ant Colony Optimization, ant systems, feature selection, pattern recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16011102 An Optimal Feature Subset Selection for Leaf Analysis
Authors: N. Valliammal, S.N. Geethalakshmi
Abstract:
This paper describes an optimal approach for feature subset selection to classify the leaves based on Genetic Algorithm (GA) and Kernel Based Principle Component Analysis (KPCA). Due to high complexity in the selection of the optimal features, the classification has become a critical task to analyse the leaf image data. Initially the shape, texture and colour features are extracted from the leaf images. These extracted features are optimized through the separate functioning of GA and KPCA. This approach performs an intersection operation over the subsets obtained from the optimization process. Finally, the most common matching subset is forwarded to train the Support Vector Machine (SVM). Our experimental results successfully prove that the application of GA and KPCA for feature subset selection using SVM as a classifier is computationally effective and improves the accuracy of the classifier.Keywords: Optimization, Feature extraction, Feature subset, Classification, GA, KPCA, SVM and Computation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22411101 Application of Genetic Algorithms to Feature Subset Selection in a Farsi OCR
Authors: M. Soryani, N. Rafat
Abstract:
Dealing with hundreds of features in character recognition systems is not unusual. This large number of features leads to the increase of computational workload of recognition process. There have been many methods which try to remove unnecessary or redundant features and reduce feature dimensionality. Besides because of the characteristics of Farsi scripts, it-s not possible to apply other languages algorithms to Farsi directly. In this paper some methods for feature subset selection using genetic algorithms are applied on a Farsi optical character recognition (OCR) system. Experimental results show that application of genetic algorithms (GA) to feature subset selection in a Farsi OCR results in lower computational complexity and enhanced recognition rate.Keywords: Feature Subset Selection, Genetic Algorithms, Optical Character Recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19771100 A Hybrid Feature Subset Selection Approach based on SVM and Binary ACO. Application to Industrial Diagnosis
Authors: O. Kadri, M. D. Mouss, L.H. Mouss, F. Merah
Abstract:
This paper proposes a novel hybrid algorithm for feature selection based on a binary ant colony and SVM. The final subset selection is attained through the elimination of the features that produce noise or, are strictly correlated with other already selected features. Our algorithm can improve classification accuracy with a small and appropriate feature subset. Proposed algorithm is easily implemented and because of use of a simple filter in that, its computational complexity is very low. The performance of the proposed algorithm is evaluated through a real Rotary Cement kiln dataset. The results show that our algorithm outperforms existing algorithms.
Keywords: Binary Ant Colony algorithm, Support VectorMachine, feature selection, classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16081099 Ant Colony Optimization for Feature Subset Selection
Authors: Ahmed Al-Ani
Abstract:
The Ant Colony Optimization (ACO) is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It has recently attracted a lot of attention and has been successfully applied to a number of different optimization problems. Due to the importance of the feature selection problem and the potential of ACO, this paper presents a novel method that utilizes the ACO algorithm to implement a feature subset search procedure. Initial results obtained using the classification of speech segments are very promising.Keywords: Ant Colony Optimization, ant systems, feature selection, pattern recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31411098 Ensembling Adaptively Constructed Polynomial Regression Models
Authors: Gints Jekabsons
Abstract:
The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16721097 A Hybrid Feature Selection by Resampling, Chi squared and Consistency Evaluation Techniques
Authors: Amir-Massoud Bidgoli, Mehdi Naseri Parsa
Abstract:
In this paper a combined feature selection method is proposed which takes advantages of sample domain filtering, resampling and feature subset evaluation methods to reduce dimensions of huge datasets and select reliable features. This method utilizes both feature space and sample domain to improve the process of feature selection and uses a combination of Chi squared with Consistency attribute evaluation methods to seek reliable features. This method consists of two phases. The first phase filters and resamples the sample domain and the second phase adopts a hybrid procedure to find the optimal feature space by applying Chi squared, Consistency subset evaluation methods and genetic search. Experiments on various sized datasets from UCI Repository of Machine Learning databases show that the performance of five classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First Decision Tree and JRIP) improves simultaneously and the classification error for these classifiers decreases considerably. The experiments also show that this method outperforms other feature selection methods.Keywords: feature selection, resampling, reliable features, Consistency Subset Evaluation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25821096 Slovenian Text-to-Speech Synthesis for Speech User Interfaces
Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden
Abstract:
The paper presents the design concept of a unitselection text-to-speech synthesis system for the Slovenian language. Due to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from server carrier-grade voice portal applications, desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high coverage text sentences. The experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with a small redundancy. The adequacy of the spoken output was evaluated by several subjective tests as they are recommended by the International Telecommunication Union ITU.Keywords: text-to-speech synthesis, prosody modeling, speech user interface.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14551095 On an Open Problem for Definable Subsets of Covering Approximation Spaces
Authors: Mei He, Ying Ge, Jingyu Qian
Abstract:
Let (U;D) be a Gr-covering approximation space (U; C) with covering lower approximation operator D and covering upper approximation operator D. For a subset X of U, this paper investigates the following three conditions: (1) X is a definable subset of (U;D); (2) X is an inner definable subset of (U;D); (3) X is an outer definable subset of (U;D). It is proved that if one of the above three conditions holds, then the others hold. These results give a positive answer of an open problem for definable subsets of covering approximation spaces.Keywords: Covering approximation space, covering approximation operator, definable subset, inner definable subset, outer definable subset.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11511094 Gene Selection Guided by Feature Interdependence
Authors: Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel
Abstract:
Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.
Keywords: Colon cancer, feature interdependence, feature subset selection, gene selection, microarray data analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21441093 Feature Subset Selection approach based on Maximizing Margin of Support Vector Classifier
Authors: Khin May Win, Nan Sai Moon Kham
Abstract:
Identification of cancer genes that might anticipate the clinical behaviors from different types of cancer disease is challenging due to the huge number of genes and small number of patients samples. The new method is being proposed based on supervised learning of classification like support vector machines (SVMs).A new solution is described by the introduction of the Maximized Margin (MM) in the subset criterion, which permits to get near the least generalization error rate. In class prediction problem, gene selection is essential to improve the accuracy and to identify genes for cancer disease. The performance of the new method was evaluated with real-world data experiment. It can give the better accuracy for classification.Keywords: Microarray data, feature selection, recursive featureelimination, support vector machines.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15401092 Genetic Algorithm for Feature Subset Selection with Exploitation of Feature Correlations from Continuous Wavelet Transform: a real-case Application
Authors: G. Van Dijck, M. M. Van Hulle, M. Wevers
Abstract:
A genetic algorithm (GA) based feature subset selection algorithm is proposed in which the correlation structure of the features is exploited. The subset of features is validated according to the classification performance. Features derived from the continuous wavelet transform are potentially strongly correlated. GA-s that do not take the correlation structure of features into account are inefficient. The proposed algorithm forms clusters of correlated features and searches for a good candidate set of clusters. Secondly a search within the clusters is performed. Different simulations of the algorithm on a real-case data set with strong correlations between features show the increased classification performance. Comparison is performed with a standard GA without use of the correlation structure.Keywords: Classification, genetic algorithm, hierarchicalagglomerative clustering, wavelet transform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12231091 An Adequate Choice of Initial Sample Size for Selection Approach
Authors: Mohammad H. Almomani, Rosmanjawati Abdul Rahman
Abstract:
In this paper, we consider the effect of the initial sample size on the performance of a sequential approach that used in selecting a good enough simulated system, when the number of alternatives is very large. We implement a sequential approach on M=M=1 queuing system under some parameter settings, with a different choice of the initial sample sizes to explore the impacts on the performance of this approach. The results show that the choice of the initial sample size does affect the performance of our selection approach.Keywords: Ranking and Selection, Ordinal Optimization, Optimal Computing Budget Allocation, Subset Selection, Indifference-Zone, Initial Sample Size.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13071090 Some Properties of Superfuzzy Subset of a Fuzzy Subset
Authors: Hassan Naraghi
Abstract:
In this paper, we define permutable and mutually permutable fuzzy subgroups of a group. Then we study their relation with permutable and mutually permutable subgroups of a group. Also we study some properties of fuzzy quasinormal subgroup. We define superfuzzy subset of a fuzzy subset and we study some properties of superfuzzy subset of a fuzzy subset.
Keywords: Permutable fuzzy subgroup, mutually permutable fuzzy subgroup, fuzzy quasinormal subgroup, superfuzzy subset.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12031089 Predicting Bankruptcy using Tabu Search in the Mauritian Context
Authors: J. Cheeneebash, K. B. Lallmamode, A. Gopaul
Abstract:
Throughout this paper, a relatively new technique, the Tabu search variable selection model, is elaborated showing how it can be efficiently applied within the financial world whenever researchers come across the selection of a subset of variables from a whole set of descriptive variables under analysis. In the field of financial prediction, researchers often have to select a subset of variables from a larger set to solve different type of problems such as corporate bankruptcy prediction, personal bankruptcy prediction, mortgage, credit scoring and the Arbitrage Pricing Model (APM). Consequently, to demonstrate how the method operates and to illustrate its usefulness as well as its superiority compared to other commonly used methods, the Tabu search algorithm for variable selection is compared to two main alternative search procedures namely, the stepwise regression and the maximum R 2 improvement method. The Tabu search is then implemented in finance; where it attempts to predict corporate bankruptcy by selecting the most appropriate financial ratios and thus creating its own prediction score equation. In comparison to other methods, mostly the Altman Z-Score model, the Tabu search model produces a higher success rate in predicting correctly the failure of firms or the continuous running of existing entities.
Keywords: Predicting Bankruptcy, Tabu Search
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19371088 The Effect of Increment in Simulation Samples on a Combined Selection Procedure
Authors: Mohammad H. Almomani, Rosmanjawati Abdul Rahman
Abstract:
Statistical selection procedures are used to select the best simulated system from a finite set of alternatives. In this paper, we present a procedure that can be used to select the best system when the number of alternatives is large. The proposed procedure consists a combination between Ranking and Selection, and Ordinal Optimization procedures. In order to improve the performance of Ordinal Optimization, Optimal Computing Budget Allocation technique is used to determine the best simulation lengths for all simulation systems and to reduce the total computation time. We also argue the effect of increment in simulation samples for the combined procedure. The results of numerical illustration show clearly the effect of increment in simulation samples on the proposed combination of selection procedure.Keywords: Indifference-Zone, Optimal Computing Budget Allocation, Ordinal Optimization, Ranking and Selection, Subset Selection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12411087 Regression Test Selection Technique for Multi-Programming Language
Authors: Walid S. Abd El-hamid, Sherif S. El-Etriby, Mohiy M. Hadhoud
Abstract:
Regression testing is a maintenance activity applied to modified software to provide confidence that the changed parts are correct and that the unchanged parts have not been adversely affected by the modifications. Regression test selection techniques reduce the cost of regression testing, by selecting a subset of an existing test suite to use in retesting modified programs. This paper presents the first general regression-test-selection technique, which based on code and allows selecting test cases for any programs written in any programming language. Then it handles incomplete program. We also describe RTSDiff, a regression-test-selection system that implements the proposed technique. The results of the empirical studied that performed in four programming languages java, C#, Cµ and Visual basic show that the efficiency and effective in reducing the size of test suit.Keywords: Regression testing, testing, test selection, softwareevolution, software maintenance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15321086 A Novel Prediction Method for Tag SNP Selection using Genetic Algorithm based on KNN
Authors: Li-Yeh Chuang, Yu-Jen Hou, Jr., Cheng-Hong Yang
Abstract:
Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association. However, research is limited by the cost of genotyping the tremendous number of SNPs. Therefore, it is important to identify a small subset of informative SNPs, the so-called tag SNPs. This subset consists of selected SNPs of the genotypes, and accurately represents the rest of the SNPs. Furthermore, an effective evaluation method is needed to evaluate prediction accuracy of a set of tag SNPs. In this paper, a genetic algorithm (GA) is applied to tag SNP problems, and the K-nearest neighbor (K-NN) serves as a prediction method of tag SNP selection. The experimental data used was taken from the HapMap project; it consists of genotype data rather than haplotype data. The proposed method consistently identified tag SNPs with considerably better prediction accuracy than methods from the literature. At the same time, the number of tag SNPs identified was smaller than the number of tag SNPs in the other methods. The run time of the proposed method was much shorter than the run time of the SVM/STSA method when the same accuracy was reached.
Keywords: Genetic Algorithm (GA), Genotype, Single nucleotide polymorphism (SNP), tag SNPs.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17701085 Feature Selection for Breast Cancer Diagnosis: A Case-Based Wrapper Approach
Authors: Mohammad Darzi, Ali AsgharLiaei, Mahdi Hosseini, HabibollahAsghari
Abstract:
This article addresses feature selection for breast cancer diagnosis. The present process contains a wrapper approach based on Genetic Algorithm (GA) and case-based reasoning (CBR). GA is used for searching the problem space to find all of the possible subsets of features and CBR is employed to estimate the evaluation result of each subset. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer (WDBC) dataset.Keywords: Case-based reasoning; Breast cancer diagnosis; Genetic algorithm; Wrapper feature selection
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28741084 Equivalence Class Subset Algorithm
Authors: Jeffrey L. Duffany
Abstract:
The equivalence class subset algorithm is a powerful tool for solving a wide variety of constraint satisfaction problems and is based on the use of a decision function which has a very high but not perfect accuracy. Perfect accuracy is not required in the decision function as even a suboptimal solution contains valuable information that can be used to help find an optimal solution. In the hardest problems, the decision function can break down leading to a suboptimal solution where there are more equivalence classes than are necessary and which can be viewed as a mixture of good decision and bad decisions. By choosing a subset of the decisions made in reaching a suboptimal solution an iterative technique can lead to an optimal solution, using series of steadily improved suboptimal solutions. The goal is to reach an optimal solution as quickly as possible. Various techniques for choosing the decision subset are evaluated.Keywords: np-complete, complexity, algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13631083 Definable Subsets in Covering Approximation Spaces
Authors: Xun Ge, Zhaowen Li
Abstract:
Covering approximation spaces is a class of important generalization of approximation spaces. For a subset X of a covering approximation space (U, C), is X definable or rough? The answer of this question is uncertain, which depends on covering approximation operators endowed on (U, C). Note that there are many various covering approximation operators, which can be endowed on covering approximation spaces. This paper investigates covering approximation spaces endowed ten covering approximation operators respectively, and establishes some relations among definable subsets, inner definable subsets and outer definable subsets in covering approximation spaces, which deepens some results on definable subsets in approximation spaces.Keywords: Covering approximation space, covering approximation operator, definable subset, inner definable subset, outer definable subset.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12851082 Genetic Algorithms and Kernel Matrix-based Criteria Combined Approach to Perform Feature and Model Selection for Support Vector Machines
Authors: A. Perolini
Abstract:
Feature and model selection are in the center of attention of many researches because of their impact on classifiers- performance. Both selections are usually performed separately but recent developments suggest using a combined GA-SVM approach to perform them simultaneously. This approach improves the performance of the classifier identifying the best subset of variables and the optimal parameters- values. Although GA-SVM is an effective method it is computationally expensive, thus a rough method can be considered. The paper investigates a joined approach of Genetic Algorithm and kernel matrix criteria to perform simultaneously feature and model selection for SVM classification problem. The purpose of this research is to improve the classification performance of SVM through an efficient approach, the Kernel Matrix Genetic Algorithm method (KMGA).Keywords: Feature and model selection, Genetic Algorithms, Support Vector Machines, kernel matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15961081 A Fuzzy-Rough Feature Selection Based on Binary Shuffled Frog Leaping Algorithm
Authors: Javad Rahimipour Anaraki, Saeed Samet, Mahdi Eftekhari, Chang Wook Ahn
Abstract:
Feature selection and attribute reduction are crucial problems, and widely used techniques in the field of machine learning, data mining and pattern recognition to overcome the well-known phenomenon of the Curse of Dimensionality. This paper presents a feature selection method that efficiently carries out attribute reduction, thereby selecting the most informative features of a dataset. It consists of two components: 1) a measure for feature subset evaluation, and 2) a search strategy. For the evaluation measure, we have employed the fuzzy-rough dependency degree (FRFDD) of the lower approximation-based fuzzy-rough feature selection (L-FRFS) due to its effectiveness in feature selection. As for the search strategy, a modified version of a binary shuffled frog leaping algorithm is proposed (B-SFLA). The proposed feature selection method is obtained by hybridizing the B-SFLA with the FRDD. Nine classifiers have been employed to compare the proposed approach with several existing methods over twenty two datasets, including nine high dimensional and large ones, from the UCI repository. The experimental results demonstrate that the B-SFLA approach significantly outperforms other metaheuristic methods in terms of the number of selected features and the classification accuracy.Keywords: Binary shuffled frog leaping algorithm, feature selection, fuzzy-rough set, minimal reduct.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7311080 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule
Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu
Abstract:
Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.Keywords: Instance selection, data reduction, MapReduce, kNN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10161079 Timescape-Based Panoramic View for Historic Landmarks
Authors: H. Ali, A. Whitehead
Abstract:
Providing a panoramic view of famous landmarks around the world offers artistic and historic value for historians, tourists, and researchers. Exploring the history of famous landmarks by presenting a comprehensive view of a temporal panorama merged with geographical and historical information presents a unique challenge of dealing with images that span a long period, from the 1800’s up to the present. This work presents the concept of temporal panorama through a timeline display of aligned historic and modern images for many famous landmarks. Utilization of this panorama requires a collection of hundreds of thousands of landmark images from the Internet comprised of historic images and modern images of the digital age. These images have to be classified for subset selection to keep the more suitable images that chronologically document a landmark’s history. Processing of historic images captured using older analog technology under various different capturing conditions represents a big challenge when they have to be used with modern digital images. Successful processing of historic images to prepare them for next steps of temporal panorama creation represents an active contribution in cultural heritage preservation through the fulfillment of one of UNESCO goals in preservation and displaying famous worldwide landmarks.
Keywords: Cultural heritage, image registration, image subset selection, registered image similarity, temporal panorama, timescapes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10511078 Multiclass Support Vector Machines with Simultaneous Multi-Factors Optimization for Corporate Credit Ratings
Authors: Hyunchul Ahn, William X. S. Wong
Abstract:
Corporate credit rating prediction is one of the most important topics, which has been studied by researchers in the last decade. Over the last decade, researchers are pushing the limit to enhance the exactness of the corporate credit rating prediction model by applying several data-driven tools including statistical and artificial intelligence methods. Among them, multiclass support vector machine (MSVM) has been widely applied due to its good predictability. However, heuristics, for example, parameters of a kernel function, appropriate feature and instance subset, has become the main reason for the critics on MSVM, as they have dictate the MSVM architectural variables. This study presents a hybrid MSVM model that is intended to optimize all the parameter such as feature selection, instance selection, and kernel parameter. Our model adopts genetic algorithm (GA) to simultaneously optimize multiple heterogeneous design factors of MSVM.
Keywords: Corporate credit rating prediction, feature selection, genetic algorithms, instance selection, multiclass support vector machines.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14111077 A Comparison of SVM-based Criteria in Evolutionary Method for Gene Selection and Classification of Microarray Data
Authors: Rameswar Debnath, Haruhisa Takahashi
Abstract:
An evolutionary method whose selection and recombination operations are based on generalization error-bounds of support vector machine (SVM) can select a subset of potentially informative genes for SVM classifier very efficiently [7]. In this paper, we will use the derivative of error-bound (first-order criteria) to select and recombine gene features in the evolutionary process, and compare the performance of the derivative of error-bound with the error-bound itself (zero-order) in the evolutionary process. We also investigate several error-bounds and their derivatives to compare the performance, and find the best criteria for gene selection and classification. We use 7 cancer-related human gene expression datasets to evaluate the performance of the zero-order and first-order criteria of error-bounds. Though both criteria have the same strategy in theoretically, experimental results demonstrate the best criterion for microarray gene expression data.Keywords: support vector machine, generalization error-bound, feature selection, evolutionary algorithm, microarray data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15351076 Using PFA in Feature Analysis and Selection for H.264 Adaptation
Authors: Nora A. Naguib, Ahmed E. Hussein, Hesham A. Keshk, Mohamed I. El-Adawy
Abstract:
Classification of video sequences based on their contents is a vital process for adaptation techniques. It helps decide which adaptation technique best fits the resource reduction requested by the client. In this paper we used the principal feature analysis algorithm to select a reduced subset of video features. The main idea is to select only one feature from each class based on the similarities between the features within that class. Our results showed that using this feature reduction technique the source video features can be completely omitted from future classification of video sequences.
Keywords: Adaptation, feature selection, H.264, Principal Feature Analysis (PFA)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16061075 Improving Fake News Detection Using K-means and Support Vector Machine Approaches
Authors: Kasra Majbouri Yazdi, Adel Majbouri Yazdi, Saeid Khodayi, Jingyu Hou, Wanlei Zhou, Saeed Saedy
Abstract:
Fake news and false information are big challenges of all types of media, especially social media. There is a lot of false information, fake likes, views and duplicated accounts as big social networks such as Facebook and Twitter admitted. Most information appearing on social media is doubtful and in some cases misleading. They need to be detected as soon as possible to avoid a negative impact on society. The dimensions of the fake news datasets are growing rapidly, so to obtain a better result of detecting false information with less computation time and complexity, the dimensions need to be reduced. One of the best techniques of reducing data size is using feature selection method. The aim of this technique is to choose a feature subset from the original set to improve the classification performance. In this paper, a feature selection method is proposed with the integration of K-means clustering and Support Vector Machine (SVM) approaches which work in four steps. First, the similarities between all features are calculated. Then, features are divided into several clusters. Next, the final feature set is selected from all clusters, and finally, fake news is classified based on the final feature subset using the SVM method. The proposed method was evaluated by comparing its performance with other state-of-the-art methods on several specific benchmark datasets and the outcome showed a better classification of false information for our work. The detection performance was improved in two aspects. On the one hand, the detection runtime process decreased, and on the other hand, the classification accuracy increased because of the elimination of redundant features and the reduction of datasets dimensions.
Keywords: Fake news detection, feature selection, support vector machine, K-means clustering, machine learning, social media.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 45241074 A New DIDS Design Based on a Combination Feature Selection Approach
Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman
Abstract:
Feature selection has been used in many fields such as classification, data mining and object recognition and proven to be effective for removing irrelevant and redundant features from the original dataset. In this paper, a new design of distributed intrusion detection system using a combination feature selection model based on bees and decision tree. Bees algorithm is used as the search strategy to find the optimal subset of features, whereas decision tree is used as a judgment for the selected features. Both the produced features and the generated rules are used by Decision Making Mobile Agent to decide whether there is an attack or not in the networks. Decision Making Mobile Agent will migrate through the networks, moving from node to another, if it found that there is an attack on one of the nodes, it then alerts the user through User Interface Agent or takes some action through Action Mobile Agent. The KDD Cup 99 dataset is used to test the effectiveness of the proposed system. The results show that even if only four features are used, the proposed system gives a better performance when it is compared with the obtained results using all 41 features.Keywords: Distributed intrusion detection system, mobile agent, feature selection, Bees Algorithm, decision tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1939