Search results for: feature collection
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4449

Search results for: feature collection

4449 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 669
4448 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning

Authors: Samina Khalid, Shamila Nasreen

Abstract:

Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is important area, where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques have analyzed with the purpose of how effectively these techniques can be used to achieve high performance of learning algorithms that ultimately improves predictive accuracy of classifier. An endeavor to analyze dimensionality reduction techniques briefly with the purpose to investigate strengths and weaknesses of some widely used dimensionality reduction methods is presented.

Keywords: age related macular degeneration, feature selection feature subset selection feature extraction/transformation, FSA’s, relief, correlation based method, PCA, ICA

Procedia PDF Downloads 497
4447 Hybrid Feature Selection Method for Sentiment Classification of Movie Reviews

Authors: Vishnu Goyal, Basant Agarwal

Abstract:

Sentiment analysis research provides methods for identifying the people’s opinion written in blogs, reviews, social networking websites etc. Sentiment analysis is to understand what opinion people have about any given entity, object or thing. Sentiment analysis research can be broadly categorised into three types of approaches i.e. semantic orientation, machine learning and lexicon based approaches. Feature selection methods improve the performance of the machine learning algorithms by eliminating the irrelevant features. Information gain feature selection method has been considered best method for sentiment analysis; however, it has the drawback of selection of threshold. Therefore, in this paper, we propose a hybrid feature selection methods comprising of information gain and proposed feature selection method. Initially, features are selected using Information Gain (IG) and further more noisy features are eliminated using the proposed feature selection method. Experimental results show the efficiency of the proposed feature selection methods.

Keywords: feature selection, sentiment analysis, hybrid feature selection

Procedia PDF Downloads 341
4446 Pantograph-Catenary Contact Force: Features Evaluation for Catenary Diagnostics

Authors: Mehdi Brahimi, Kamal Medjaher, Noureddine Zerhouni, Mohammed Leouatni

Abstract:

The Prognostics and Health Management is a system engineering discipline which provides solutions and models to the implantation of a predictive maintenance. The approach is based on extracting useful information from monitoring data to assess the “health” state of an industrial equipment or an asset. In this paper, we examine multiple extracted features from Pantograph-Catenary contact force in order to select the most relevant ones to achieve a diagnostics function. The feature extraction methodology is based on simulation data generated thanks to a Pantograph-Catenary simulation software called INPAC and measurement data. The feature extraction method is based on both statistical and signal processing analyses. The feature selection method is based on statistical criteria.

Keywords: catenary/pantograph interaction, diagnostics, Prognostics and Health Management (PHM), quality of current collection

Procedia PDF Downloads 290
4445 Feature Location Restoration for Under-Sampled Photoplethysmogram Using Spline Interpolation

Authors: Hangsik Shin

Abstract:

The purpose of this research is to restore the feature location of under-sampled photoplethysmogram using spline interpolation and to investigate feasibility for feature shape restoration. We obtained 10 kHz-sampled photoplethysmogram and decimated it to generate under-sampled dataset. Decimated dataset has 5 kHz, 2.5 k Hz, 1 kHz, 500 Hz, 250 Hz, 25 Hz and 10 Hz sampling frequency. To investigate the restoration performance, we interpolated under-sampled signals with 10 kHz, then compared feature locations with feature locations of 10 kHz sampled photoplethysmogram. Features were upper and lower peak of photplethysmography waveform. Result showed that time differences were dramatically decreased by interpolation. Location error was lesser than 1 ms in both feature types. In 10 Hz sampled cases, location error was also deceased a lot, however, they were still over 10 ms.

Keywords: peak detection, photoplethysmography, sampling, signal reconstruction

Procedia PDF Downloads 368
4444 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched the direction to the digital world. The domain of politics as one of the hottest topics of opinion mining research merged together with the behavior analysis for affiliation determination in text which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 are constituted by Linguistic Inquiry and Word Count (LIWC) features are tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that Decision Tree, Rule Induction and M5 Rule classifiers when used with SVM and IGR feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “function” as an aggregate feature of the linguistic category, is obtained as the most differentiating feature among the 68 features with 81% accuracy by itself in classifying articles either as Republican or Democrat.

Keywords: feature selection, LIWC, machine learning, politics

Procedia PDF Downloads 383
4443 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 341
4442 K-Means Clustering-Based Infinite Feature Selection Method

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Abstract:

Infinite Feature Selection (IFS) algorithm is an efficient feature selection algorithm that selects a subset of features of all sizes (including infinity). In this paper, we present an improved version of it, called clustering IFS (CIFS), by clustering the dataset in advance. To do so, first, we apply the K-means algorithm to cluster the dataset, then we apply IFS. In the CIFS method, the spatial and temporal complexities are reduced compared to the IFS method. Experimental results on 6 datasets show the superiority of CIFS compared to IFS in terms of accuracy, running time, and memory consumption.

Keywords: feature selection, infinite feature selection, clustering, graph

Procedia PDF Downloads 128
4441 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 339
4440 Product Feature Modelling for Integrating Product Design and Assembly Process Planning

Authors: Baha Hasan, Jan Wikander

Abstract:

This paper describes a part of the integrating work between assembly design and assembly process planning domains (APP). The work is based, in its first stage, on modelling assembly features to support APP. A multi-layer architecture, based on feature-based modelling, is proposed to establish a dynamic and adaptable link between product design using CAD tools and APP. The proposed approach is based on deriving “specific function” features from the “generic” assembly and form features extracted from the CAD tools. A hierarchal structure from “generic” to “specific” and from “high level geometrical entities” to “low level geometrical entities” is proposed in order to integrate geometrical and assembly data extracted from geometrical and assembly modelers to the required processes and resources in APP. The feature concept, feature-based modelling, and feature recognition techniques are reviewed.

Keywords: assembly feature, assembly process planning, feature, feature-based modelling, form feature, ontology

Procedia PDF Downloads 310
4439 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 362
4438 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 326
4437 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 346
4436 Triangular Geometric Feature for Offline Signature Verification

Authors: Zuraidasahana Zulkarnain, Mohd Shafry Mohd Rahim, Nor Anita Fairos Ismail, Mohd Azhar M. Arsad

Abstract:

Handwritten signature is accepted widely as a biometric characteristic for personal authentication. The use of appropriate features plays an important role in determining accuracy of signature verification; therefore, this paper presents a feature based on the geometrical concept. To achieve the aim, triangle attributes are exploited to design a new feature since the triangle possesses orientation, angle and transformation that would improve accuracy. The proposed feature uses triangulation geometric set comprising of sides, angles and perimeter of a triangle which is derived from the center of gravity of a signature image. For classification purpose, Euclidean classifier along with Voting-based classifier is used to verify the tendency of forgery signature. This classification process is experimented using triangular geometric feature and selected global features. Based on an experiment that was validated using Grupo de Senales 960 (GPDS-960) signature database, the proposed triangular geometric feature achieves a lower Average Error Rates (AER) value with a percentage of 34% as compared to 43% of the selected global feature. As a conclusion, the proposed triangular geometric feature proves to be a more reliable feature for accurate signature verification.

Keywords: biometrics, euclidean classifier, features extraction, offline signature verification, voting-based classifier

Procedia PDF Downloads 379
4435 Towards Integrating Statistical Color Features for Human Skin Detection

Authors: Mohd Zamri Osman, Mohd Aizaini Maarof, Mohd Foad Rohani

Abstract:

Human skin detection recognized as the primary step in most of the applications such as face detection, illicit image filtering, hand recognition and video surveillance. The performance of any skin detection applications greatly relies on the two components: feature extraction and classification method. Skin color is the most vital information used for skin detection purpose. However, color feature alone sometimes could not handle images with having same color distribution with skin color. A color feature of pixel-based does not eliminate the skin-like color due to the intensity of skin and skin-like color fall under the same distribution. Hence, the statistical color analysis will be exploited such mean and standard deviation as an additional feature to increase the reliability of skin detector. In this paper, we studied the effectiveness of statistical color feature for human skin detection. Furthermore, the paper analyzed the integrated color and texture using eight classifiers with three color spaces of RGB, YCbCr, and HSV. The experimental results show that the integrating statistical feature using Random Forest classifier achieved a significant performance with an F1-score 0.969.

Keywords: color space, neural network, random forest, skin detection, statistical feature

Procedia PDF Downloads 462
4434 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 460
4433 A Research and Application of Feature Selection Based on IWO and Tabu Search

Authors: Laicheng Cao, Xiangqian Su, Youxiao Wu

Abstract:

Feature selection is one of the important problems in network security, pattern recognition, data mining and other fields. In order to remove redundant features, effectively improve the detection speed of intrusion detection system, proposes a new feature selection method, which is based on the invasive weed optimization (IWO) algorithm and tabu search algorithm(TS). Use IWO as a global search, tabu search algorithm for local search, to improve the results of IWO algorithm. The experimental results show that the feature selection method can effectively remove the redundant features of network data information in feature selection, reduction time, and to guarantee accurate detection rate, effectively improve the speed of detection system.

Keywords: intrusion detection, feature selection, iwo, tabu search

Procedia PDF Downloads 530
4432 Comparative Analysis of Feature Extraction and Classification Techniques

Authors: R. L. Ujjwal, Abhishek Jain

Abstract:

In the field of computer vision, most facial variations such as identity, expression, emotions and gender have been extensively studied. Automatic age estimation has been rarely explored. With age progression of a human, the features of the face changes. This paper is providing a new comparable study of different type of algorithm to feature extraction [Hybrid features using HAAR cascade & HOG features] & classification [KNN & SVM] training dataset. By using these algorithms we are trying to find out one of the best classification algorithms. Same thing we have done on the feature selection part, we extract the feature by using HAAR cascade and HOG. This work will be done in context of age group classification model.

Keywords: computer vision, age group, face detection

Procedia PDF Downloads 370
4431 Automatic Threshold Search for Heat Map Based Feature Selection: A Cancer Dataset Analysis

Authors: Carlos Huertas, Reyes Juarez-Ramirez

Abstract:

Public health is one of the most critical issues today; therefore, there is great interest to improve technologies in the area of diseases detection. With machine learning and feature selection, it has been possible to aid the diagnosis of several diseases such as cancer. In this work, we present an extension to the Heat Map Based Feature Selection algorithm, this modification allows automatic threshold parameter selection that helps to improve the generalization performance of high dimensional data such as mass spectrometry. We have performed a comparison analysis using multiple cancer datasets and compare against the well known Recursive Feature Elimination algorithm and our original proposal, the results show improved classification performance that is very competitive against current techniques.

Keywords: biomarker discovery, cancer, feature selection, mass spectrometry

Procedia PDF Downloads 340
4430 The Wear Recognition on Guide Surface Based on the Feature of Radar Graph

Authors: Youhang Zhou, Weimin Zeng, Qi Xie

Abstract:

Abstract: In order to solve the wear recognition problem of the machine tool guide surface, a new machine tool guide surface recognition method based on the radar-graph barycentre feature is presented in this paper. Firstly, the gray mean value, skewness, projection variance, flat degrees and kurtosis features of the guide surface image data are defined as primary characteristics. Secondly, data Visualization technology based on radar graph is used. The visual barycentre graphical feature is demonstrated based on the radar plot of multi-dimensional data. Thirdly, a classifier based on the support vector machine technology is used, the radar-graph barycentre feature and wear original feature are put into the classifier separately for classification and comparative analysis of classification and experiment results. The calculation and experimental results show that the method based on the radar-graph barycentre feature can detect the guide surface effectively.

Keywords: guide surface, wear defects, feature extraction, data visualization

Procedia PDF Downloads 519
4429 Face Recognition Using Discrete Orthogonal Hahn Moments

Authors: Fatima Akhmedova, Simon Liao

Abstract:

One of the most critical decision points in the design of a face recognition system is the choice of an appropriate face representation. Effective feature descriptors are expected to convey sufficient, invariant and non-redundant facial information. In this work, we propose a set of Hahn moments as a new approach for feature description. Hahn moments have been widely used in image analysis due to their invariance, non-redundancy and the ability to extract features either globally and locally. To assess the applicability of Hahn moments to Face Recognition we conduct two experiments on the Olivetti Research Laboratory (ORL) database and University of Notre-Dame (UND) X1 biometric collection. Fusion of the global features along with the features from local facial regions are used as an input for the conventional k-NN classifier. The method reaches an accuracy of 93% of correctly recognized subjects for the ORL database and 94% for the UND database.

Keywords: face recognition, Hahn moments, recognition-by-parts, time-lapse

Procedia PDF Downloads 377
4428 Exploring Multi-Feature Based Action Recognition Using Multi-Dimensional Dynamic Time Warping

Authors: Guoliang Lu, Changhou Lu, Xueyong Li

Abstract:

In action recognition, previous studies have demonstrated the effectiveness of using multiple features to improve the recognition performance. We focus on two practical issues: i) most studies use a direct way of concatenating/accumulating multi features to evaluate the similarity between two actions. This way could be too strong since each kind of feature can include different dimensions, quantities, etc; ii) in many studies, the employed classification methods lack of a flexible and effective mechanism to add new feature(s) into classification. In this paper, we explore an unified scheme based on recently-proposed multi-dimensional dynamic time warping (MD-DTW). Experiments demonstrated the scheme's effectiveness of combining multi-feature and the flexibility of adding new feature(s) to increase the recognition performance. In addition, the explored scheme also provides us an open architecture for using new advanced classification methods in the future to enhance action recognition.

Keywords: action recognition, multi features, dynamic time warping, feature combination

Procedia PDF Downloads 437
4427 Conceptual Model of a Residential Waste Collection System Using ARENA Software

Authors: Bruce G. Wilson

Abstract:

The collection of municipal solid waste at the curbside is a complex operation that is repeated daily under varying circumstances around the world. There have been several attempts to develop Monte Carlo simulation models of the waste collection process dating back almost 50 years. Despite this long history, the use of simulation modeling as a planning or optimization tool for waste collection is still extremely limited in practice. Historically, simulation modeling of waste collection systems has been hampered by the limitations of computer hardware and software and by the availability of representative input data. This paper outlines the development of a Monte Carlo simulation model that overcomes many of the limitations contained in previous models. The model uses a general purpose simulation software program that is easily capable of modeling an entire waste collection network. The model treats the stops on a waste collection route as a queue of work to be processed by a collection vehicle (or server). Input data can be collected from a variety of sources including municipal geographic information systems, global positioning system recorders on collection vehicles, and weigh scales at transfer stations or treatment facilities. The result is a flexible model that is sufficiently robust that it can model the collection activities in a large municipality, while providing the flexibility to adapt to changing conditions on the collection route.

Keywords: modeling, queues, residential waste collection, Monte Carlo simulation

Procedia PDF Downloads 401
4426 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score

Authors: Jianfeng Hu

Abstract:

Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.

Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes

Procedia PDF Downloads 285
4425 Genetic Algorithms for Feature Generation in the Context of Audio Classification

Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes

Abstract:

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

Keywords: feature generation, feature learning, genetic algorithm, music information retrieval

Procedia PDF Downloads 436
4424 The Perspective on Data Collection Instruments for Younger Learners

Authors: Hatice Kübra Koç

Abstract:

For academia, collecting reliable and valid data is one of the most significant issues for researchers. However, it is not the same procedure for all different target groups; meanwhile, during data collection from teenagers, young adults, or adults, researchers can use common data collection tools such as questionnaires, interviews, and semi-structured interviews; yet, for young learners and very young ones, these reliable and valid data collection tools cannot be easily designed or applied by the researchers. In this study, firstly, common data collection tools are examined for ‘very young’ and ‘young learners’ participant groups since it is thought that the quality and efficiency of an academic study is mainly based on its valid and correct data collection and data analysis procedure. Secondly, two different data collection instruments for very young and young learners are stated as discussing the efficacy of them. Finally, a suggested data collection tool – a performance-based questionnaire- which is specifically developed for ‘very young’ and ‘young learners’ participant groups in the field of teaching English to young learners as a foreign language is presented in this current study. The designing procedure and suggested items/factors for the suggested data collection tool are accordingly revealed at the end of the study to help researchers have studied with young and very learners.

Keywords: data collection instruments, performance-based questionnaire, young learners, very young learners

Procedia PDF Downloads 94
4423 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network

Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar

Abstract:

Wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all the sensor nodes are communicated with each other via radio signals. The sensor nodes have capability of sensing, data storage and processing. The sensor nodes collect the information through neighboring nodes to particular node. The data collection and processing is done by data aggregation techniques. For the data aggregation in sensor network, clustering technique is implemented in the sensor network by implementing self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. The information aggregated to cluster head nodes from non-cluster head nodes and then this information is transferred to base station (or sink nodes). The aim of this paper is to manage the huge amount of data with the help of SOM neural network. Clustered data is selected to transfer to base station instead of whole information aggregated at cluster head nodes. This reduces the battery consumption over the huge data management. The network lifetime is enhanced at a greater extent.

Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network

Procedia PDF Downloads 518
4422 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 574
4421 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier

Authors: Saurabh Farkya, Govinda Surampudi

Abstract:

Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devnagari Script recognition. The Support Vector Machines was trained with Multi Domain features; Transform Domain and Spatial Domain or Structural Domain feature. Transform Domain includes the wavelet feature of the character. Structural Domain consists of Distance Profile feature and Gradient feature. The Segmentation of the text document has been done in 3 levels-Line Segmentation, Word Segmentation, and Character Segmentation. The pre-processing of the characters has been done with the help of various Morphological operations-Otsu's Algorithm, Erosion, Dilation, Filtration and Thinning techniques. The Algorithm was tested on the self-prepared database, a collection of various handwriting. Further, Unicode was used to convert recognized Devnagari text into understandable computer document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using concatenation technique.

Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)

Procedia PDF Downloads 500
4420 A Feature Clustering-Based Sequential Selection Approach for Color Texture Classification

Authors: Mohamed Alimoussa, Alice Porebski, Nicolas Vandenbroucke, Rachid Oulad Haj Thami, Sana El Fkihi

Abstract:

Color and texture are highly discriminant visual cues that provide an essential information in many types of images. Color texture representation and classification is therefore one of the most challenging problems in computer vision and image processing applications. Color textures can be represented in different color spaces by using multiple image descriptors which generate a high dimensional set of texture features. In order to reduce the dimensionality of the feature set, feature selection techniques can be used. The goal of feature selection is to find a relevant subset from an original feature space that can improve the accuracy and efficiency of a classification algorithm. Traditionally, feature selection is focused on removing irrelevant features, neglecting the possible redundancy between relevant ones. This is why some feature selection approaches prefer to use feature clustering analysis to aid and guide the search. These techniques can be divided into two categories. i) Feature clustering-based ranking algorithm uses feature clustering as an analysis that comes before feature ranking. Indeed, after dividing the feature set into groups, these approaches perform a feature ranking in order to select the most discriminant feature of each group. ii) Feature clustering-based subset search algorithms can use feature clustering following one of three strategies; as an initial step that comes before the search, binded and combined with the search or as the search alternative and replacement. In this paper, we propose a new feature clustering-based sequential selection approach for the purpose of color texture representation and classification. Our approach is a three step algorithm. First, irrelevant features are removed from the feature set thanks to a class-correlation measure. Then, introducing a new automatic feature clustering algorithm, the feature set is divided into several feature clusters. Finally, a sequential search algorithm, based on a filter model and a separability measure, builds a relevant and non redundant feature subset: at each step, a feature is selected and features of the same cluster are removed and thus not considered thereafter. This allows to significantly speed up the selection process since large number of redundant features are eliminated at each step. The proposed algorithm uses the clustering algorithm binded and combined with the search. Experiments using a combination of two well known texture descriptors, namely Haralick features extracted from Reduced Size Chromatic Co-occurence Matrices (RSCCMs) and features extracted from Local Binary patterns (LBP) image histograms, on five color texture data sets, Outex, NewBarktex, Parquet, Stex and USPtex demonstrate the efficiency of our method compared to seven of the state of the art methods in terms of accuracy and computation time.

Keywords: feature selection, color texture classification, feature clustering, color LBP, chromatic cooccurrence matrix

Procedia PDF Downloads 138