Search results for: hierarchical classification.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1323

Search results for: hierarchical classification.

813 Correlation-based Feature Selection using Ant Colony Optimization

Authors: M. Sadeghzadeh, M. Teshnehlab

Abstract:

Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.

Keywords: Ant colony optimization, Classification, Datamining, Feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2420
812 Radar Hydrology: New Z/R Relationships for Klang River Basin Malaysia based on Rainfall Classification

Authors: R. Suzana, T. Wardah, A.B. Sahol Hamid

Abstract:

The use of radar in Quantitative Precipitation Estimation (QPE) for radar-rainfall measurement is significantly beneficial. Radar has advantages in terms of high spatial and temporal condition in rainfall measurement and also forecasting. In Malaysia, radar application in QPE is still new and needs to be explored. This paper focuses on the Z/R derivation works of radarrainfall estimation based on rainfall classification. The works developed new Z/R relationships for Klang River Basin in Selangor area for three different general classes of rain events, namely low (<10mm/hr), moderate (>10mm/hr, <30mm/hr) and heavy (>30mm/hr) and also on more specific rain types during monsoon seasons. Looking at the high potential of Doppler radar in QPE, the newly formulated Z/R equations will be useful in improving the measurement of rainfall for any hydrological application, especially for flood forecasting.

Keywords: Radar, Quantitative Precipitation Estimation, Z/R development, flood forecasting

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2151
811 Classification of Acoustic Emission Based Partial Discharge in Oil Pressboard Insulation System Using Wavelet Analysis

Authors: Prasanta Kundu, N.K. Kishore, A.K. Sinha

Abstract:

Insulation used in transformer is mostly oil pressboard insulation. Insulation failure is one of the major causes of catastrophic failure of transformers. It is established that partial discharges (PD) cause insulation degradation and premature failure of insulation. Online monitoring of PDs can reduce the risk of catastrophic failure of transformers. There are different techniques of partial discharge measurement like, electrical, optical, acoustic, opto-acoustic and ultra high frequency (UHF). Being non invasive and non interference prone, acoustic emission technique is advantageous for online PD measurement. Acoustic detection of p.d. is based on the retrieval and analysis of mechanical or pressure signals produced by partial discharges. Partial discharges are classified according to the origin of discharges. Their effects on insulation deterioration are different for different types. This paper reports experimental results and analysis for classification of partial discharges using acoustic emission signal of laboratory simulated partial discharges in oil pressboard insulation system using three different electrode systems. Acoustic emission signal produced by PD are detected by sensors mounted on the experimental tank surface, stored on an oscilloscope and fed to computer for further analysis. The measured AE signals are analyzed using discrete wavelet transform analysis and wavelet packet analysis. Energy distribution in different frequency bands of discrete wavelet decomposed signal and wavelet packet decomposed signal is calculated. These analyses show a distinct feature useful for PD classification. Wavelet packet analysis can sort out any misclassification arising out of DWT in most cases.

Keywords: Acoustic emission, discrete wavelet transform, partial discharge, wavelet packet analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2987
810 A Survey: Clustering Ensembles Techniques

Authors: Reza Ghaemi , Md. Nasir Sulaiman , Hamidah Ibrahim , Norwati Mustapha

Abstract:

The clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been done to find consensus clustering. One of the major problems in clustering ensembles is the consensus function. In this paper, firstly, we introduce clustering ensembles, representation of multiple partitions, its challenges and present taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles including Hypergraph partitioning, Voting approach, Mutual information, Co-association based functions and Finite mixture model, and next explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensembles algorithms such as computational complexity, robustness, simplicity and accuracy on different datasets in previous techniques.

Keywords: Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3449
809 Classification of Earthquake Distribution in the Banda Sea Collision Zone with Point Process Approach

Authors: Henry J. Wattimanela, Udjianna S. Pasaribu, Nanang T. Puspito, Sapto W. Indratno

Abstract:

Banda Sea Collision Zone (BSCZ) is the result of the interaction and convergence of Indo-Australian plate, Eurasian plate and Pacific plate. This location is located in eastern Indonesia. This zone has a very high seismic activity. In this research, we will calculate the rate (λ) and Mean Square Error (MSE). By this result, we will classification earthquakes distribution in the BSCZ with the point process approach. Chi-square is used to determine the type of earthquakes distribution in the sub region of BSCZ. The data used in this research is data of earthquakes with a magnitude ≥ 6 SR for the period 1964-2013 and sourced from BMKG Jakarta. This research is expected to contribute to the Moluccas Province and surrounding local governments in performing spatial plan document related to disaster management.

Keywords: Banda sea collision zone, earthquakes, mean square error, Poisson distribution, chi-square test.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2117
808 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: Data mining, textile production, decision trees, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1538
807 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.

Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 837
806 Fault Classification of Double Circuit Transmission Line Using Artificial Neural Network

Authors: Anamika Jain, A. S. Thoke, R. N. Patel

Abstract:

This paper addresses the problems encountered by conventional distance relays when protecting double-circuit transmission lines. The problems arise principally as a result of the mutual coupling between the two circuits under different fault conditions; this mutual coupling is highly nonlinear in nature. An adaptive protection scheme is proposed for such lines based on application of artificial neural network (ANN). ANN has the ability to classify the nonlinear relationship between measured signals by identifying different patterns of the associated signals. One of the key points of the present work is that only current signals measured at local end have been used to detect and classify the faults in the double circuit transmission line with double end infeed. The adaptive protection scheme is tested under a specific fault type, but varying fault location, fault resistance, fault inception angle and with remote end infeed. An improved performance is experienced once the neural network is trained adequately, which performs precisely when faced with different system parameters and conditions. The entire test results clearly show that the fault is detected and classified within a quarter cycle; thus the proposed adaptive protection technique is well suited for double circuit transmission line fault detection & classification. Results of performance studies show that the proposed neural network-based module can improve the performance of conventional fault selection algorithms.

Keywords: Double circuit transmission line, Fault detection and classification, High impedance fault and Artificial Neural Network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3186
805 DIFFER: A Propositionalization approach for Learning from Structured Data

Authors: Thashmee Karunaratne, Henrik Böstrom

Abstract:

Logic based methods for learning from structured data is limited w.r.t. handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, for addressing these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system, called DIFFER. The method is demonstrated to perform comparably to an existing state-of-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.

Keywords: Machine learning, Structure classification, Propositionalization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1222
804 Classification System for a Collaborative Urban Retail Logistics

Authors: Volker Lange, Stephanie Moede, Christiane Auffermann

Abstract:

From an economic standpoint the current and future road traffic situation in urban areas is a cost factor. Traffic jams and congestion prolong journey times and tie up resources in trucks and personnel. Many discussions about imposing charges or tolls for cities in Europe in order to reduce traffic congestion are currently in progress. Both of these effects lead – directly or indirectly - to additional costs for the urban distribution systems in retail companies. One approach towards improving the efficiency of retail distribution systems, and thus towards avoiding negative environmental factors in urban areas, is horizontal collaboration for deliveries to retail outlets – Urban Retail Logistics. This paper presents a classification system to help reveal where cooperation between retail companies is possible and makes sense for deliveries to retail outlets in urban areas.

Keywords: City Logistics, Horizontal Collaboration, Urban Freight Transport, Urban Retail Logistics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2358
803 The Imaging Methods for Classifying Crispiness of Freeze-Dried Durian using Fuzzy Logic

Authors: Sitthichon Kanitthakun, Pinit Kumhom, Kosin Chamnongthai

Abstract:

In quality control of freeze-dried durian, crispiness is a key quality index of the product. Generally, crispy testing has to be done by a destructive method. A nondestructive testing of the crispiness is required because the samples can be reused for other kinds of testing. This paper proposed a crispiness classification method of freeze-dried durians using fuzzy logic for decision making. The physical changes of a freeze-dried durian include the pores appearing in the images. Three physical features including (1) the diameters of pores, (2) the ratio of the pore area and the remaining area, and (3) the distribution of the pores are considered to contribute to the crispiness. The fuzzy logic is applied for making the decision. The experimental results comparing with food expert opinion showed that the accuracy of the proposed classification method is 83.33 percent.

Keywords: Durian, crispiness, freeze drying, pore, fuzzy logic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1970
802 Wavelet Compression of ECG Signals Using SPIHT Algorithm

Authors: Mohammad Pooyan, Ali Taheri, Morteza Moazami-Goudarzi, Iman Saboori

Abstract:

In this paper we present a novel approach for wavelet compression of electrocardiogram (ECG) signals based on the set partitioning in hierarchical trees (SPIHT) coding algorithm. SPIHT algorithm has achieved prominent success in image compression. Here we use a modified version of SPIHT for one dimensional signals. We applied wavelet transform with SPIHT coding algorithm on different records of MIT-BIH database. The results show the high efficiency of this method in ECG compression.

Keywords: ECG compression, wavelet, SPIHT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2398
801 Modeling of Random Variable with Digital Probability Hyper Digraph: Data-Oriented Approach

Authors: A. Habibizad Navin, M. Naghian Fesharaki, M. Mirnia, M. Kargar

Abstract:

In this paper we introduce Digital Probability Hyper Digraph for modeling random variable as the hierarchical data-oriented model.

Keywords: Data-Oriented Models, Data Structure, DigitalProbability Hyper Digraph, Random Variable, Statistic andProbability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1273
800 Genetic Algorithms and Kernel Matrix-based Criteria Combined Approach to Perform Feature and Model Selection for Support Vector Machines

Authors: A. Perolini

Abstract:

Feature and model selection are in the center of attention of many researches because of their impact on classifiers- performance. Both selections are usually performed separately but recent developments suggest using a combined GA-SVM approach to perform them simultaneously. This approach improves the performance of the classifier identifying the best subset of variables and the optimal parameters- values. Although GA-SVM is an effective method it is computationally expensive, thus a rough method can be considered. The paper investigates a joined approach of Genetic Algorithm and kernel matrix criteria to perform simultaneously feature and model selection for SVM classification problem. The purpose of this research is to improve the classification performance of SVM through an efficient approach, the Kernel Matrix Genetic Algorithm method (KMGA).

Keywords: Feature and model selection, Genetic Algorithms, Support Vector Machines, kernel matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1597
799 One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.

Keywords: Bioinformatics, Protein-protein interactions, One-Class Support Vector Machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1989
798 Determining the Gender of Korean Names for Pronoun Generation

Authors: Seong-Bae Park, Hee-Geun Yoon

Abstract:

It is an important task in Korean-English machine translation to classify the gender of names correctly. When a sentence is composed of two or more clauses and only one subject is given as a proper noun, it is important to find the gender of the proper noun for correct translation of the sentence. This is because a singular pronoun has a gender in English while it does not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun should be determined. More generally, this task can be expanded into the classification of the general Korean names. This paper proposes a statistical method for this problem. By considering a name as just a sequence of syllables, it is possible to get a statistics for each name from a collection of names. An evaluation of the proposed method yields the improvement in accuracy over the simple looking-up of the collection. While the accuracy of the looking-up method is 64.11%, that of the proposed method is 81.49%. This implies that the proposed method is more plausible for the gender classification of the Korean names.

Keywords: machine translation, natural language processing, gender of proper nouns, statistical method

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2368
797 Pattern Recognition as an Internalized Motor Programme

Authors: M. Jändel

Abstract:

A new conceptual architecture for low-level neural pattern recognition is presented. The key ideas are that the brain implements support vector machines and that support vectors are represented as memory patterns in competitive queuing memories. A binary classifier is built from two competitive queuing memories holding positive and negative valence training examples respectively. The support vector machine classification function is calculated in synchronized evaluation cycles. The kernel is computed by bisymmetric feed-forward networks feed by sensory input and by competitive queuing memories traversing the complete sequence of support vectors. Temporary summation generates the output classification. It is speculated that perception apparatus in the brain reuses structures that have evolved for enabling fluent execution of prepared action sequences so that pattern recognition is built on internalized motor programmes.

Keywords: Competitive queuing model, Olfactory system, Pattern recognition, Support vector machine, Thalamus

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1369
796 Integrated ACOR/IACOMV-R-SVM Algorithm

Authors: Hiba Basim Alwan, Ku Ruhana Ku-Mahamud

Abstract:

A direction for ACO is to optimize continuous and mixed (discrete and continuous) variables in solving problems with various types of data. Support Vector Machine (SVM), which originates from the statistical approach, is a present day classification technique. The main problems of SVM are selecting feature subset and tuning the parameters. Discretizing the continuous value of the parameters is the most common approach in tuning SVM parameters. This process will result in loss of information which affects the classification accuracy. This paper presents two algorithms that can simultaneously tune SVM parameters and select the feature subset. The first algorithm, ACOR-SVM, will tune SVM parameters, while the second IACOMV-R-SVM algorithm will simultaneously tune SVM parameters and select the feature subset. Three benchmark UCI datasets were used in the experiments to validate the performance of the proposed algorithms. The results show that the proposed algorithms have good performances as compared to other approaches.

Keywords: Continuous ant colony optimization, incremental continuous ant colony, simultaneous optimization, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 880
795 Analysis of the EEG Signal for a Practical Biometric System

Authors: Muhammad Kamil Abdullah, Khazaimatol S Subari, Justin Leo Cheang Loong, Nurul Nadia Ahmad

Abstract:

This paper discusses the effectiveness of the EEG signal for human identification using four or less of channels of two different types of EEG recordings. Studies have shown that the EEG signal has biometric potential because signal varies from person to person and impossible to replicate and steal. Data were collected from 10 male subjects while resting with eyes open and eyes closed in 5 separate sessions conducted over a course of two weeks. Features were extracted using the wavelet packet decomposition and analyzed to obtain the feature vectors. Subsequently, the neural networks algorithm was used to classify the feature vectors. Results show that, whether or not the subjects- eyes were open are insignificant for a 4– channel biometrics system with a classification rate of 81%. However, for a 2–channel system, the P4 channel should not be included if data is acquired with the subjects- eyes open. It was observed that for 2– channel system using only the C3 and C4 channels, a classification rate of 71% was achieved.

Keywords: Biometric, EEG, Wavelet Packet Decomposition, NeuralNetworks

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3027
794 Optimal Multilayer Perceptron Structure For Classification of HIV Sub-Type Viruses

Authors: Zeyneb Kurt, Oguzhan Yavuz

Abstract:

The feature of HIV genome is in a wide range because of it is highly heterogeneous. Hence, the infection ability of the virus changes related with different chemokine receptors. From this point, R5 and X4 HIV viruses use CCR5 and CXCR5 coreceptors respectively while R5X4 viruses can utilize both coreceptors. Recently, in Bioinformatics, R5X4 viruses have been studied to classify by using the coreceptors of HIV genome. The aim of this study is to develop the optimal Multilayer Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this purpose, the unit number in hidden layer was incremented one by one, from one to a particular number. The statistical data of R5X4, R5 and X4 viruses was preprocessed by the signal processing methods. Accessible residues of these virus sequences were extracted and modeled by Auto-Regressive Model (AR) due to the dimension of residues is large and different from each other. Finally the pre-processed dataset was used to evolve MLP with various number of hidden units to determine R5X4 viruses. Furthermore, ROC analysis was used to figure out the optimal MLP structure.

Keywords: Multilayer Perceptron, Auto-Regressive Model, HIV, ROC Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1440
793 Fake Account Detection in Twitter Based on Minimum Weighted Feature set

Authors: Ahmed El Azab, Amira M. Idrees, Mahmoud A. Mahmoud, Hesham Hefny

Abstract:

Social networking sites such as Twitter and Facebook attracts over 500 million users across the world, for those users, their social life, even their practical life, has become interrelated. Their interaction with social networking has affected their life forever. Accordingly, social networking sites have become among the main channels that are responsible for vast dissemination of different kinds of information during real time events. This popularity in Social networking has led to different problems including the possibility of exposing incorrect information to their users through fake accounts which results to the spread of malicious content during life events. This situation can result to a huge damage in the real world to the society in general including citizens, business entities, and others. In this paper, we present a classification method for detecting the fake accounts on Twitter. The study determines the minimized set of the main factors that influence the detection of the fake accounts on Twitter, and then the determined factors are applied using different classification techniques. A comparison of the results of these techniques has been performed and the most accurate algorithm is selected according to the accuracy of the results. The study has been compared with different recent researches in the same area; this comparison has proved the accuracy of the proposed study. We claim that this study can be continuously applied on Twitter social network to automatically detect the fake accounts; moreover, the study can be applied on different social network sites such as Facebook with minor changes according to the nature of the social network which are discussed in this paper.

Keywords: Fake accounts detection, classification algorithms, twitter accounts analysis, features based techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5837
792 A General Model for Amino Acid Interaction Networks

Authors: Omar Gaci, Stefan Balev

Abstract:

In this paper we introduce the notion of protein interaction network. This is a graph whose vertices are the protein-s amino acids and whose edges are the interactions between them. Using a graph theory approach, we identify a number of properties of these networks. We compare them to the general small-world network model and we analyze their hierarchical structure.

Keywords: interaction network, protein structure, small-world network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1578
791 Evaluating some Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of features selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We showed that the best results were obtained with SVM_FS and GA_FS methods for a relatively small dimension of the features vector comparative with the IG method that involves longer vectors, for quite similar classification accuracies. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).

Keywords: Features selection, learning with kernels, support vector machine, genetic algorithms and classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1538
790 Real-Time Testing of Steel Strip Welds based on Bayesian Decision Theory

Authors: Julio Molleda, Daniel F. García, Juan C. Granda, Francisco J. Suárez

Abstract:

One of the main trouble in a steel strip manufacturing line is the breakage of whatever weld carried out between steel coils, that are used to produce the continuous strip to be processed. A weld breakage results in a several hours stop of the manufacturing line. In this process the damages caused by the breakage must be repaired. After the reparation and in order to go on with the production it will be necessary a restarting process of the line. For minimizing this problem, a human operator must inspect visually and manually each weld in order to avoid its breakage during the manufacturing process. The work presented in this paper is based on the Bayesian decision theory and it presents an approach to detect, on real-time, steel strip defective welds. This approach is based on quantifying the tradeoffs between various classification decisions using probability and the costs that accompany such decisions.

Keywords: Classification, Pattern Recognition, ProbabilisticReasoning, Statistical Data Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1410
789 A Study on Finding Similar Document with Multiple Categories

Authors: R. Saraçoğlu, N. Allahverdi

Abstract:

Searching similar documents and document management subjects have important place in text mining. One of the most important parts of similar document research studies is the process of classifying or clustering the documents. In this study, a similar document search approach that includes discussion of out the case of belonging to multiple categories (multiple categories problem) has been carried. The proposed method that based on Fuzzy Similarity Classification (FSC) has been compared with Rocchio algorithm and naive Bayes method which are widely used in text mining. Empirical results show that the proposed method is quite successful and can be applied effectively. For the second stage, multiple categories vector method based on information of categories regarding to frequency of being seen together has been used. Empirical results show that achievement is increased almost two times, when proposed method is compared with classical approach.

Keywords: Document similarity, Fuzzy classification, Multiple categories, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1707
788 Automatic Classification of the Stand-to-Sit Phase in the TUG Test Using Machine Learning

Authors: Y. A. Adla, R. Soubra, M. Kasab, M. O. Diab, A. Chkeir

Abstract:

Over the past several years, researchers have shown a great interest in assessing the mobility of elderly people to measure their functional status. Usually, such an assessment is done by conducting tests that require the subject to walk a certain distance, turn around, and finally sit back down. Consequently, this study aims to provide an at home monitoring system to assess the patient’s status continuously. Thus, we proposed a technique to automatically detect when a subject sits down while walking at home. In this study, we utilized a Doppler radar system to capture the motion of the subjects. More than 20 features were extracted from the radar signals out of which 11 were chosen based on their Intraclass Correlation Coefficient (ICC > 0.75). Accordingly, the sequential floating forward selection wrapper was applied to further narrow down the final feature vector. Finally, five features were introduced to the Linear Discriminant Analysis classifier and an accuracy of 93.75% was achieved as well as a precision and recall of 95% and 90% respectively.

Keywords: Doppler radar system, stand-to-sit phase, TUG test, machine learning, classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 452
787 Differential Protection for Power Transformer Using Wavelet Transform and PNN

Authors: S. Sendilkumar, B. L. Mathur, Joseph Henry

Abstract:

A new approach for protection of power transformer is presented using a time-frequency transform known as Wavelet transform. Different operating conditions such as inrush, Normal, load, External fault and internal fault current are sampled and processed to obtain wavelet coefficients. Different Operating conditions provide variation in wavelet coefficients. Features like energy and Standard deviation are calculated using Parsevals theorem. These features are used as inputs to PNN (Probabilistic neural network) for fault classification. The proposed algorithm provides more accurate results even in the presence of noise inputs and accurately identifies inrush and fault currents. Overall classification accuracy of the proposed method is found to be 96.45%. Simulation of the fault (with and without noise) was done using MATLAB AND SIMULINK software taking 2 cycles of data window (40 m sec) containing 800 samples. The algorithm was evaluated by using 10 % Gaussian white noise.

Keywords: Power Transformer, differential Protection, internalfault, inrush current, Wavelet Energy, Db9.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3131
786 Recursive Similarity Hashing of Fractal Geometry

Authors: Timothee G. Leleu

Abstract:

A new technique of topological multi-scale analysis is introduced. By performing a clustering recursively to build a hierarchy, and analyzing the co-scale and intra-scale similarities, an Iterated Function System can be extracted from any data set. The study of fractals shows that this method is efficient to extract self-similarities, and can find elegant solutions the inverse problem of building fractals. The theoretical aspects and practical implementations are discussed, together with examples of analyses of simple fractals.

Keywords: hierarchical clustering, multi-scale analysis, Similarity hashing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1863
785 Classification of Initial Stripe Height Patterns using Radial Basis Function Neural Network for Proportional Gain Prediction

Authors: Prasit Wonglersak, Prakarnkiat Youngkong, Ittipon Cheowanish

Abstract:

This paper aims to improve a fine lapping process of hard disk drive (HDD) lapping machines by removing materials from each slider together with controlling the strip height (SH) variation to minimum value. The standard deviation is the key parameter to evaluate the strip height variation, hence it is minimized. In this paper, a design of experiment (DOE) with factorial analysis by twoway analysis of variance (ANOVA) is adopted to obtain a statistically information. The statistics results reveal that initial stripe height patterns affect the final SH variation. Therefore, initial SH classification using a radial basis function neural network is implemented to achieve the proportional gain prediction.

Keywords: Stripe height variation, Two-way analysis ofvariance (ANOVA), Radial basis function neural network, Proportional gain prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1647
784 Discrimination of Seismic Signals Using Artificial Neural Networks

Authors: Mohammed Benbrahim, Adil Daoudi, Khalid Benjelloun, Aomar Ibenbrahim

Abstract:

The automatic discrimination of seismic signals is an important practical goal for earth-science observatories due to the large amount of information that they receive continuously. An essential discrimination task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, two classes of seismic signals recorded routinely in geophysical laboratory of the National Center for Scientific and Technical Research in Morocco are considered. They correspond to signals associated to local earthquakes and chemical explosions. The approach adopted for the development of an automatic discrimination system is a modular system composed by three blocs: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work consists in the use of a new wavelet called "modified Mexican hat wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on the random projection and the principal component analysis.

Keywords: Seismic signals, Wavelets, Dimensionality reduction, Artificial neural networks, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1634