Search results for: classification tree.
796 Correlation-based Feature Selection using Ant Colony Optimization
Authors: M. Sadeghzadeh, M. Teshnehlab
Abstract:
Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes Ant Colony Optimization (ACO) is presented. ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
Keywords: Ant colony optimization, Classification, Data mining, Feature selection.
Downloads 2420
795 Radar Hydrology: New Z/R Relationships for Klang River Basin Malaysia based on Rainfall Classification
Authors: R. Suzana, T. Wardah, A.B. Sahol Hamid
Abstract:
The use of radar in Quantitative Precipitation Estimation (QPE) for radar-rainfall measurement is significantly beneficial. Radar offers high spatial and temporal resolution in rainfall measurement and forecasting. In Malaysia, radar application in QPE is still new and needs to be explored. This paper focuses on the derivation of Z/R relationships for radar-rainfall estimation based on rainfall classification. The work developed new Z/R relationships for the Klang River Basin in the Selangor area for three general classes of rain events, namely low (<10 mm/hr), moderate (>10 mm/hr, <30 mm/hr) and heavy (>30 mm/hr), and also for more specific rain types during monsoon seasons. Given the high potential of Doppler radar in QPE, the newly formulated Z/R equations will be useful in improving the measurement of rainfall for any hydrological application, especially flood forecasting.
Keywords: Radar, Quantitative Precipitation Estimation, Z/R development, flood forecasting
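As a rough illustration of the two ideas in this abstract, the sketch below inverts a power-law Z/R relationship of the usual form Z = a·R^b and then assigns the resulting rain rate to one of the three general classes. The coefficients a = 200 and b = 1.6 are the classical Marshall-Palmer values, used here only as placeholder defaults; they are not the new Klang River Basin coefficients, which the abstract does not give. The function names are hypothetical, and the treatment of rates of exactly 10 or 30 mm/hr is an assumption, since the abstract leaves those boundaries open.

```python
import math


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b to get rain rate R (mm/hr) from reflectivity in dBZ.

    a and b default to the Marshall-Palmer values (an assumption, not the
    coefficients derived in the paper).
    """
    z = 10 ** (dbz / 10)  # convert dBZ to linear reflectivity (mm^6 m^-3)
    return (z / a) ** (1 / b)


def rain_class(rate_mm_per_hr):
    """Map a rain rate to the low / moderate / heavy classes from the abstract.

    Assumption: exactly 10 mm/hr counts as moderate and exactly 30 mm/hr
    counts as heavy.
    """
    if rate_mm_per_hr < 10:
        return "low"
    if rate_mm_per_hr < 30:
        return "moderate"
    return "heavy"
```

A class-specific scheme in the spirit of the paper would presumably select a different (a, b) pair for each rain class, which is why both are left as parameters here.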
Downloads 2151
794 Classification of Acoustic Emission Based Partial Discharge in Oil Pressboard Insulation System Using Wavelet Analysis
Authors: Prasanta Kundu, N.K. Kishore, A.K. Sinha
Abstract:
Insulation used in transformers is mostly oil pressboard insulation. Insulation failure is one of the major causes of catastrophic failure of transformers. It is established that partial discharges (PD) cause insulation degradation and premature failure of insulation. Online monitoring of PDs can reduce the risk of catastrophic failure of transformers. There are different techniques of partial discharge measurement, such as electrical, optical, acoustic, opto-acoustic and ultra high frequency (UHF). Being non-invasive and not prone to interference, the acoustic emission technique is advantageous for online PD measurement. Acoustic detection of PD is based on the retrieval and analysis of mechanical or pressure signals produced by partial discharges. Partial discharges are classified according to the origin of the discharges. Their effects on insulation deterioration differ by type. This paper reports experimental results and analysis for classification of partial discharges using the acoustic emission (AE) signals of laboratory-simulated partial discharges in an oil pressboard insulation system with three different electrode systems. Acoustic emission signals produced by PD are detected by sensors mounted on the experimental tank surface, stored on an oscilloscope and fed to a computer for further analysis. The measured AE signals are analyzed using discrete wavelet transform (DWT) analysis and wavelet packet analysis. The energy distribution in different frequency bands of the discrete wavelet decomposed signal and the wavelet packet decomposed signal is calculated. These analyses show a distinct feature useful for PD classification. Wavelet packet analysis can resolve most misclassifications arising from DWT.
Keywords: Acoustic emission, discrete wavelet transform, partial discharge, wavelet packet analysis.
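The "energy distribution in different frequency bands" feature mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the orthonormal Haar wavelet (the abstract does not name a wavelet), uses a hand-written transform to stay dependency-free, and the function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def band_energy_fractions(signal, levels):
    """Fraction of signal energy in each band: [D1, D2, ..., D_levels, A_levels].

    D1 is the highest-frequency detail band; A_levels is the final
    low-frequency approximation. The signal length must be divisible by
    2**levels so that every level splits evenly.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    total = sum(v * v for v in signal)
    if total == 0:
        raise ValueError("signal has zero energy")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(v * v for v in detail))
    energies.append(sum(v * v for v in approx))
    return [e / total for e in energies]
```

Because the transform is orthonormal, the fractions sum to one, so the resulting vector can be compared across signals of different amplitude. A wavelet packet version would additionally split the detail bands at each level.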
Downloads 2987
793 A Survey: Clustering Ensembles Techniques
Authors: Reza Ghaemi, Md. Nasir Sulaiman, Hamidah Ibrahim, Norwati Mustapha
Abstract:
Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving the robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been made to finding a consensus clustering. One of the major problems in clustering ensembles is the consensus function. In this paper, firstly, we introduce clustering ensembles, the representation of multiple partitions and its challenges, and present a taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association based functions and the finite mixture model, and then explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity and accuracy, on different datasets used in previous techniques.
Keywords: Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.
Downloads 3449
792 Classification of Earthquake Distribution in the Banda Sea Collision Zone with Point Process Approach
Authors: Henry J. Wattimanela, Udjianna S. Pasaribu, Nanang T. Puspito, Sapto W. Indratno
Abstract:
The Banda Sea Collision Zone (BSCZ) is the result of the interaction and convergence of the Indo-Australian, Eurasian and Pacific plates. The zone is located in eastern Indonesia and has very high seismic activity. In this research, we calculate the rate (λ) and the Mean Square Error (MSE). Using these results, we classify the earthquake distribution in the BSCZ with the point process approach. The chi-square test is used to determine the type of earthquake distribution in each sub-region of the BSCZ. The data used in this research are earthquakes with a magnitude ≥ 6 SR for the period 1964-2013, sourced from BMKG Jakarta. This research is expected to help the Moluccas Province and surrounding local governments in preparing spatial planning documents related to disaster management.
Keywords: Banda sea collision zone, earthquakes, mean square error, Poisson distribution, chi-square test.
Downloads 2117
791 Adapting Tools for Text Monitoring and for Scenario Analysis Related to the Field of Social Disasters
Authors: Svetlana Cojocaru, Mircea Petic, Inga Titchiev
Abstract:
Humanity increasingly faces different social disasters, which in turn can generate new accidents and catastrophes. To mitigate their consequences, it is important to obtain the earliest possible signals about events that are occurring or may occur, and to prepare corresponding scenarios that could be applied. Our research is focused on solving two problems in this domain: identifying signals that an accident has occurred or may occur, and mitigating some consequences of disasters. To solve the first problem, methods of selecting and processing texts from the Internet are developed. Information in Romanian is of special interest to us. In order to obtain the mentioned tools, we follow several steps, divided into a preparatory stage and a processing stage. Throughout the first stage, we manually collected over 724 news articles and classified them into 10 categories of social disasters. The collection comprises more than 150 thousand words. Using this information, a controlled vocabulary of more than 300 keywords was elaborated, which will help in the classification and identification of texts related to the field of social disasters. To solve the second problem, the formalism of Petri nets has been used. We deal with the problem of evacuating inhabitants in useful time. Analysis methods such as the reachability or coverability tree and the invariants technique will be used to determine dynamic properties of the modeled systems. To perform a case study of the properties of the evacuation system extended with time, the analysis modules of PIPE, such as Generalized Stochastic Petri Nets (GSPN) Analysis, Simulation, State Space Analysis, and Invariant Analysis, have been used. These modules helped us to obtain the average number of persons situated in the rooms and other quantitative properties and characteristics related to its dynamics.
Keywords: Lexicon of disasters, modelling, Petri nets, text annotation, social disasters.
Downloads 1157
790 Knowledge Discovery and Data Mining Techniques in Textile Industry
Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler
Abstract:
This paper addresses issues in the textile industry using data mining techniques. Data mining has been applied to data on the stitching of garment products obtained from a textile company. The data mining techniques applied to the data were the CHAID algorithm, the CART algorithm, Regression Analysis and Artificial Neural Networks. Classification-based analyses were used, and a decision model for production per person and the variables affecting production was obtained by this method. The results of the study show that as daily working time increases, production per person decreases. In addition, the relationship between total daily working time and production per person is negative, and it is the strongest negative relationship found for production per person.
Keywords: Data mining, textile production, decision trees, classification.
Downloads 1538
789 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling
Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal
Abstract:
Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.
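For readers unfamiliar with the multi-label measures named in this abstract, the sketch below computes Hamming loss, micro-averaged F1 and macro-averaged F1 from binary label matrices, using their standard definitions. It is an illustration only: the function names are hypothetical, the two small matrices are made-up toy values rather than data from the paper, and treating an undefined F1 (no true or predicted positives) as 0 is an assumption.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def label_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one label column."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[label], pred_row[label]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there are no positives (assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    counts = [label_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    return f1(*(sum(c[i] for c in counts) for i in range(3)))


def macro_f1(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    return sum(f1(*label_counts(y_true, y_pred, j)) for j in range(n_labels)) / n_labels


# Made-up toy example: 3 instances, 3 labels.
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_PRED = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
```

Micro-F weights every label decision equally, so frequent labels dominate, while Macro-F gives each label the same weight regardless of how often it occurs.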
Downloads 837
788 Fault Classification of Double Circuit Transmission Line Using Artificial Neural Network
Authors: Anamika Jain, A. S. Thoke, R. N. Patel
Abstract:
This paper addresses the problems encountered by conventional distance relays when protecting double-circuit transmission lines. The problems arise principally as a result of the mutual coupling between the two circuits under different fault conditions; this mutual coupling is highly nonlinear in nature. An adaptive protection scheme is proposed for such lines based on application of artificial neural network (ANN). ANN has the ability to classify the nonlinear relationship between measured signals by identifying different patterns of the associated signals. One of the key points of the present work is that only current signals measured at local end have been used to detect and classify the faults in the double circuit transmission line with double end infeed. The adaptive protection scheme is tested under a specific fault type, but varying fault location, fault resistance, fault inception angle and with remote end infeed. An improved performance is experienced once the neural network is trained adequately, which performs precisely when faced with different system parameters and conditions. The entire test results clearly show that the fault is detected and classified within a quarter cycle; thus the proposed adaptive protection technique is well suited for double circuit transmission line fault detection & classification. Results of performance studies show that the proposed neural network-based module can improve the performance of conventional fault selection algorithms.
Keywords: Double circuit transmission line, Fault detection and classification, High impedance fault and Artificial Neural Network.
Downloads 3186
787 DIFFER: A Propositionalization approach for Learning from Structured Data
Authors: Thashmee Karunaratne, Henrik Böstrom
Abstract:
Logic-based methods for learning from structured data are limited with respect to handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, to address these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system called DIFFER. The method is demonstrated to perform comparably to an existing state-of-the-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.
Keywords: Machine learning, Structure classification, Propositionalization.
Downloads 1222
786 On the Solution of the Towers of Hanoi Problem
Authors: Hayedeh Ahrabian, Comfar Badamchi, Abbass Nowzari-Dalini
Abstract:
In this paper, two versions of an iterative loopless algorithm for the classical towers of Hanoi problem with O(1) storage complexity and O(2^n) time complexity are presented. Based on this algorithm, the number of different moves on each of the pegs, together with their direction, is formulated.
Keywords: Loopless algorithm, Binary tree, Towers of Hanoi.
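To make the idea of an iterative Towers of Hanoi solution concrete, the sketch below uses the well-known bit-manipulation formulation, in which the source and target peg of move m are computed directly from m. This is not the loopless algorithm of the paper, whose details the abstract does not give; it only shows that the 2^n − 1 moves can be produced without recursion and with no storage beyond the move counter. The function names are hypothetical.

```python
def hanoi_moves(n):
    """Yield (source_peg, target_peg) for each of the 2**n - 1 moves.

    Pegs are numbered 0, 1, 2 and all disks start on peg 0. The tower ends
    on peg 2 when n is odd and on peg 1 when n is even.
    """
    for m in range(1, 2 ** n):
        yield (m & (m - 1)) % 3, ((m | (m - 1)) + 1) % 3


def simulate(n):
    """Replay the moves on explicit stacks, checking that every move is legal."""
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk (n) at the bottom
    for src, dst in hanoi_moves(n):
        disk = pegs[src].pop()
        # A disk may only be placed on an empty peg or on a larger disk.
        assert not pegs[dst] or pegs[dst][-1] > disk
        pegs[dst].append(disk)
    return pegs
```

For example, `list(hanoi_moves(2))` gives the three moves 0→2, 0→1, 2→1, and `simulate(n)` can be used to confirm that a run is legal for a chosen number of disks.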
Downloads 4835
785 Classification System for a Collaborative Urban Retail Logistics
Authors: Volker Lange, Stephanie Moede, Christiane Auffermann
Abstract:
From an economic standpoint, the current and future road traffic situation in urban areas is a cost factor. Traffic jams and congestion prolong journey times and tie up resources in trucks and personnel. Many discussions about imposing charges or tolls for cities in Europe in order to reduce traffic congestion are currently in progress. Both of these effects lead, directly or indirectly, to additional costs for the urban distribution systems of retail companies. One approach towards improving the efficiency of retail distribution systems, and thus towards avoiding negative environmental factors in urban areas, is horizontal collaboration for deliveries to retail outlets, known as Urban Retail Logistics. This paper presents a classification system to help reveal where cooperation between retail companies is possible and makes sense for deliveries to retail outlets in urban areas.
Keywords: City Logistics, Horizontal Collaboration, Urban Freight Transport, Urban Retail Logistics.
Downloads 2358
784 The Imaging Methods for Classifying Crispiness of Freeze-Dried Durian using Fuzzy Logic
Authors: Sitthichon Kanitthakun, Pinit Kumhom, Kosin Chamnongthai
Abstract:
In quality control of freeze-dried durian, crispiness is a key quality index of the product. Generally, crispiness testing has to be done by a destructive method. Nondestructive testing of crispiness is desirable because the samples can then be reused for other kinds of testing. This paper proposes a crispiness classification method for freeze-dried durians using fuzzy logic for decision making. The physical changes of a freeze-dried durian include the pores appearing in the images. Three physical features, (1) the diameters of the pores, (2) the ratio of the pore area to the remaining area, and (3) the distribution of the pores, are considered to contribute to crispiness. Fuzzy logic is applied to make the decision. The experimental results, compared with food expert opinion, showed that the accuracy of the proposed classification method is 83.33 percent.
Keywords: Durian, crispiness, freeze drying, pore, fuzzy logic.
Downloads 1970
783 Genetic Algorithms and Kernel Matrix-based Criteria Combined Approach to Perform Feature and Model Selection for Support Vector Machines
Authors: A. Perolini
Abstract:
Feature and model selection are at the center of attention of many researchers because of their impact on classifiers' performance. Both selections are usually performed separately, but recent developments suggest using a combined GA-SVM approach to perform them simultaneously. This approach improves the performance of the classifier by identifying the best subset of variables and the optimal parameter values. Although GA-SVM is an effective method, it is computationally expensive, so a rougher method can be considered. The paper investigates a joint approach of Genetic Algorithm and kernel matrix criteria to perform feature and model selection simultaneously for the SVM classification problem. The purpose of this research is to improve the classification performance of SVM through an efficient approach, the Kernel Matrix Genetic Algorithm method (KMGA).
Keywords: Feature and model selection, Genetic Algorithms, Support Vector Machines, kernel matrix.
Downloads 1597
782 One-Class Support Vector Machines for Protein-Protein Interactions Prediction
Authors: Hany Alashwal, Safaai Deris, Razib M. Othman
Abstract:
Predicting protein-protein interactions represents a key step in understanding protein functions. This is because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in the training phase, the one-class SVM achieves an accuracy of about 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with accuracy comparable to binary classifiers that use artificially constructed negative examples.
Keywords: Bioinformatics, Protein-protein interactions, One-Class Support Vector Machines
Downloads 1989
781 Determining the Gender of Korean Names for Pronoun Generation
Authors: Seong-Bae Park, Hee-Geun Yoon
Abstract:
It is an important task in Korean-English machine translation to classify the gender of names correctly. When a sentence is composed of two or more clauses and only one subject is given as a proper noun, it is important to find the gender of the proper noun for correct translation of the sentence. This is because a singular pronoun has a gender in English while it does not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun should be determined. More generally, this task can be expanded into the classification of general Korean names. This paper proposes a statistical method for this problem. By considering a name as just a sequence of syllables, it is possible to obtain statistics for each name from a collection of names. An evaluation of the proposed method shows an improvement in accuracy over a simple look-up in the collection. While the accuracy of the look-up method is 64.11%, that of the proposed method is 81.49%. This implies that the proposed method is better suited to the gender classification of Korean names.
Keywords: machine translation, natural language processing, gender of proper nouns, statistical method
Downloads 2368
780 Pattern Recognition as an Internalized Motor Programme
Authors: M. Jändel
Abstract:
A new conceptual architecture for low-level neural pattern recognition is presented. The key ideas are that the brain implements support vector machines and that support vectors are represented as memory patterns in competitive queuing memories. A binary classifier is built from two competitive queuing memories holding positive and negative valence training examples respectively. The support vector machine classification function is calculated in synchronized evaluation cycles. The kernel is computed by bisymmetric feed-forward networks fed by sensory input and by competitive queuing memories traversing the complete sequence of support vectors. Temporal summation generates the output classification. It is speculated that the perception apparatus in the brain reuses structures that have evolved for enabling fluent execution of prepared action sequences, so that pattern recognition is built on internalized motor programmes.
Keywords: Competitive queuing model, Olfactory system, Pattern recognition, Support vector machine, Thalamus
Downloads 1369
779 Development of Better Quality Low-Cost Activated Carbon from South African Pine Tree (Pinus patula) Sawdust: Characterization and Comparative Phenol Adsorption
Authors: L. Mukosha, M. S. Onyango, A. Ochieng, H. Kasaini
Abstract:
The remediation of water resources pollution in developing countries requires the application of alternative, sustainable, cheaper and efficient end-of-pipe wastewater treatment technologies. The feasibility of using cheap and abundant South African pine tree (Pinus patula) sawdust to develop low-cost activated carbon (AC) of comparable quality to expensive commercial ACs for the abatement of water pollution was investigated. AC was developed at optimized two-stage N2-superheated steam activation conditions in a fixed bed reactor, and characterized for proximate and ultimate properties, N2-BET surface area, pore size distribution, SEM, pHPZC and FTIR. The sawdust pyrolysis activation energy was evaluated by TGA. Results indicated that the chars prepared at 800 °C and 2 hrs were suitable for development of better quality AC at 800 °C and 47% burn-off, having a BET surface area (1086 m²/g), micropore volume (0.26 cm³/g), and mesopore volume (0.43 cm³/g) comparable to expensive commercial ACs, and suitable for removal of water contaminants. The developed AC showed basic surface functionality with a pHPZC of 10.3, and a phenol adsorption capacity higher than that of commercial Norit (RO 0.8) AC. Thus, it is feasible to develop better quality low-cost AC from Pinus patula sawdust using two-stage N2-steam activation in a fixed-bed reactor.
Keywords: Activated carbon, phenol adsorption, sawdust integrated utilization, economical wastewater treatment.
Downloads 3470
778 Integrated ACOR/IACOMV-R-SVM Algorithm
Authors: Hiba Basim Alwan, Ku Ruhana Ku-Mahamud
Abstract:
One direction for ACO is to optimize continuous and mixed (discrete and continuous) variables in solving problems with various types of data. Support Vector Machine (SVM), which originates from the statistical approach, is a present-day classification technique. The main problems of SVM are selecting the feature subset and tuning the parameters. Discretizing the continuous values of the parameters is the most common approach to tuning SVM parameters. This process results in a loss of information, which affects the classification accuracy. This paper presents two algorithms for these tasks. The first algorithm, ACOR-SVM, tunes SVM parameters, while the second, IACOMV-R-SVM, simultaneously tunes SVM parameters and selects the feature subset. Three benchmark UCI datasets were used in the experiments to validate the performance of the proposed algorithms. The results show that the proposed algorithms perform well compared to other approaches.
Keywords: Continuous ant colony optimization, incremental continuous ant colony, simultaneous optimization, support vector machine.
Downloads 880
777 Level of Acceptability of Moringa oleifera Diversified Products among Rural and Urban Dwellers in Nigeria
Authors: Mojisola F. Oyewole, Franscisca T. Adetoro, Nkiru T. Meludu
Abstract:
Moringa oleifera is a nutritious vegetable tree with a variety of potential uses, as almost every part of the tree can be used for food. This study was conducted in Oyo State, Nigeria, to find out the level of acceptability of Moringa oleifera diversified products among rural and urban dwellers. Purposive sampling was used to select two local government areas. A stratified sampling technique was also used to select one community each from rural and urban areas, while a snowball sampling technique was used to select ten respondents each from the two communities, making a total number of forty respondents. Data were analyzed using frequencies, percentages, Chi-square, Pearson Product Moment Correlation and regression analysis. Results from the study revealed that the majority of the respondents (80%) fell within the age range of 20-49 years; 55% of them were male, 55% were married, 70% were Christians and 80% had tertiary education. The results also showed that 85% were aware of the Moringa plant and 65% had consumed Moringa oleifera. The perception statements on the benefits of Moringa oleifera indicated that 52.5% of the respondents rated Moringa oleifera favorably, and most of them had high acceptability for Moringa egusi soup, Moringa tea, Moringa pap and yam pottage with Moringa. The hypothesis testing showed a significant relationship between the sex of the respondents and acceptability of the diversified Moringa oleifera products (χ² = 6.465, p = 0.011). There is also a significant relationship between the family size of the respondents and the level of acceptability of the Moringa oleifera products (r = 0.327, p = 0.040). Based on the level of acceptability of Moringa oleifera diversified products, the plant is of great economic importance to the populace. Therefore, there should be more public awareness through the media to enlighten people on the beneficial effects of Moringa oleifera.
Keywords: Acceptability, Moringa oleifera, Diversified, Product, Dwellers.
Downloads 2612
776 Analysis of the EEG Signal for a Practical Biometric System
Authors: Muhammad Kamil Abdullah, Khazaimatol S Subari, Justin Leo Cheang Loong, Nurul Nadia Ahmad
Abstract:
This paper discusses the effectiveness of the EEG signal for human identification using four or fewer channels of two different types of EEG recordings. Studies have shown that the EEG signal has biometric potential because the signal varies from person to person and is impossible to replicate or steal. Data were collected from 10 male subjects while resting with eyes open and eyes closed in 5 separate sessions conducted over the course of two weeks. Features were extracted using wavelet packet decomposition and analyzed to obtain the feature vectors. Subsequently, a neural network algorithm was used to classify the feature vectors. Results show that whether or not the subjects' eyes were open is insignificant for a 4-channel biometric system, which achieved a classification rate of 81%. However, for a 2-channel system, the P4 channel should not be included if data are acquired with the subjects' eyes open. It was observed that a 2-channel system using only the C3 and C4 channels achieved a classification rate of 71%.
Keywords: Biometric, EEG, Wavelet Packet Decomposition, Neural Networks
Downloads 3027
775 Optimal Multilayer Perceptron Structure For Classification of HIV Sub-Type Viruses
Authors: Zeyneb Kurt, Oguzhan Yavuz
Abstract:
The features of the HIV genome span a wide range because the genome is highly heterogeneous. Hence, the infection ability of the virus varies with the chemokine receptors it uses. R5 and X4 HIV viruses use the CCR5 and CXCR4 coreceptors respectively, while R5X4 viruses can utilize both coreceptors. Recently, in bioinformatics, the classification of R5X4 viruses according to the coreceptors they use has been studied. The aim of this study is to develop the optimal Multilayer Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this, the number of units in the hidden layer was incremented one by one, from one up to a particular number. The statistical data of R5X4, R5 and X4 viruses were preprocessed by signal processing methods. Accessible residues of these virus sequences were extracted and modeled by an Auto-Regressive (AR) model, because the dimensions of the residues are large and differ from each other. Finally, the pre-processed dataset was used to train MLPs with various numbers of hidden units to identify R5X4 viruses. Furthermore, ROC analysis was used to determine the optimal MLP structure.
Keywords: Multilayer Perceptron, Auto-Regressive Model, HIV, ROC Analysis
Downloads 1440
774 Fake Account Detection in Twitter Based on Minimum Weighted Feature set
Authors: Ahmed El Azab, Amira M. Idrees, Mahmoud A. Mahmoud, Hesham Hefny
Abstract:
Social networking sites such as Twitter and Facebook attract over 500 million users across the world; for those users, their social lives and even their practical lives have become interrelated. Their interaction with social networking has affected their lives forever. Accordingly, social networking sites have become among the main channels responsible for the vast dissemination of different kinds of information during real-time events. This popularity of social networking has led to various problems, including the possibility of exposing users to incorrect information through fake accounts, which results in the spread of malicious content during life events. This situation can cause great damage in the real world to society in general, including citizens, business entities, and others. In this paper, we present a classification method for detecting fake accounts on Twitter. The study determines the minimal set of main factors that influence the detection of fake accounts on Twitter, and the determined factors are then applied using different classification techniques. The results of these techniques are compared, and the most accurate algorithm is selected. The study has been compared with recent research in the same area; this comparison has confirmed the accuracy of the proposed approach. We claim that this study can be continuously applied to the Twitter social network to automatically detect fake accounts; moreover, it can be applied to other social networking sites such as Facebook with minor changes according to the nature of the social network, which are discussed in this paper.
Keywords: Fake accounts detection, classification algorithms, twitter accounts analysis, features based techniques.
Downloads 5837
773 Evaluating some Feature Selection Methods for an Improved SVM Classifier
Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp
Abstract:
Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We show that the best results were obtained with the SVM_FS and GA_FS methods for a relatively small feature vector dimension, compared with the IG method, which needs longer vectors to reach quite similar classification accuracies. We also present a novel method to better correlate the SVM kernel's parameters (Polynomial or Gaussian kernel).
Keywords: Features selection, learning with kernels, support vector machine, genetic algorithms and classification.
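Of the four methods listed, Information Gain is the simplest to show in a few lines. The sketch below assumes binary term-presence features and the standard entropy-based definition of IG; the abstract does not specify the exact variant the authors used, and the toy documents and class names are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """IG of a binary feature: H(labels) minus the weighted entropy of
    the labels after splitting on feature presence/absence."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(feature_present, labels) if f == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    return gain


def top_k_features(features, labels, k):
    """Rank features (name -> presence list) by IG and keep the best k."""
    ranked = sorted(features,
                    key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Toy corpus: four documents, two classes, two candidate terms (all hypothetical).
labels = ["sport", "sport", "politics", "politics"]
features = {
    "goal": [True, True, False, False],   # separates the classes perfectly
    "the":  [True, False, True, False],   # carries no class information
}
selected = top_k_features(features, labels, k=1)
```

Keeping only the top-ranked terms shortens the document vector, which is the dimensionality reduction the abstract motivates.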
Downloads 1538
772 Real-Time Testing of Steel Strip Welds based on Bayesian Decision Theory
Authors: Julio Molleda, Daniel F. García, Juan C. Granda, Francisco J. Suárez
Abstract:
One of the main problems in a steel strip manufacturing line is the breakage of any weld made between the steel coils that are joined to produce the continuous strip to be processed. A weld breakage stops the manufacturing line for several hours. During this stop, the damage caused by the breakage must be repaired, and the line must then be restarted before production can continue. To minimize this problem, a human operator must inspect each weld visually and manually in order to prevent its breakage during the manufacturing process. The work presented in this paper is based on Bayesian decision theory and presents an approach to detect defective steel strip welds in real time. This approach is based on quantifying the tradeoffs between various classification decisions using probability and the costs that accompany such decisions.
Keywords: Classification, Pattern Recognition, Probabilistic Reasoning, Statistical Data Analysis.
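The trade-off between probabilities and costs described above is the minimum-expected-cost rule of Bayesian decision theory. The sketch below illustrates that rule only; the state names, action names, cost values and posterior probabilities are all hypothetical and are not taken from the paper.

```python
def expected_costs(posteriors, cost):
    """Expected cost (conditional risk) of each action.

    posteriors: state -> P(state | observation)
    cost:       action -> {state -> cost of taking action in that state}
    """
    return {
        action: sum(cost[action][state] * p for state, p in posteriors.items())
        for action in cost
    }


def bayes_decision(posteriors, cost):
    """Pick the action with the smallest expected cost."""
    risks = expected_costs(posteriors, cost)
    return min(risks, key=risks.get)


# Hypothetical costs: accepting a defective weld (line stoppage) is assumed
# to be 100 times worse than rejecting a good one (an unnecessary re-weld).
cost = {
    "accept": {"good": 0.0, "defective": 100.0},
    "reject": {"good": 1.0, "defective": 0.0},
}

# Even a 5% chance of a defect makes rejection the cheaper choice here.
risky = bayes_decision({"good": 0.95, "defective": 0.05}, cost)
safe = bayes_decision({"good": 0.995, "defective": 0.005}, cost)
```

With asymmetric costs the decision threshold moves well away from a posterior of 0.5, which is why the costs matter as much as the classifier's probabilities.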
Downloads 1410
771 A Study on Finding Similar Document with Multiple Categories
Authors: R. Saraçoğlu, N. Allahverdi
Abstract:
Searching for similar documents and document management have an important place in text mining. One of the most important parts of similar-document research is the process of classifying or clustering the documents. In this study, a similar-document search approach is carried out that addresses the case of documents belonging to multiple categories (the multiple categories problem). The proposed method, which is based on Fuzzy Similarity Classification (FSC), has been compared with the Rocchio algorithm and the naive Bayes method, which are widely used in text mining. Empirical results show that the proposed method is quite successful and can be applied effectively. For the second stage, a multiple-categories vector method based on how frequently categories appear together has been used. Empirical results show that performance almost doubles when the proposed method is compared with the classical approach.
Keywords: Document similarity, Fuzzy classification, Multiple categories, Text mining.
Downloads 1707
770 Automatic Classification of the Stand-to-Sit Phase in the TUG Test Using Machine Learning
Authors: Y. A. Adla, R. Soubra, M. Kasab, M. O. Diab, A. Chkeir
Abstract:
Over the past several years, researchers have shown great interest in assessing the mobility of elderly people to measure their functional status. Usually, such an assessment is done by conducting tests that require the subject to walk a certain distance, turn around, and finally sit back down. This study aims to provide an at-home monitoring system to assess the patient's status continuously. To that end, we propose a technique to automatically detect when a subject sits down while walking at home. We used a Doppler radar system to capture the motion of the subjects. More than 20 features were extracted from the radar signals, of which 11 were chosen based on their Intraclass Correlation Coefficient (ICC > 0.75). The sequential floating forward selection wrapper was then applied to further narrow down the final feature vector. Finally, five features were fed to the Linear Discriminant Analysis classifier, which achieved an accuracy of 93.75%, with a precision and recall of 95% and 90%, respectively.
Keywords: Doppler radar system, stand-to-sit phase, TUG test, machine learning, classification
Downloads 452
769 Differential Protection for Power Transformer Using Wavelet Transform and PNN
Authors: S. Sendilkumar, B. L. Mathur, Joseph Henry
Abstract:
A new approach for the protection of power transformers is presented using a time-frequency transform known as the wavelet transform. Currents under different operating conditions, such as inrush, normal, load, external fault and internal fault, are sampled and processed to obtain wavelet coefficients. Different operating conditions produce variations in the wavelet coefficients. Features such as energy and standard deviation are calculated using Parseval's theorem. These features are used as inputs to a PNN (Probabilistic Neural Network) for fault classification. The proposed algorithm provides more accurate results even in the presence of noisy inputs and accurately identifies inrush and fault currents. The overall classification accuracy of the proposed method is found to be 96.45%. Simulation of the fault (with and without noise) was done using MATLAB and Simulink software, taking a data window of 2 cycles (40 ms) containing 800 samples. The algorithm was evaluated using 10% Gaussian white noise.
Keywords: Power Transformer, differential Protection, internal fault, inrush current, Wavelet Energy, Db9.
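For an orthonormal wavelet transform, Parseval's theorem says the signal energy equals the summed energy of the coefficients, which is what makes per-band energy a meaningful feature. The sketch below uses a single-level Haar transform as a stand-in because it fits in a few lines; the paper's keywords indicate a Db9 wavelet, which is not implemented here. The sample window values are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def std_dev(coeffs):
    """Population standard deviation of the coefficients."""
    mean = sum(coeffs) / len(coeffs)
    return math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))


# Hypothetical window of sampled differential current (arbitrary units).
window = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
approx, detail = haar_step(window)

# Feature vector in the spirit of the abstract: energy and standard
# deviation of the detail band.
features = [energy(detail), std_dev(detail)]

# Parseval check: energy is preserved by the orthonormal transform.
energy_preserved = math.isclose(energy(window), energy(approx) + energy(detail))
```

In a multi-level decomposition the same two statistics would be computed per detail band, giving a short feature vector for the classifier.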
Downloads 3131
768 Classification of Initial Stripe Height Patterns using Radial Basis Function Neural Network for Proportional Gain Prediction
Authors: Prasit Wonglersak, Prakarnkiat Youngkong, Ittipon Cheowanish
Abstract:
This paper aims to improve the fine lapping process of hard disk drive (HDD) lapping machines by removing material from each slider while keeping the stripe height (SH) variation to a minimum. The standard deviation is the key parameter for evaluating the stripe height variation, so it is minimized. In this paper, a design of experiment (DOE) with factorial analysis by two-way analysis of variance (ANOVA) is adopted to obtain statistical information. The statistical results reveal that initial stripe height patterns affect the final SH variation. Therefore, initial SH classification using a radial basis function neural network is implemented to achieve the proportional gain prediction.
Keywords: Stripe height variation, Two-way analysis of variance (ANOVA), Radial basis function neural network, Proportional gain prediction.
Downloads 1647
767 Discrimination of Seismic Signals Using Artificial Neural Networks
Authors: Mohammed Benbrahim, Adil Daoudi, Khalid Benjelloun, Aomar Ibenbrahim
Abstract:
The automatic discrimination of seismic signals is an important practical goal for earth-science observatories due to the large amount of information that they receive continuously. An essential discrimination task is to allocate the incoming signal to a group associated with the kind of physical phenomenon producing it. In this paper, two classes of seismic signals recorded routinely in the geophysical laboratory of the National Center for Scientific and Technical Research in Morocco are considered. They correspond to signals associated with local earthquakes and chemical explosions. The approach adopted for the development of an automatic discrimination system is a modular system composed of three blocks: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work lies in the use of a new wavelet called the "modified Mexican hat wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on random projection and principal component analysis.
Keywords: Seismic signals, Wavelets, Dimensionality reduction, Artificial neural networks, Classification.
Downloads 1634