Search results for: Clusterization and classification algorithms
2394 Correlation-based Feature Selection using Ant Colony Optimization
Authors: M. Sadeghzadeh, M. Teshnehlab
Abstract:
Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
Keywords: Ant colony optimization, Classification, Datamining, Feature selection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24202393 Anomaly Detection and Characterization to Classify Traffic Anomalies Case Study: TOT Public Company Limited Network
Authors: O. Siriporn, S. Benjawan
Abstract:
This paper represents four unsupervised clustering algorithms namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer that previously works have not been used for network traffic classification. The methodology, the result, the products of the cluster and evaluation of these algorithms with efficiency of each algorithm from accuracy are shown. Otherwise, the efficiency of these algorithms considering form the time that it use to generate the cluster quickly and correctly. Our work study and test the best algorithm by using classify traffic anomaly in network traffic with different attribute that have not been used before. We analyses the algorithm that have the best efficiency or the best learning and compare it to the previously used (K-Means). Our research will be use to develop anomaly detection system to more efficiency and more require in the future.
Keywords: Unsupervised, clustering, anomaly, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21132392 Evaluation of Classification Algorithms for Road Environment Detection
Authors: T. Anbu, K. Aravind Kumar
Abstract:
The road environment information is needed accurately for applications such as road maintenance and virtual 3D city modeling. Mobile laser scanning (MLS) produces dense point clouds from huge areas efficiently from which the road and its environment can be modeled in detail. Objects such as buildings, cars and trees are an important part of road environments. Different methods have been developed for detection of above such objects, but still there is a lack of accuracy due to the problems of illumination, environmental changes, and multiple objects with same features. In this work the comparison between different classifiers such as Multiclass SVM, kNN and Multiclass LDA for the road environment detection is analyzed. Finally the classification accuracy for kNN with LBP feature improved the classification accuracy as 93.3% than the other classifiers.Keywords: Classifiers, feature extraction, mobile-based laser scanning, object location estimation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7742391 Development of Fake News Model Using Machine Learning through Natural Language Processing
Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini
Abstract:
Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.
Keywords: Fake news detection, types of fake news, machine learning, natural language processing, classification techniques.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15122390 Genetic Algorithms and Kernel Matrix-based Criteria Combined Approach to Perform Feature and Model Selection for Support Vector Machines
Authors: A. Perolini
Abstract:
Feature and model selection are in the center of attention of many researches because of their impact on classifiers- performance. Both selections are usually performed separately but recent developments suggest using a combined GA-SVM approach to perform them simultaneously. This approach improves the performance of the classifier identifying the best subset of variables and the optimal parameters- values. Although GA-SVM is an effective method it is computationally expensive, thus a rough method can be considered. The paper investigates a joined approach of Genetic Algorithm and kernel matrix criteria to perform simultaneously feature and model selection for SVM classification problem. The purpose of this research is to improve the classification performance of SVM through an efficient approach, the Kernel Matrix Genetic Algorithm method (KMGA).Keywords: Feature and model selection, Genetic Algorithms, Support Vector Machines, kernel matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15972389 Analysis of Feature Space for a 2d/3d Vision based Emotion Recognition Method
Authors: Robert Niese, Ayoub Al-Hamadi, Bernd Michaelis
Abstract:
In modern human computer interaction systems (HCI), emotion recognition is becoming an imperative characteristic. The quest for effective and reliable emotion recognition in HCI has resulted in a need for better face detection, feature extraction and classification. In this paper we present results of feature space analysis after briefly explaining our fully automatic vision based emotion recognition method. We demonstrate the compactness of the feature space and show how the 2d/3d based method achieves superior features for the purpose of emotion classification. Also it is exposed that through feature normalization a widely person independent feature space is created. As a consequence, the classifier architecture has only a minor influence on the classification result. This is particularly elucidated with the help of confusion matrices. For this purpose advanced classification algorithms, such as Support Vector Machines and Artificial Neural Networks are employed, as well as the simple k- Nearest Neighbor classifier.Keywords: Facial expression analysis, Feature extraction, Image processing, Pattern Recognition, Application.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19232388 Integration of Support Vector Machine and Bayesian Neural Network for Data Mining and Classification
Authors: Essam Al-Daoud
Abstract:
Several combinations of the preprocessing algorithms, feature selection techniques and classifiers can be applied to the data classification tasks. This study introduces a new accurate classifier, the proposed classifier consist from four components: Signal-to- Noise as a feature selection technique, support vector machine, Bayesian neural network and AdaBoost as an ensemble algorithm. To verify the effectiveness of the proposed classifier, seven well known classifiers are applied to four datasets. The experiments show that using the suggested classifier enhances the classification rates for all datasets.Keywords: AdaBoost, Bayesian neural network, Signal-to-Noise, support vector machine, MCMC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20202387 An Exact Solution to Support Vector Mixture
Authors: Monjed Ezzeddinne, Nicolas Lefebvre, Régis Lengellé
Abstract:
This paper presents a new version of the SVM mixture algorithm initially proposed by Kwok for classification and regression problems. For both cases, a slight modification of the mixture model leads to a standard SVM training problem, to the existence of an exact solution and allows the direct use of well known decomposition and working set selection algorithms. Only the regression case is considered in this paper but classification has been addressed in a very similar way. This method has been successfully applied to engine pollutants emission modeling.Keywords: Identification, Learning systems, Mixture ofExperts, Support Vector Machines.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13652386 Congestion Control for Internet Media Traffic
Authors: Mohammad A. Talaat, Magdi A. Koutb, Hoda S. Sorour
Abstract:
In this paper we investigated a number of the Internet congestion control algorithms that has been developed in the last few years. It was obviously found that many of these algorithms were designed to deal with the Internet traffic merely as a train of consequent packets. Other few algorithms were specifically tailored to handle the Internet congestion caused by running media traffic that represents audiovisual content. This later set of algorithms is considered to be aware of the nature of this media content. In this context we briefly explained a number of congestion control algorithms and hence categorized them into the two following categories: i) Media congestion control algorithms. ii) Common congestion control algorithms. We hereby recommend the usage of the media congestion control algorithms for the reason of being media content-aware rather than the other common type of algorithms that blindly manipulates such traffic. We showed that the spread of such media content-aware algorithms over Internet will lead to better congestion control status in the coming years. This is due to the observed emergence of the era of digital convergence where the media traffic type will form the majority of the Internet traffic.Keywords: Congestion Control, Media Traffic.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15672385 GA Based Optimal Feature Extraction Method for Functional Data Classification
Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai
Abstract:
Classification is an interesting problem in functional data analysis (FDA), because many science and application problems end up with classification problems, such as recognition, prediction, control, decision making, management, etc. As the high dimension and high correlation in functional data (FD), it is a key problem to extract features from FD whereas keeping its global characters, which relates to the classification efficiency and precision to heavens. In this paper, a novel automatic method which combined Genetic Algorithm (GA) and classification algorithm to extract classification features is proposed. In this method, the optimal features and classification model are approached via evolutional study step by step. It is proved by theory analysis and experiment test that this method has advantages in improving classification efficiency, precision and robustness whereas using less features and the dimension of extracted classification features can be controlled.Keywords: Classification, functional data, feature extraction, genetic algorithm, wavelet.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15552384 Comparative Study Using Weka for Red Blood Cells Classification
Authors: Jameela Ali Alkrimi, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy
Abstract:
Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithms tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital - Malaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.
Keywords: K-Nearest Neighbors, Neural Network, Radial Basis Function, Red blood cells, Support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 29952383 Meta-Classification using SVM Classifiers for Text Documents
Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp
Abstract:
Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated three approaches to build a meta-classifier in order to increase the classification accuracy. The basic idea is to learn a metaclassifier to optimally select the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve the accuracy of classification and that our meta-classification strategy gives better results than each individual classifier. For 7083 Reuters text documents we obtained a classification accuracies up to 92.04%.Keywords: Meta-classification, Learning with Kernels, Support Vector Machine, and Performance Evaluation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16162382 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20552381 The Research of Fuzzy Classification Rules Applied to CRM
Authors: Chien-Hua Wang, Meng-Ying Chou, Chin-Tzong Pang
Abstract:
In the era of great competition, understanding and satisfying customers- requirements are the critical tasks for a company to make a profits. Customer relationship management (CRM) thus becomes an important business issue at present. With the help of the data mining techniques, the manager can explore and analyze from a large quantity of data to discover meaningful patterns and rules. Among all methods, well-known association rule is most commonly seen. This paper is based on Apriori algorithm and uses genetic algorithms combining a data mining method to discover fuzzy classification rules. The mined results can be applied in CRM to help decision marker make correct business decisions for marketing strategies.Keywords: Customer relationship management (CRM), Data mining, Apriori algorithm, Genetic algorithm, Fuzzy classification rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16612380 Unsupervised Classification of DNA Barcodes Species Using Multi-Library Wavelet Networks
Authors: Abdesselem Dakhli, Wajdi Bellil, Chokri Ben Amar
Abstract:
DNA Barcode provides good sources of needed information to classify living species. The classification problem has to be supported with reliable methods and algorithms. To analyze species regions or entire genomes, it becomes necessary to use the similarity sequence methods. A large set of sequences can be simultaneously compared using Multiple Sequence Alignment which is known to be NP-complete. However, all the used methods are still computationally very expensive and require significant computational infrastructure. Our goal is to build predictive models that are highly accurate and interpretable. In fact, our method permits to avoid the complex problem of form and structure in different classes of organisms. The empirical data and their classification performances are compared with other methods. Evenly, in this study, we present our system which is consisted of three phases. The first one, is called transformation, is composed of three sub steps; Electron-Ion Interaction Pseudopotential (EIIP) for the codification of DNA Barcodes, Fourier Transform and Power Spectrum Signal Processing. Moreover, the second phase step is an approximation; it is empowered by the use of Multi Library Wavelet Neural Networks (MLWNN). Finally, the third one, is called the classification of DNA Barcodes, is realized by applying the algorithm of hierarchical classification.Keywords: DNA Barcode, Electron-Ion Interaction Pseudopotential, Multi Library Wavelet Neural Networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19672379 Improvement in Power Transformer Intelligent Dissolved Gas Analysis Method
Authors: S. Qaedi, S. Seyedtabaii
Abstract:
Non-Destructive evaluation of in-service power transformer condition is necessary for avoiding catastrophic failures. Dissolved Gas Analysis (DGA) is one of the important methods. Traditional, statistical and intelligent DGA approaches have been adopted for accurate classification of incipient fault sources. Unfortunately, there are not often enough faulty patterns required for sufficient training of intelligent systems. By bootstrapping the shortcoming is expected to be alleviated and algorithms with better classification success rates to be obtained. In this paper the performance of an artificial neural network, K-Nearest Neighbour and support vector machine methods using bootstrapped data are detailed and shown that while the success rate of the ANN algorithms improves remarkably, the outcome of the others do not benefit so much from the provided enlarged data space. For assessment, two databases are employed: IEC TC10 and a dataset collected from reported data in papers. High average test success rate well exhibits the remarkable outcome.Keywords: Dissolved gas analysis, Transformer incipient fault, Artificial Neural Network, Support Vector Machine (SVM), KNearest Neighbor (KNN)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27392378 Evaluation of Algorithms for Sequential Decision in Biosonar Target Classification
Authors: Turgay Temel, John Hallam
Abstract:
A sequential decision problem, based on the task ofidentifying the species of trees given acoustic echo data collectedfrom them, is considered with well-known stochastic classifiers,including single and mixture Gaussian models. Echoes are processedwith a preprocessing stage based on a model of mammalian cochlearfiltering, using a new discrete low-pass filter characteristic. Stoppingtime performance of the sequential decision process is evaluated andcompared. It is observed that the new low pass filter processingresults in faster sequential decisions.
Keywords: Classification, neuro-spike coding, parametricmodel, Gaussian mixture with EM algorithm, sequential decision.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15472377 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh
Abstract:
Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12722376 An Evaluation of Algorithms for Single-Echo Biosonar Target Classification
Authors: Turgay Temel, John Hallam
Abstract:
A recent neurospiking coding scheme for feature extraction from biosonar echoes of various plants is examined with avariety of stochastic classifiers. Feature vectors derived are employedin well-known stochastic classifiers, including nearest-neighborhood,single Gaussian and a Gaussian mixture with EM optimization.Classifiers' performances are evaluated by using cross-validation and bootstrapping techniques. It is shown that the various classifers perform equivalently and that the modified preprocessing configuration yields considerably improved results.
Keywords: Classification, neuro-spike coding, non-parametricmodel, parametric model, Gaussian mixture, EM algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16692375 Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations
Authors: Lai Lai Win Kyi, Nay Min Tun
Abstract:
Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.
Keywords: Cluster of Workstations, Parallel sorting algorithms, performance analysis, parallel computing and MPI.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14822374 A Review on Soft Computing Technique in Intrusion Detection System
Authors: Noor Suhana Sulaiman, Rohani Abu Bakar, Norrozila Sulaiman
Abstract:
Intrusion Detection System is significant in network security. It detects and identifies intrusion behavior or intrusion attempts in a computer system by monitoring and analyzing the network packets in real time. In the recent year, intelligent algorithms applied in the intrusion detection system (IDS) have been an increasing concern with the rapid growth of the network security. IDS data deals with a huge amount of data which contains irrelevant and redundant features causing slow training and testing process, higher resource consumption as well as poor detection rate. Since the amount of audit data that an IDS needs to examine is very large even for a small network, classification by hand is impossible. Hence, the primary objective of this review is to review the techniques prior to classification process suit to IDS data.Keywords: Intrusion Detection System, security, soft computing, classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18642373 Ensembling Classifiers – An Application toImage Data Classification from Cherenkov Telescope Experiment
Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti
Abstract:
Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques with classifiers such as random forests, neural networks and support vector machines. The data sets are from MAGIC, a Cherenkov telescope experiment. The task is to classify gamma signals from overwhelmingly hadron and muon signals representing a rare class classification problem. We compare the individual classifiers with their ensemble counterparts and discuss the results. WEKA a wonderful tool for machine learning has been used for making the experiments.Keywords: Ensembles, WEKA, Neural networks [NN], SupportVector Machines [SVM], Random Forests [RF].
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17652372 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children
Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco
Abstract:
Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.Keywords: Feature selection, multi-objective evolutionary computation, unsupervised classification, behavior assessment system for children.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14472371 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the area in data mining and it can be classified into partition, hierarchical, density based and grid based. Therefore, in this paper we do survey and review four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems as well as deriving more robust and scalable algorithms for clustering.Keywords: Clustering, method, algorithm, hierarchical, survey.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 33762370 Rapid Study on Feature Extraction and Classification Models in Healthcare Applications
Authors: S. Sowmyayani
Abstract:
The advancement of computer-aided design helps the medical force and security force. Some applications include biometric recognition, elderly fall detection, face recognition, cancer recognition, tumor recognition, etc. This paper deals with different machine learning algorithms that are more generically used for any health care system. The most focused problems are classification and regression. With the rise of big data, machine learning has become particularly important for solving problems. Machine learning uses two types of techniques: supervised learning and unsupervised learning. The former trains a model on known input and output data and predicts future outputs. Classification and regression are supervised learning techniques. Unsupervised learning finds hidden patterns in input data. Clustering is one such unsupervised learning technique. The above-mentioned models are discussed briefly in this paper.
Keywords: Supervised learning, unsupervised learning, regression, neural network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3462369 Rank-Based Chain-Mode Ensemble for Binary Classification
Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu
Abstract:
In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.
Keywords: Consensus, curse of correlation, imbalanced classification, rank-based chain-mode ensemble.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7342368 Visualization of Searching and Sorting Algorithms
Authors: Bremananth R, Radhika.V, Thenmozhi.S
Abstract:
Sequences of execution of algorithms in an interactive manner using multimedia tools are employed in this paper. It helps to realize the concept of fundamentals of algorithms such as searching and sorting method in a simple manner. Visualization gains more attention than theoretical study and it is an easy way of learning process. We propose methods for finding runtime sequence of each algorithm in an interactive way and aims to overcome the drawbacks of the existing character systems. System illustrates each and every step clearly using text and animation. Comparisons of its time complexity have been carried out and results show that our approach provides better perceptive of algorithms.Keywords: Algorithms, Searching, Sorting, Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21142367 Investigation on Novel Based Metaheuristic Algorithms for Combinatorial Optimization Problems in Ad Hoc Networks
Authors: C. Rajan, N. Shanthi, C. Rasi Priya, K. Geetha
Abstract:
Routing in MANET is extremely challenging because of MANETs dynamic features, its limited bandwidth, frequent topology changes caused by node mobility and power energy consumption. In order to efficiently transmit data to destinations, the applicable routing algorithms must be implemented in mobile ad-hoc networks. Thus we can increase the efficiency of the routing by satisfying the Quality of Service (QoS) parameters by developing routing algorithms for MANETs. The algorithms that are inspired by the principles of natural biological evolution and distributed collective behavior of social colonies have shown excellence in dealing with complex optimization problems and are becoming more popular. This paper presents a survey on few meta-heuristic algorithms and naturally-inspired algorithms.
Keywords: Ant colony optimization, genetic algorithm, Naturally-inspired algorithms and particle swarm optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27022366 Design and Implementation of a Counting and Differentiation System for Vehicles through Video Processing
Authors: Derlis Gregor, Kevin Cikel, Mario Arzamendia, Raúl Gregor
Abstract:
This paper presents a self-sustaining mobile system for counting and classification of vehicles through processing video. It proposes a counting and classification algorithm divided in four steps that can be executed multiple times in parallel in a SBC (Single Board Computer), like the Raspberry Pi 2, in such a way that it can be implemented in real time. The first step of the proposed algorithm limits the zone of the image that it will be processed. The second step performs the detection of the mobile objects using a BGS (Background Subtraction) algorithm based on the GMM (Gaussian Mixture Model), as well as a shadow removal algorithm using physical-based features, followed by morphological operations. In the first step the vehicle detection will be performed by using edge detection algorithms and the vehicle following through Kalman filters. The last step of the proposed algorithm registers the vehicle passing and performs their classification according to their areas. An auto-sustainable system is proposed, powered by batteries and photovoltaic solar panels, and the data transmission is done through GPRS (General Packet Radio Service)eliminating the need of using external cable, which will facilitate it deployment and translation to any location where it could operate. The self-sustaining trailer will allow the counting and classification of vehicles in specific zones with difficult access.Keywords: Intelligent transportation systems, object detection, video processing, road traffic, vehicle counting, vehicle classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16242365 Pose Normalization Network for Object Classification
Authors: Bingquan Shen
Abstract:
Convolutional Neural Networks (CNN) have demonstrated their effectiveness in synthesizing 3D views of object instances at various viewpoints. Given the problem where one have limited viewpoints of a particular object for classification, we present a pose normalization architecture to transform the object to existing viewpoints in the training dataset before classification to yield better classification performance. We have demonstrated that this Pose Normalization Network (PNN) can capture the style of the target object and is able to re-render it to a desired viewpoint. Moreover, we have shown that the PNN improves the classification result for the 3D chairs dataset and ShapeNet airplanes dataset when given only images at limited viewpoint, as compared to a CNN baseline.Keywords: Convolutional neural networks, object classification, pose normalization, viewpoint invariant.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1120