Search results for: Unsupervised feature learning.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2801

Search results for: Unsupervised feature learning.

2651 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin has emerged as a compelling research area, capturing the attention of scholars over the past decade. It finds applications across diverse fields, including smart manufacturing and healthcare, offering significant time and cost savings. Notably, it often intersects with other cutting-edge technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the concept of a Human Digital Twin (HDT) is still in its infancy and requires further demonstration of its practicality. HDT takes the notion of Digital Twin a step further by extending it to living entities, notably humans, who are vastly different from inanimate physical objects. The primary objective of this research was to create an HDT capable of automating real-time human responses by simulating human behavior. To achieve this, the study delved into various areas, including clustering, supervised classification, topic extraction, and sentiment analysis. The paper successfully demonstrated the feasibility of HDT for generating personalized responses in social messaging applications. Notably, the proposed approach achieved an overall accuracy of 63%, a highly promising result that could pave the way for further exploration of the HDT concept. The methodology employed Random Forest for clustering the question database and matching new questions, while K-nearest neighbor was utilized for sentiment analysis.

Keywords: Human Digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification and clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 116
2650 Feature Based Dense Stereo Matching using Dynamic Programming and Color

Authors: Hajar Sadeghi, Payman Moallem, S. Amirhassn Monadjemi

Abstract:

This paper presents a new feature based dense stereo matching algorithm to obtain the dense disparity map via dynamic programming. After extraction of some proper features, we use some matching constraints such as epipolar line, disparity limit, ordering and limit of directional derivative of disparity as well. Also, a coarseto- fine multiresolution strategy is used to decrease the search space and therefore increase the accuracy and processing speed. The proposed method links the detected feature points into the chains and compares some of the feature points from different chains, to increase the matching speed. We also employ color stereo matching to increase the accuracy of the algorithm. Then after feature matching, we use the dynamic programming to obtain the dense disparity map. It differs from the classical DP methods in the stereo vision, since it employs sparse disparity map obtained from the feature based matching stage. The DP is also performed further on a scan line, between any matched two feature points on that scan line. Thus our algorithm is truly an optimization method. Our algorithm offers a good trade off in terms of accuracy and computational efficiency. Regarding the results of our experiments, the proposed algorithm increases the accuracy from 20 to 70%, and reduces the running time of the algorithm almost 70%.

Keywords: Chain Correspondence, Color Stereo Matching, Dynamic Programming, Epipolar Line, Stereo Vision.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2302
2649 Development of Multimedia Learning Application for Mastery Learning Style: A Graduated Difficulty Strategy

Authors: Nur Azlina Mohamed Mokmin, Mona Masood

Abstract:

Guided by the theory of learning styles, this study is based on the development of a multimedia learning application for students with mastery learning style. The learning material was developed by applying a graduated difficulty learning strategy. Algebra was chosen as the learning topic for this application. The effectiveness of this application in helping students learn is measured by giving a pre- and post-test. The result shows that students who learn using the learning material that matches their preferred learning style perform better than the students with a non-personalized learning material.

Keywords: Algebraic Fractions, Graduated Difficulty, Mastery Learning Style, Multimedia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2555
2648 Finding Sparse Features in Face Detection Using Genetic Algorithms

Authors: H. Sagha, S. Kasaei, E. Enayati, M. Dehghani

Abstract:

Although Face detection is not a recent activity in the field of image processing, it is still an open area for research. The greatest step in this field is the work reported by Viola and its recent analogous is Huang et al. Both of them use similar features and also similar training process. The former is just for detecting upright faces, but the latter can detect multi-view faces in still grayscale images using new features called 'sparse feature'. Finding these features is very time consuming and inefficient by proposed methods. Here, we propose a new approach for finding sparse features using a genetic algorithm system. This method requires less computational cost and gets more effective features in learning process for face detection that causes more accuracy.

Keywords: Face Detection, Genetic Algorithms, Sparse Feature.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1535
2647 Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements

Authors: Akella Amarendra Babu, Rama Devi Yellasiri, Natukula Sainath

Abstract:

Where human beings can easily learn and adopt pronunciation variations, machines need training before put into use. Also humans keep minimum vocabulary and their pronunciation variations are stored in front-end of their memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. Supervised methods are used for preparation of pronunciation dictionaries which take large amounts of manual effort, cost, time and are not suitable for real time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic human approach in learning the new pronunciations in real time. A new algorithm for measuring sound distances called Dynamic Phone Warping is presented and tested. Performance of the system is measured using an adaptation model and the precision metrics is found to be better than 86 percent.

Keywords: Pronunciation variations, dynamic programming, machine learning, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 750
2646 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2007
2645 Prediction of MicroRNA-Target Gene by Machine Learning Algorithms in Lung Cancer Study

Authors: Nilubon Kurubanjerdjit, Nattakarn Iam-On, Ka-Lok Ng

Abstract:

MicroRNAs are small non-coding RNA found in many different species. They play crucial roles in cancer such as biological processes of apoptosis and proliferation. The identification of microRNA-target genes can be an essential first step towards to reveal the role of microRNA in various cancer types. In this paper, we predict miRNA-target genes for lung cancer by integrating prediction scores from miRanda and PITA algorithms used as a feature vector of miRNA-target interaction. Then, machine-learning algorithms were implemented for making a final prediction. The approach developed in this study should be of value for future studies into understanding the role of miRNAs in molecular mechanisms enabling lung cancer formation.

Keywords: MicroRNA, miRNAs, lung cancer, machine learning, Naïve Bayes, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2343
2644 Efficient and Effective Gabor Feature Representation for Face Detection

Authors: Yasuomi D. Sato, Yasutaka Kuriya

Abstract:

We here propose improved version of elastic graph matching (EGM) as a face detector, called the multi-scale EGM (MS-EGM). In this improvement, Gabor wavelet-based pyramid reduces computational complexity for the feature representation often used in the conventional EGM, but preserving a critical amount of information about an image. The MS-EGM gives us higher detection performance than Viola-Jones object detection algorithm of the AdaBoost Haar-like feature cascade. We also show rapid detection speeds of the MS-EGM, comparable to the Viola-Jones method. We find fruitful benefits in the MS-EGM, in terms of topological feature representation for a face.

Keywords: Face detection, Gabor wavelet based pyramid, elastic graph matching, topological preservation, redundancy of computational complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1822
2643 sEMG Interface Design for Locomotion Identification

Authors: Rohit Gupta, Ravinder Agarwal

Abstract:

Surface electromyographic (sEMG) signal has the potential to identify the human activities and intention. This potential is further exploited to control the artificial limbs using the sEMG signal from residual limbs of amputees. The paper deals with the development of multichannel cost efficient sEMG signal interface for research application, along with evaluation of proposed class dependent statistical approach of the feature selection method. The sEMG signal acquisition interface was developed using ADS1298 of Texas Instruments, which is a front-end interface integrated circuit for ECG application. Further, the sEMG signal is recorded from two lower limb muscles for three locomotions namely: Plane Walk (PW), Stair Ascending (SA), Stair Descending (SD). A class dependent statistical approach is proposed for feature selection and also its performance is compared with 12 preexisting feature vectors. To make the study more extensive, performance of five different types of classifiers are compared. The outcome of the current piece of work proves the suitability of the proposed feature selection algorithm for locomotion recognition, as compared to other existing feature vectors. The SVM Classifier is found as the outperformed classifier among compared classifiers with an average recognition accuracy of 97.40%. Feature vector selection emerges as the most dominant factor affecting the classification performance as it holds 51.51% of the total variance in classification accuracy. The results demonstrate the potentials of the developed sEMG signal acquisition interface along with the proposed feature selection algorithm.

Keywords: Classifiers, feature selection, locomotion, sEMG.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1438
2642 Vision Based People Tracking System

Authors: Boukerch Haroun, Luo Qing Sheng, Li Hua Shi, Boukraa Sebti

Abstract:

In this paper we present the design and the implementation of a target tracking system where the target is set to be a moving person in a video sequence. The system can be applied easily as a vision system for mobile robot. The system is composed of two major parts the first is the detection of the person in the video frame using the SVM learning machine based on the “HOG” descriptors. The second part is the tracking of a moving person it’s done by using a combination of the Kalman filter and a modified version of the Camshift tracking algorithm by adding the target motion feature to the color feature, the experimental results had shown that the new algorithm had overcame the traditional Camshift algorithm in robustness and in case of occlusion.

Keywords: Camshift Algorithm, Computer Vision, Kalman Filter, Object tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1263
2641 A Survey: Clustering Ensembles Techniques

Authors: Reza Ghaemi , Md. Nasir Sulaiman , Hamidah Ibrahim , Norwati Mustapha

Abstract:

The clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been done to find consensus clustering. One of the major problems in clustering ensembles is the consensus function. In this paper, firstly, we introduce clustering ensembles, representation of multiple partitions, its challenges and present taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles including Hypergraph partitioning, Voting approach, Mutual information, Co-association based functions and Finite mixture model, and next explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensembles algorithms such as computational complexity, robustness, simplicity and accuracy on different datasets in previous techniques.

Keywords: Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3372
2640 Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach

Authors: Lin Yang, Zhian Mi, Jiacheng Xiao, Rong Li

Abstract:

Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.

Keywords: Harmonic analysis, machine learning, music classification and tagging, MIDI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 676
2639 Improved Feature Processing for Iris Biometric Authentication System

Authors: Somnath Dey, Debasis Samanta

Abstract:

Iris-based biometric authentication is gaining importance in recent times. Iris biometric processing however, is a complex process and computationally very expensive. In the overall processing of iris biometric in an iris-based biometric authentication system, feature processing is an important task. In feature processing, we extract iris features, which are ultimately used in matching. Since there is a large number of iris features and computational time increases as the number of features increases, it is therefore a challenge to develop an iris processing system with as few as possible number of features and at the same time without compromising the correctness. In this paper, we address this issue and present an approach to feature extraction and feature matching process. We apply Daubechies D4 wavelet with 4 levels to extract features from iris images. These features are encoded with 2 bits by quantizing into 4 quantization levels. With our proposed approach it is possible to represent an iris template with only 304 bits, whereas existing approaches require as many as 1024 bits. In addition, we assign different weights to different iris region to compare two iris templates which significantly increases the accuracy. Further, we match the iris template based on a weighted similarity measure. Experimental results on several iris databases substantiate the efficacy of our approach.

Keywords: Iris recognition, biometric, feature processing, patternrecognition, pattern matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2097
2638 A Cognitive Model of Character Recognition Using Support Vector Machines

Authors: K. Freedman

Abstract:

In the present study, a support vector machine (SVM) learning approach to character recognition is proposed. Simple feature detectors, similar to those found in the human visual system, were used in the SVM classifier. Alphabetic characters were rotated to 8 different angles and using the proposed cognitive model, all characters were recognized with 100% accuracy and specificity. These same results were found in psychiatric studies of human character recognition.

Keywords: Character recognition, cognitive model, support vector machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1839
2637 Feature Point Reduction for Video Stabilization

Authors: Theerawat Songyot, Tham Manjing, Bunyarit Uyyanonvara, Chanjira Sinthanayothin

Abstract:

Corner detection and optical flow are common techniques for feature-based video stabilization. However, these algorithms are computationally expensive and should be performed at a reasonable rate. This paper presents an algorithm for discarding irrelevant feature points and maintaining them for future use so as to improve the computational cost. The algorithm starts by initializing a maintained set. The feature points in the maintained set are examined against its accuracy for modeling. Corner detection is required only when the feature points are insufficiently accurate for future modeling. Then, optical flows are computed from the maintained feature points toward the consecutive frame. After that, a motion model is estimated based on the simplified affine motion model and least square method, with outliers belonging to moving objects presented. Studentized residuals are used to eliminate such outliers. The model estimation and elimination processes repeat until no more outliers are identified. Finally, the entire algorithm repeats along the video sequence with the points remaining from the previous iteration used as the maintained set. As a practical application, an efficient video stabilization can be achieved by exploiting the computed motion models. Our study shows that the number of times corner detection needs to perform is greatly reduced, thus significantly improving the computational cost. Moreover, optical flow vectors are computed for only the maintained feature points, not for outliers, thus also reducing the computational cost. In addition, the feature points after reduction can sufficiently be used for background objects tracking as demonstrated in the simple video stabilizer based on our proposed algorithm.

Keywords: background object tracking, feature point reduction, low cost tracking, video stabilization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1719
2636 Fuzzy Population-Based Meta-Heuristic Approaches for Attribute Reduction in Rough Set Theory

Authors: Mafarja Majdi, Salwani Abdullah, Najmeh S. Jaddi

Abstract:

One of the global combinatorial optimization problems in machine learning is feature selection. It concerned with removing the irrelevant, noisy, and redundant data, along with keeping the original meaning of the original data. Attribute reduction in rough set theory is an important feature selection method. Since attribute reduction is an NP-hard problem, it is necessary to investigate fast and effective approximate algorithms. In this paper, we proposed two feature selection mechanisms based on memetic algorithms (MAs) which combine the genetic algorithm with a fuzzy record to record travel algorithm and a fuzzy controlled great deluge algorithm, to identify a good balance between local search and genetic search. In order to verify the proposed approaches, numerical experiments are carried out on thirteen datasets. The results show that the MAs approaches are efficient in solving attribute reduction problems when compared with other meta-heuristic approaches.

Keywords: Rough Set Theory, Attribute Reduction, Fuzzy Logic, Memetic Algorithms, Record to Record Algorithm, Great Deluge Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
2635 An Evolutionary Statistical Learning Theory

Authors: Sung-Hae Jun, Kyung-Whan Oh

Abstract:

Statistical learning theory was developed by Vapnik. It is a learning theory based on Vapnik-Chervonenkis dimension. It also has been used in learning models as good analytical tools. In general, a learning theory has had several problems. Some of them are local optima and over-fitting problems. As well, statistical learning theory has same problems because the kernel type, kernel parameters, and regularization constant C are determined subjectively by the art of researchers. So, we propose an evolutionary statistical learning theory to settle the problems of original statistical learning theory. Combining evolutionary computing into statistical learning theory, our theory is constructed. We verify improved performances of an evolutionary statistical learning theory using data sets from KDD cup.

Keywords: Evolutionary computing, Local optima, Over-fitting, Statistical learning theory

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1727
2634 Evolutionary Computing Approach for the Solution of Initial value Problems in Ordinary Differential Equations

Authors: A. Junaid, M. A. Z. Raja, I. M. Qureshi

Abstract:

An evolutionary computing technique for solving initial value problems in Ordinary Differential Equations is proposed in this paper. Neural network is used as a universal approximator while the adaptive parameters of neural networks are optimized by genetic algorithm. The solution is achieved on the continuous grid of time instead of discrete as in other numerical techniques. The comparison is carried out with classical numerical techniques and the solution is found with a uniform accuracy of MSE ≈ 10-9 .

Keywords: Neural networks, Unsupervised learning, Evolutionary computing, Numerical methods, Fitness evaluation function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1721
2633 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti

Abstract:

Autonomous structural health monitoring (SHM) of many structures and bridges became a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compare different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., entropy, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)) that combine the fundamental frequencies to extract new damage sensitive features in a low dimensional feature space. Finally, one-class classification (OCC) algorithms perform anomaly detection, trained with standard condition points, and tested with normal and anomaly ones. In particular, principal component analysis (PCA), kernel principal component analysis (KPCA), and autoassociative neural network (ANN) are presented and their performance are compared. It is also shown that, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 95%.

Keywords: Anomaly detection, dimensionality reduction, frequencies selection, modal analysis, neural network, structural health monitoring, vibration measurement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 638
2632 A Speeded up Robust Scale-Invariant Feature Transform Currency Recognition Algorithm

Authors: Daliyah S. Aljutaili, Redna A. Almutlaq, Suha A. Alharbi, Dina M. Ibrahim

Abstract:

All currencies around the world look very different from each other. For instance, the size, color, and pattern of the paper are different. With the development of modern banking services, automatic methods for paper currency recognition become important in many applications like vending machines. One of the currency recognition architecture’s phases is Feature detection and description. There are many algorithms that are used for this phase, but they still have some disadvantages. This paper proposes a feature detection algorithm, which merges the advantages given in the current SIFT and SURF algorithms, which we call, Speeded up Robust Scale-Invariant Feature Transform (SR-SIFT) algorithm. Our proposed SR-SIFT algorithm overcomes the problems of both the SIFT and SURF algorithms. The proposed algorithm aims to speed up the SIFT feature detection algorithm and keep it robust. Simulation results demonstrate that the proposed SR-SIFT algorithm decreases the average response time, especially in small and minimum number of best key points, increases the distribution of the number of best key points on the surface of the currency. Furthermore, the proposed algorithm increases the accuracy of the true best point distribution inside the currency edge than the other two algorithms.

Keywords: Currency recognition, feature detection and description, SIFT algorithm, SURF algorithm, speeded up and robust features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 812
2631 Feature Subset Selection approach based on Maximizing Margin of Support Vector Classifier

Authors: Khin May Win, Nan Sai Moon Kham

Abstract:

Identification of cancer genes that might anticipate the clinical behaviors from different types of cancer disease is challenging due to the huge number of genes and small number of patients samples. The new method is being proposed based on supervised learning of classification like support vector machines (SVMs).A new solution is described by the introduction of the Maximized Margin (MM) in the subset criterion, which permits to get near the least generalization error rate. In class prediction problem, gene selection is essential to improve the accuracy and to identify genes for cancer disease. The performance of the new method was evaluated with real-world data experiment. It can give the better accuracy for classification.

Keywords: Microarray data, feature selection, recursive featureelimination, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1501
2630 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principals of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleargrams as the image. In this paper atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arraignment or of the atomic indices in the T-F vector are used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speak has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between then probabilities from the training sentences and the test sentences. Training is done at a 30dB Signal-to-Noise Ratio (SNR). Testing is performed at SNR’s of 0 dB, 5 dB, 10 dB and 30dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and produces ~93% accuracy at 0dB SNR.

Keywords: Time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1517
2629 Discovering Complex Regularities by Adaptive Self Organizing Classification

Authors: A. Faro, D. Giordano, F. Maiorana

Abstract:

Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optmize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is also able to automatically suggest a strategy for number of classes optimization.The tool is used to classify macroeconomic data that report the most developed countries? import and export. It is possible to classify the countries based on their economic behaviour and use an ad hoc tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation.

Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, cluster interpretation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516
2628 Feature Analysis of Predictive Maintenance Models

Authors: Zhaoan Wang

Abstract:

Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.

Keywords: Automated supply chain, intelligent manufacturing, predictive maintenance machine learning, feature engineering, model interpretation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1932
2627 Feature Selection with Kohonen Self Organizing Classification Algorithm

Authors: Francesco Maiorana

Abstract:

In this paper a one-dimension Self Organizing Map algorithm (SOM) to perform feature selection is presented. The algorithm is based on a first classification of the input dataset on a similarity space. From this classification for each class a set of positive and negative features is computed. This set of features is selected as result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions. These datasets come from KDT applications, drug discovery as well as other applications. The knowledge of the correct classification available for the training and validation datasets is used to optimize the parameters for positive and negative feature extractions. The process becomes feasible for large and sparse datasets, as the ones obtained in KDT applications, by using both compression techniques to store the similarity matrix and speed up techniques of the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements make it feasible, by using the grid, the application of the methodology to massive datasets.

Keywords: Clustering algorithm, Data mining, Feature selection, Grid, Kohonen Self Organizing Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3002
2626 ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Authors: Sunita Jahirabadkar, Parag Kulkarni

Abstract:

Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-art techniques. ISC determines the input parameter such as є – distance at various levels of Subspace Clustering which helps in finding meaningful clusters. The uniform parameters approach is not suitable for different kind of databases. ISC implements dynamic and adaptive determination of Meaningful clustering parameters based on hierarchical filtering approach. Third and most important feature of ISC is the ability of incremental learning and dynamic inclusion and exclusions of subspaces which lead to better cluster formation.

Keywords: Density based clustering, high dimensional data, subspace clustering, dynamic parameter setting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1971
2625 Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers

Authors: Emad E Abdallah, A.F. Otoom, ArwaSaqer, Ola Abu-Aisheh, Diana Omari, Ghadeer Salem

Abstract:

As emails communications have no consistent authentication procedure to ensure the authenticity, we present an investigation analysis approach for detecting forged emails based on Random Forests and Naïve Bays classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for all the possible suspects. Our approach consists of four main steps: (1) The cybercrime investigator extract different effective features including structural, lexical, linguistic, and syntactic evidence from previous emails for all the possible suspects, (2) The extracted features vectors are normalized to increase the accuracy rate. (3) The normalized features are then used to train the learning engine, (4) upon receiving the anonymous email (M); we apply the feature extraction process to produce a feature vector. Finally, using the machine learning classifiers the email is assigned to one of the suspects- whose writing style closely matches M. Experimental results on real data sets show the improved performance of the proposed method and the ability of identifying the authors with a very limited number of features.

Keywords: Digital investigation, cybercrimes, emails forensics, anonymous emails, writing style, and authorship analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5200
2624 OCR for Script Identification of Hindi (Devnagari) Numerals using Feature Sub Selection by Means of End-Point with Neuro-Memetic Model

Authors: Banashree N. P., R. Vasanta

Abstract:

Recognition of Indian languages scripts is challenging problems. In Optical Character Recognition [OCR], a character or symbol to be recognized can be machine printed or handwritten characters/numerals. There are several approaches that deal with problem of recognition of numerals/character depending on the type of feature extracted and different way of extracting them. This paper proposes a recognition scheme for handwritten Hindi (devnagiri) numerals; most admired one in Indian subcontinent. Our work focused on a technique in feature extraction i.e. global based approach using end-points information, which is extracted from images of isolated numerals. These feature vectors are fed to neuro-memetic model [18] that has been trained to recognize a Hindi numeral. The archetype of system has been tested on varieties of image of numerals. . In proposed scheme data sets are fed to neuro-memetic algorithm, which identifies the rule with highest fitness value of nearly 100 % & template associates with this rule is nothing but identified numerals. Experimentation result shows that recognition rate is 92-97 % compared to other models.

Keywords: OCR, Global Feature, End-Points, Neuro-Memetic model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1718
2623 3D Model Retrieval based on Normal Vector Interpolation Method

Authors: Ami Kim, Oubong Gwun, Juwhan Song

Abstract:

In this paper, we proposed the distribution of mesh normal vector direction as a feature descriptor of a 3D model. A normal vector shows the entire shape of a model well. The distribution of normal vectors was sampled in proportion to each polygon's area so that the information on the surface with less surface area may be less reflected on composing a feature descriptor in order to enhance retrieval performance. At the analysis result of ANMRR, the enhancement of approx. 12.4%~34.7% compared to the existing method has also been indicated.

Keywords: Interpolated Normal Vector, Feature Descriptor, 3DModel Retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1438
2622 Parkinsons Disease Classification using Neural Network and Feature Selection

Authors: Anchana Khemphila, Veera Boonjing

Abstract:

In this study, the Multi-Layer Perceptron (MLP)with Back-Propagation learning algorithm are used to classify to effective diagnosis Parkinsons disease(PD).It-s a challenging problem for medical community.Typically characterized by tremor, PD occurs due to the loss of dopamine in the brains thalamic region that results in involuntary or oscillatory movement in the body. A feature selection algorithm along with biomedical test values to diagnose Parkinson disease.Clinical diagnosis is done mostly by doctor-s expertise and experience.But still cases are reported of wrong diagnosis and treatment. Patients are asked to take number of tests for diagnosis.In many cases,not all the tests contribute towards effective diagnosis of a disease.Our work is to classify the presence of Parkinson disease with reduced number of attributes.Original,22 attributes are involved in classify.We use Information Gain to determine the attributes which reduced the number of attributes which is need to be taken from patients.The Artificial neural networks is used to classify the diagnosis of patients.Twenty-Two attributes are reduced to sixteen attributes.The accuracy is in training data set is 82.051% and in the validation data set is 83.333%.

Keywords: Data mining, classification, Parkinson disease, artificial neural networks, feature selection, information gain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3716