Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2993

Search results for: discriminative LMA features

2993 Relevant LMA Features for Human Motion Recognition

Authors: Insaf Ajili, Malik Mallem, Jean-Yves Didier


Motion recognition from videos is actually a very complex task due to the high variability of motions. This paper describes the challenges of human motion recognition, especially motion representation step with relevant features. Our descriptor vector is inspired from Laban Movement Analysis method. We propose discriminative features using the Random Forest algorithm in order to remove redundant features and make learning algorithms operate faster and more effectively. We validate our method on MSRC-12 and UTKinect datasets.

Keywords: discriminative LMA features, features reduction, human motion recognition, random forest

Procedia PDF Downloads 85
2992 Improved Performance in Content-Based Image Retrieval Using Machine Learning Approach

Authors: B. Ramesh Naik, T. Venugopal


This paper presents a novel approach which improves the high-level semantics of images based on machine learning approach. The contemporary approaches for image retrieval and object recognition includes Fourier transforms, Wavelets, SIFT and HoG. Though these descriptors helpful in a wide range of applications, they exploit zero order statistics, and this lacks high descriptiveness of image features. These descriptors usually take benefit of primitive visual features such as shape, color, texture and spatial locations to describe images. These features do not adequate to describe high-level semantics of the images. This leads to a gap in semantic content caused to unacceptable performance in image retrieval system. A novel method has been proposed referred as discriminative learning which is derived from machine learning approach that efficiently discriminates image features. The analysis and results of proposed approach were validated thoroughly on WANG and Caltech-101 Databases. The results proved that this approach is very competitive in content-based image retrieval.

Keywords: CBIR, discriminative learning, region weight learning, scale invariant feature transforms

Procedia PDF Downloads 100
2991 KSVD-SVM Approach for Spontaneous Facial Expression Recognition

Authors: Dawood Al Chanti, Alice Caplier


Sparse representations of signals have received a great deal of attention in recent years. In this paper, the interest of using sparse representation as a mean for performing sparse discriminative analysis between spontaneous facial expressions is demonstrated. An automatic facial expressions recognition system is presented. It uses a KSVD-SVM approach which is made of three main stages: A pre-processing and feature extraction stage, which solves the problem of shared subspace distribution based on the random projection theory, to obtain low dimensional discriminative and reconstructive features; A dictionary learning and sparse coding stage, which uses the KSVD model to learn discriminative under or over dictionaries for sparse coding; Finally a classification stage, which uses a SVM classifier for facial expressions recognition. Our main concern is to be able to recognize non-basic affective states and non-acted expressions. Extensive experiments on the JAFFE static acted facial expressions database but also on the DynEmo dynamic spontaneous facial expressions database exhibit very good recognition rates.

Keywords: dictionary learning, random projection, pose and spontaneous facial expression, sparse representation

Procedia PDF Downloads 214
2990 Time-Frequency Feature Extraction Method Based on Micro-Doppler Signature of Ground Moving Targets

Authors: Ke Ren, Huiruo Shi, Linsen Li, Baoshuai Wang, Yu Zhou


Since some discriminative features are required for ground moving targets classification, we propose a new feature extraction method based on micro-Doppler signature. Firstly, the time-frequency analysis of measured data indicates that the time-frequency spectrograms of the three kinds of ground moving targets, i.e., single walking person, two people walking and a moving wheeled vehicle, are discriminative. Then, a three-dimensional time-frequency feature vector is extracted from the time-frequency spectrograms to depict these differences. At last, a Support Vector Machine (SVM) classifier is trained with the proposed three-dimensional feature vector. The classification accuracy to categorize ground moving targets into the three kinds of the measured data is found to be over 96%, which demonstrates the good discriminative ability of the proposed micro-Doppler feature.

Keywords: micro-doppler, time-frequency analysis, feature extraction, radar target classification

Procedia PDF Downloads 325
2989 Dynamic Gabor Filter Facial Features-Based Recognition of Emotion in Video Sequences

Authors: T. Hari Prasath, P. Ithaya Rani


In the world of visual technology, recognizing emotions from the face images is a challenging task. Several related methods have not utilized the dynamic facial features effectively for high performance. This paper proposes a method for emotions recognition using dynamic facial features with high performance. Initially, local features are captured by Gabor filter with different scale and orientations in each frame for finding the position and scale of face part from different backgrounds. The Gabor features are sent to the ensemble classifier for detecting Gabor facial features. The region of dynamic features is captured from the Gabor facial features in the consecutive frames which represent the dynamic variations of facial appearances. In each region of dynamic features is normalized using Z-score normalization method which is further encoded into binary pattern features with the help of threshold values. The binary features are passed to Multi-class AdaBoost classifier algorithm with the well-trained database contain happiness, sadness, surprise, fear, anger, disgust, and neutral expressions to classify the discriminative dynamic features for emotions recognition. The developed method is deployed on the Ryerson Multimedia Research Lab and Cohn-Kanade databases and they show significant performance improvement owing to their dynamic features when compared with the existing methods.

Keywords: detecting face, Gabor filter, multi-class AdaBoost classifier, Z-score normalization

Procedia PDF Downloads 195
2988 Protein Remote Homology Detection and Fold Recognition by Combining Profiles with Kernel Methods

Authors: Bin Liu


Protein remote homology detection and fold recognition are two most important tasks in protein sequence analysis, which is critical for protein structure and function studies. In this study, we combined the profile-based features with various string kernels, and constructed several computational predictors for protein remote homology detection and fold recognition. Experimental results on two widely used benchmark datasets showed that these methods outperformed the competing methods, indicating that these predictors are useful computational tools for protein sequence analysis. By analyzing the discriminative features of the training models, some interesting patterns were discovered, reflecting the characteristics of protein superfamilies and folds, which are important for the researchers who are interested in finding the patterns of protein folds.

Keywords: protein remote homology detection, protein fold recognition, profile-based features, Support Vector Machines (SVMs)

Procedia PDF Downloads 82
2987 A Selection Approach: Discriminative Model for Nominal Attributes-Based Distance Measures

Authors: Fang Gong


Distance measures are an indispensable part of many instance-based learning (IBL) and machine learning (ML) algorithms. The value difference metrics (VDM) and inverted specific-class distance measure (ISCDM) are among the top-performing distance measures that address nominal attributes. VDM performs well in some domains owing to its simplicity and poorly in others that exist missing value and non-class attribute noise. ISCDM, however, typically works better than VDM on such domains. To maximize their advantages and avoid disadvantages, in this paper, a selection approach: a discriminative model for nominal attributes-based distance measures is proposed. More concretely, VDM and ISCDM are built independently on a training dataset at the training stage, and the most credible one is recorded for each training instance. At the test stage, its nearest neighbor for each test instance is primarily found by any of VDM and ISCDM and then chooses the most reliable model of its nearest neighbor to predict its class label. It is simply denoted as a discriminative distance measure (DDM). Experiments are conducted on the 34 University of California at Irvine (UCI) machine learning repository datasets, and it shows DDM retains the interpretability and simplicity of VDM and ISCDM but significantly outperforms the original VDM and ISCDM and other state-of-the-art competitors in terms of accuracy.

Keywords: distance measure, discriminative model, nominal attributes, nearest neighbor

Procedia PDF Downloads 48
2986 Automated Classification of Hypoxia from Fetal Heart Rate Using Advanced Data Models of Intrapartum Cardiotocography

Authors: Malarvizhi Selvaraj, Paul Fergus, Andy Shaw


Uterine contractions produced during labour have the potential to damage the foetus by diminishing the maternal blood flow to the placenta. In order to observe this phenomenon labour and delivery are routinely monitored using cardiotocography monitors. An obstetrician usually makes the diagnosis of foetus hypoxia by interpreting cardiotocography recordings. However, cardiotocography capture and interpretation is time-consuming and subjective, often lead to misclassification that causes damage to the foetus and unnecessary caesarean section. Both of these have a high impact on the foetus and the cost to the national healthcare services. Automatic detection of foetal heart rate may be an objective solution to help to reduce unnecessary medical interventions, as reported in several studies. This paper aim is to provide a system for better identification and interpretation of abnormalities of the fetal heart rate using RStudio. An open dataset of 552 Intrapartum recordings has been filtered with 0.034 Hz filters in an attempt to remove noise while keeping as much of the discriminative data as possible. Features were chosen following an extensive literature review, which concluded with FIGO features such as acceleration, deceleration, mean, variance and standard derivation. The five features were extracted from 552 recordings. Using these features, recordings will be classified either normal or abnormal. If the recording is abnormal, it has got more chances of hypoxia.

Keywords: cardiotocography, foetus, intrapartum, hypoxia

Procedia PDF Downloads 133
2985 A Comprehensive Study and Evaluation on Image Fashion Features Extraction

Authors: Yuanchao Sang, Zhihao Gong, Longsheng Chen, Long Chen


Clothing fashion represents a human’s aesthetic appreciation towards everyday outfits and appetite for fashion, and it reflects the development of status in society, humanity, and economics. However, modelling fashion by machine is extremely challenging because fashion is too abstract to be efficiently described by machines. Even human beings can hardly reach a consensus about fashion. In this paper, we are dedicated to answering a fundamental fashion-related problem: what image feature best describes clothing fashion? To address this issue, we have designed and evaluated various image features, ranging from traditional low-level hand-crafted features to mid-level style awareness features to various current popular deep neural network-based features, which have shown state-of-the-art performance in various vision tasks. In summary, we tested the following 9 feature representations: color, texture, shape, style, convolutional neural networks (CNNs), CNNs with distance metric learning (CNNs&DML), AutoEncoder, CNNs with multiple layer combination (CNNs&MLC) and CNNs with dynamic feature clustering (CNNs&DFC). Finally, we validated the performance of these features on two publicly available datasets. Quantitative and qualitative experimental results on both intra-domain and inter-domain fashion clothing image retrieval showed that deep learning based feature representations far outweigh traditional hand-crafted feature representation. Additionally, among all deep learning based methods, CNNs with explicit feature clustering performs best, which shows feature clustering is essential for discriminative fashion feature representation.

Keywords: convolutional neural network, feature representation, image processing, machine modelling

Procedia PDF Downloads 67
2984 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar


Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 254
2983 Analysis of Non-Coding Genome in Streptococcus pneumoniae for Molecular Epidemiology Typing

Authors: Martynova Alina, Lyubov Buzoleva


Streptococcus pneumoniae is the causative agent of pneumonias and meningitids throught all the world. Having high genetic diversity, this microorganism can cause different clinical forms of pneumococcal infections and microbiologically it is really difficult diagnosed by routine methods. Also, epidemiological surveillance requires more developed methods of molecular typing because the recent method of serotyping doesn't allow to distinguish invasive and non-invasive isolates properly. Non-coding genome of bacteria seems to be the interesting source for seeking of highly distinguishable markers to discriminate the subspecies of such a variable bacteria as Streptococcus pneumoniae. Technically, we proposed scheme of discrimination of S.pneumoniae strains with amplification of non-coding region (SP_1932) with the following restriction with 2 types of enzymes of Alu1 and Mn1. Aim: This research aimed to compare different methods of typing and their application for molecular epidemiology purposes. Methods: we analyzed population of 100 strains of S.pneumoniae isolated from different patients by different molecular epidemiology methods such as pulse-field gel electophoresis (PFGE), restriction polymorphism analysis (RFLP) and multilolocus sequence typing (MLST), and all of them were compared with classic typing method as serotyping. The discriminative power was estimated with Simpson Index (SI). Results: We revealed that the most discriminative typing method is RFLP (SI=0,97, there were distinguished 42 genotypes).PFGE was slightly less discriminative (SI=0,95, we identified 35 genotypes). MLST is still the best reference method (SI=1.0). Classic method of serotyping showed quite weak discriminative power (SI=0,93, 24 genotypes). In addition, sensivity of RFLP was 100%, specificity was 97,09%. Conclusion: the most appropriate method for routine epidemiology surveillance is RFLP with non-coding region of Streptococcsu pneumoniae, then PFGE, though in some cases these results should be obligatory confirmed by MLST.

Keywords: molecular epidemiology typing, non-coding genome, Streptococcus pneumoniae, MLST

Procedia PDF Downloads 310
2982 Bag of Local Features for Person Re-Identification on Large-Scale Datasets

Authors: Yixiu Liu, Yunzhou Zhang, Jianning Chi, Hao Chu, Rui Zheng, Libo Sun, Guanghao Chen, Fangtong Zhou


In the last few years, large-scale person re-identification has attracted a lot of attention from video surveillance since it has a potential application prospect in public safety management. However, it is still a challenging job considering the variation in human pose, the changing illumination conditions and the lack of paired samples. Although the accuracy has been significantly improved, the data dependence of the sample training is serious. To tackle this problem, a new strategy is proposed based on bag of visual words (BoVW) model of designing the feature representation which has been widely used in the field of image retrieval. The local features are extracted, and more discriminative feature representation is obtained by cross-view dictionary learning (CDL), then the assignment map is obtained through k-means clustering. Finally, the BoVW histograms are formed which encodes the images with the statistics of the feature classes in the assignment map. Experiments conducted on the CUHK03, Market1501 and MARS datasets show that the proposed method performs favorably against existing approaches.

Keywords: bag of visual words, cross-view dictionary learning, person re-identification, reranking

Procedia PDF Downloads 99
2981 Design and Implementation of Generative Models for Odor Classification Using Electronic Nose

Authors: Kumar Shashvat, Amol P. Bhondekar


In the midst of the five senses, odor is the most reminiscent and least understood. Odor testing has been mysterious and odor data fabled to most practitioners. The delinquent of recognition and classification of odor is important to achieve. The facility to smell and predict whether the artifact is of further use or it has become undesirable for consumption; the imitation of this problem hooked on a model is of consideration. The general industrial standard for this classification is color based anyhow; odor can be improved classifier than color based classification and if incorporated in machine will be awfully constructive. For cataloging of odor for peas, trees and cashews various discriminative approaches have been used Discriminative approaches offer good prognostic performance and have been widely used in many applications but are incapable to make effectual use of the unlabeled information. In such scenarios, generative approaches have better applicability, as they are able to knob glitches, such as in set-ups where variability in the series of possible input vectors is enormous. Generative models are integrated in machine learning for either modeling data directly or as a transitional step to form an indeterminate probability density function. The algorithms or models Linear Discriminant Analysis and Naive Bayes Classifier have been used for classification of the odor of cashews. Linear Discriminant Analysis is a method used in data classification, pattern recognition, and machine learning to discover a linear combination of features that typifies or divides two or more classes of objects or procedures. The Naive Bayes algorithm is a classification approach base on Bayes rule and a set of qualified independence theory. Naive Bayes classifiers are highly scalable, requiring a number of restraints linear in the number of variables (features/predictors) in a learning predicament. The main recompenses of using the generative models are generally a Generative Models make stronger assumptions about the data, specifically, about the distribution of predictors given the response variables. The Electronic instrument which is used for artificial odor sensing and classification is an electronic nose. This device is designed to imitate the anthropological sense of odor by providing an analysis of individual chemicals or chemical mixtures. The experimental results have been evaluated in the form of the performance measures i.e. are accuracy, precision and recall. The investigational results have proven that the overall performance of the Linear Discriminant Analysis was better in assessment to the Naive Bayes Classifier on cashew dataset.

Keywords: odor classification, generative models, naive bayes, linear discriminant analysis

Procedia PDF Downloads 303
2980 Tree Species Classification Using Effective Features of Polarimetric SAR and Hyperspectral Images

Authors: Milad Vahidi, Mahmod R. Sahebi, Mehrnoosh Omati, Reza Mohammadi


Forest management organizations need information to perform their work effectively. Remote sensing is an effective method to acquire information from the Earth. Two datasets of remote sensing images were used to classify forested regions. Firstly, all of extractable features from hyperspectral and PolSAR images were extracted. The optical features were spectral indexes related to the chemical, water contents, structural indexes, effective bands and absorption features. Also, PolSAR features were the original data, target decomposition components, and SAR discriminators features. Secondly, the particle swarm optimization (PSO) and the genetic algorithms (GA) were applied to select optimization features. Furthermore, the support vector machine (SVM) classifier was used to classify the image. The results showed that the combination of PSO and SVM had higher overall accuracy than the other cases. This combination provided overall accuracy about 90.56%. The effective features were the spectral index, the bands in shortwave infrared (SWIR) and the visible ranges and certain PolSAR features.

Keywords: hyperspectral, PolSAR, feature selection, SVM

Procedia PDF Downloads 126
2979 2D Point Clouds Features from Radar for Helicopter Classification

Authors: Danilo Habermann, Aleksander Medella, Carla Cremon, Yusef Caceres


This paper aims to analyze the ability of 2d point clouds features to classify different models of helicopters using radars. This method does not need to estimate the blade length, the number of blades of helicopters, and the period of their micro-Doppler signatures. It is also not necessary to generate spectrograms (or any other image based on time and frequency domain). This work transforms a radar return signal into a 2D point cloud and extracts features of it. Three classifiers are used to distinguish 9 different helicopter models in order to analyze the performance of the features used in this work. The high accuracy obtained with each of the classifiers demonstrates that the 2D point clouds features are very useful for classifying helicopters from radar signal.

Keywords: helicopter classification, point clouds features, radar, supervised classifiers

Procedia PDF Downloads 104
2978 New Features for Copy-Move Image Forgery Detection

Authors: Michael Zimba


A novel set of features for copy-move image forgery, CMIF, detection method is proposed. The proposed set presents a new approach which relies on electrostatic field theory, EFT. Solely for the purpose of reducing the dimension of a suspicious image, firstly performs discrete wavelet transform, DWT, of the suspicious image and extracts only the approximation subband. The extracted subband is then bijectively mapped onto a virtual electrostatic field where concepts of EFT are utilised to extract robust features. The extracted features are shown to be invariant to additive noise, JPEG compression, and affine transformation. The proposed features can also be used in general object matching.

Keywords: virtual electrostatic field, features, affine transformation, copy-move image forgery

Procedia PDF Downloads 438
2977 Unifying Heidegger and Sartre: A Way via Yogācāra Buddhism

Authors: Wing Cheuk Chan


It is well-known that Heidegger was highly critical of Sartre’s existential philosophy. In his famous “Letter on Humanism,” Heidegger not only draw a clear cutline between his thinking of Being and Sartre’s existentialism but also introduced a kind of anti-humanism. Such a hostile attitude towards Sartre’sExistentialism as Humanism seems to have created an unbridgeable gap between these them. Indeed, already in his Being and Nothingness, Sartre complained: Heidegger “has completely avoided any appeal to consciousness in his description of Dasein.”In reality, Sartre was mainly faithful to Husserlianphenomenology, in spite of his rejection of Husserl’s idealism. Thanks to the Japanese Buddhist scholar Yoshifumi Ueda’s work on the Old School of Yogācāra Buddhismas represented by Sthiramati and Paramārtha, we learn that in additional to thethesis of transforming vijñāna (knowing consciousness) into jñāna (wisdom), there is an idea of pṛṣṭa-labdha-jñāna (the subsequently acquired wisdom). According to Ueda, the latter is a “non-discriminative discrimination.” This gives rise to a possibility of synthesizing Heidegger’s thinking of Being and Sartre’s existential phenomenology. Structurally, this paper will firstshow that Heidegger focuses on the side of non-discrimination, whereas Sartre concentrates on the side of discrimination. It will then clarify in what sense thateach of them, in itself, remains incomplete. Finally, it will demonstratehow to synthesize them in term of the notion of “non-discriminative discrimination.”

Keywords: heidegger, sartre, phenomenology, yogācāra buddhism

Procedia PDF Downloads 17
2976 Identification of Breast Anomalies Based on Deep Convolutional Neural Networks and K-Nearest Neighbors

Authors: Ayyaz Hussain, Tariq Sadad


Breast cancer (BC) is one of the widespread ailments among females globally. The early prognosis of BC can decrease the mortality rate. Exact findings of benign tumors can avoid unnecessary biopsies and further treatments of patients under investigation. However, due to variations in images, it is a tough job to isolate cancerous cases from normal and benign ones. The machine learning technique is widely employed in the classification of BC pattern and prognosis. In this research, a deep convolution neural network (DCNN) called AlexNet architecture is employed to get more discriminative features from breast tissues. To achieve higher accuracy, K-nearest neighbor (KNN) classifiers are employed as a substitute for the softmax layer in deep learning. The proposed model is tested on a widely used breast image database called MIAS dataset for experimental purposes and achieved 99% accuracy.

Keywords: breast cancer, DCNN, KNN, mammography

Procedia PDF Downloads 52
2975 Using Reservoir Models for Monitoring Geothermal Surface Features

Authors: John P. O’Sullivan, Thomas M. P. Ratouis, Michael J. O’Sullivan


As the use of geothermal energy grows internationally more effort is required to monitor and protect areas with rare and important geothermal surface features. A number of approaches are presented for developing and calibrating numerical geothermal reservoir models that are capable of accurately representing geothermal surface features. The approaches are discussed in the context of cases studies of the Rotorua geothermal system and the Orakei-korako geothermal system, both of which contain important surface features. The results show that models are able to match the available field data accurately and hence can be used as valuable tools for predicting the future response of the systems to changes in use.

Keywords: geothermal reservoir models, surface features, monitoring, TOUGH2

Procedia PDF Downloads 321
2974 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu


Character recognition is the process of converting a text image file into editable and searchable text file. Feature Extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features for one character are used in character recognition. Basically, there are three steps of character recognition such as character segmentation, feature extraction and classification. In segmentation step, horizontal cropping method is used for line segmentation and vertical cropping method is used for character segmentation. In the Feature extraction step, features are extracted in two ways. The first way is that the 8 features are extracted from the entire input character using eight direction chain code frequency extraction. The second way is that the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a feature for one block. Therefore, 16 features are extracted from that 16 blocks in the second way. We use the number of holes feature to cluster the similar characters. We can recognize the almost Myanmar common characters with various font sizes by using these features. All these 25 features are used in both training part and testing part. In the classification step, the characters are classified by matching the all features of input character with already trained features of characters.

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 241
2973 An Experimental Study for Assessing Email Classification Attributes Using Feature Selection Methods

Authors: Issa Qabaja, Fadi Thabtah


Email phishing classification is one of the vital problems in the online security research domain that have attracted several scholars due to its impact on the users payments performed daily online. One aspect to reach a good performance by the detection algorithms in the email phishing problem is to identify the minimal set of features that significantly have an impact on raising the phishing detection rate. This paper investigate three known feature selection methods named Information Gain (IG), Chi-square and Correlation Features Set (CFS) on the email phishing problem to separate high influential features from low influential ones in phishing detection. We measure the degree of influentially by applying four data mining algorithms on a large set of features. We compare the accuracy of these algorithms on the complete features set before feature selection has been applied and after feature selection has been applied. After conducting experiments, the results show 12 common significant features have been chosen among the considered features by the feature selection methods. Further, the average detection accuracy derived by the data mining algorithms on the reduced 12-features set was very slight affected when compared with the one derived from the 47-features set.

Keywords: data mining, email classification, phishing, online security

Procedia PDF Downloads 359
2972 Exploring Syntactic and Semantic Features for Text-Based Authorship Attribution

Authors: Haiyan Wu, Ying Liu, Shaoyun Shi


Authorship attribution is to extract features to identify authors of anonymous documents. Many previous works on authorship attribution focus on statistical style features (e.g., sentence/word length), content features (e.g., frequent words, n-grams). Modeling these features by regression or some transparent machine learning methods gives a portrait of the authors' writing style. But these methods do not capture the syntactic (e.g., dependency relationship) or semantic (e.g., topics) information. In recent years, some researchers model syntactic trees or latent semantic information by neural networks. However, few works take them together. Besides, predictions by neural networks are difficult to explain, which is vital in authorship attribution tasks. In this paper, we not only utilize the statistical style and content features but also take advantage of both syntactic and semantic features. Different from an end-to-end neural model, feature selection and prediction are two steps in our method. An attentive n-gram network is utilized to select useful features, and logistic regression is applied to give prediction and understandable representation of writing style. Experiments show that our extracted features can improve the state-of-the-art methods on three benchmark datasets.

Keywords: authorship attribution, attention mechanism, syntactic feature, feature extraction

Procedia PDF Downloads 48
2971 Microscopic Features Influences on Textile Fabrics Self-Cleaning Ability

Authors: Ayat Adnan Atwah


Self-cleaning ability in textile fabrics was comprehensively investigated in the last decade. Most of these investigations have used surface roughness, and low surface energy features to establish a self-cleaning mechanism. Extensive research articles and reviews have been published to describe these processes along with their microscopic features. When these reviewed with a critical eye, it has been found that a comprehensive effort is still required to compile all these previous research, emphasizing how textile fabrics' microscopic features can influence their self-cleaning ability. No research has been conducted to explore the self-cleaning potential of microscopic geometrical features of fabric at the woven structural level. Researchers used microscopic features to increase the mechanical strength of the fabric. However, they did not change the microscopic features at a woven level to evaluate the self-cleaning ability. In the existing literature, researchers have tried to develop self-cleaning textiles with the help of coatings on the fabric. These coatings are applied to the fabrics by using spray and nanoparticle processing. The coatings create a different surface on the fabric, and hence the changes in the microscopic features of this surface control the self-cleaning ability. Instead of using an additional coating, the microscopic features of the fabric itself can also influence the surface roughness and low surface energy and provide self-cleaning ability at the woven structural level. Key microscopic features like surface roughness, porosity, and wettability of a textile fabric are still not comprehensively investigated for their influence on fabric’s self-cleaning ability. Significantly, the interdependencies between these features with overall fabric geometry at the woven level have not been explored quantitatively. Qualitative observations have been made mainly in the past literature. However, fabrics with self-cleaning ability to be produced in mass production require extensive empirical studies. These studies must involve parametric analysis on varying values of the microscopic features and their quantitative influence on the desired self-cleaning feature.

Keywords: self-cleaning ability, influence, microscopic features, textile fabrics

Procedia PDF Downloads 101
2970 Using New Machine Algorithms to Classify Iranian Musical Instruments According to Temporal, Spectral and Coefficient Features

Authors: Ronak Khosravi, Mahmood Abbasi Layegh, Siamak Haghipour, Avin Esmaili


In this paper, a study on classification of musical woodwind instruments using a small set of features selected from a broad range of extracted ones by the sequential forward selection method was carried out. Firstly, we extract 42 features for each record in the music database of 402 sound files belonging to five different groups of Flutes (end blown and internal duct), Single –reed, Double –reed (exposed and capped), Triple reed and Quadruple reed. Then, the sequential forward selection method is adopted to choose the best feature set in order to achieve very high classification accuracy. Two different classification techniques of support vector machines and relevance vector machines have been tested out and an accuracy of up to 96% can be achieved by using 21 time, frequency and coefficient features and relevance vector machine with the Gaussian kernel function.

Keywords: coefficient features, relevance vector machines, spectral features, support vector machines, temporal features

Procedia PDF Downloads 208
2969 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: 'Reddit'

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell


Native Language Identification is one of the growing subfields in Natural Language Processing (NLP). The task of Native Language Identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL) and then the trained models are evaluated on a different data using an external corpus (Reddit). Several features that have been proven useful for NLI tasks are used in this work (such as word n-grams, character n-grams, POS n-grams and function words). Three classifiers are used in this task; the baseline, linear SVM, and Logistic Regression. Two experiments are conducted in this paper. The first one is to explore the performance of the content-based features versus the content independent ones within one domain using TOEFL corpus. The second experiment is to examine how the trained models perform in a different data using an external corpus called Reddit. The aim is to find out which features (content-based vs content independent features) are more accurate when tested on a different corpus. Results show that content-based features are more accurate and robust than content independent ones when tested within corpus and across corpus.

Keywords: Native Language Identification, Social Media Data: Reddit, NLP, Content-based features, Content independent features

Procedia PDF Downloads 35
2968 Research on Perceptual Features of Couchsurfers on New Hospitality Tourism Platform Couchsurfing

Authors: Yuanxiang Miao


This paper aims to examine the perceptual features of couchsurfers on a new hospitality tourism platform, the free homestay website couchsurfing. As a local host, the author has accepted 61 couchsurfers in Kyoto, Japan, and attempted to figure out couchsurfers' characteristics on perception by hosting them. Moreover, the methodology of this research is mainly based on in-depth interviews, by talking with couchsurfers, observing their behaviors, doing questionnaires, etc. Five dominant perceptual features of couchsurfers were identified: (1) Trusting; (2) Meeting; (3) Sharing; (4) Reciprocity; (5) Worries. The value of this research lies in figuring out a deeper understanding of the perceptual features of couchsurfers, and the author indeed hosted and stayed with 61 couchsurfers from 30 countries and areas over one year. Lastly, the author offers practical suggestions for future research.

Keywords: couchsurfing, depth interview, hospitality tourism, perceptual features

Procedia PDF Downloads 66
2967 The Latent Model of Linguistic Features in Korean College Students’ L2 Argumentative Writings: Syntactic Complexity, Lexical Complexity, and Fluency

Authors: Jiyoung Bae, Gyoomi Kim


This study explores a range of linguistic features used in Korean college students’ argumentative writings for the purpose of developing a model that identifies variables which predict writing proficiencies. This study investigated the latent variable structure of L2 linguistic features, including syntactic complexity, the lexical complexity, and fluency. One hundred forty-six university students in Korea participated in this study. The results of the study’s confirmatory factor analysis (CFA) showed that indicators of linguistic features from this study-provided a foundation for re-categorizing indicators found in extant research on L2 Korean writers depending on each latent variable of linguistic features. The CFA models indicated one measurement model of L2 syntactic complexity and L2 learners’ writing proficiency; these two latent factors were correlated with each other. Based on the overall findings of the study, integrated linguistic features of L2 writings suggested some pedagogical implications in L2 writing instructions.

Keywords: linguistic features, syntactic complexity, lexical complexity, fluency

Procedia PDF Downloads 72
2966 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud


Gradient boosting methods have been proven to be a very important strategy. Many successful machine learning solutions were developed using the XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient methods. Home credit dataset is used in this work which contains 219 features and 356251 records. However, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that the LightGBM is faster and more accurate than CatBoost and XGBoost using variant number of features and records.

Keywords: gradient boosting, XGBoost, LightGBM, CatBoost, home credit

Procedia PDF Downloads 81
2965 Preparation of Papers - Developing a Leukemia Diagnostic System Based on Hybrid Deep Learning Architectures in Actual Clinical Environments

Authors: Skyler Kim


An early diagnosis of leukemia has always been a challenge to doctors and hematologists. On a worldwide basis, it was reported that there were approximately 350,000 new cases in 2012, and diagnosing leukemia was time-consuming and inefficient because of an endemic shortage of flow cytometry equipment in current clinical practice. As the number of medical diagnosis tools increased and a large volume of high-quality data was produced, there was an urgent need for more advanced data analysis methods. One of these methods was the AI approach. This approach has become a major trend in recent years, and several research groups have been working on developing these diagnostic models. However, designing and implementing a leukemia diagnostic system in real clinical environments based on a deep learning approach with larger sets remains complex. Leukemia is a major hematological malignancy that results in mortality and morbidity throughout different ages. We decided to select acute lymphocytic leukemia to develop our diagnostic system since acute lymphocytic leukemia is the most common type of leukemia, accounting for 74% of all children diagnosed with leukemia. The results from this development work can be applied to all other types of leukemia. To develop our model, the Kaggle dataset was used, which consists of 15135 total images, 8491 of these are images of abnormal cells, and 5398 images are normal. In this paper, we design and implement a leukemia diagnostic system in a real clinical environment based on deep learning approaches with larger sets. The proposed diagnostic system has the function of detecting and classifying leukemia. Different from other AI approaches, we explore hybrid architectures to improve the current performance. First, we developed two independent convolutional neural network models: VGG19 and ResNet50. Then, using both VGG19 and ResNet50, we developed a hybrid deep learning architecture employing transfer learning techniques to extract features from each input image. In our approach, fusing the features from specific abstraction layers can be deemed as auxiliary features and lead to further improvement of the classification accuracy. In this approach, features extracted from the lower levels are combined into higher dimension feature maps to help improve the discriminative capability of intermediate features and also overcome the problem of network gradient vanishing or exploding. By comparing VGG19 and ResNet50 and the proposed hybrid model, we concluded that the hybrid model had a significant advantage in accuracy. The detailed results of each model’s performance and their pros and cons will be presented in the conference.

Keywords: acute lymphoblastic leukemia, hybrid model, leukemia diagnostic system, machine learning

Procedia PDF Downloads 85
2964 Hybrid Anomaly Detection Using Decision Tree and Support Vector Machine

Authors: Elham Serkani, Hossein Gharaee Garakani, Naser Mohammadzadeh, Elaheh Vaezpour


Intrusion detection systems (IDS) are the main components of network security. These systems analyze the network events for intrusion detection. The design of an IDS is through the training of normal traffic data or attack. The methods of machine learning are the best ways to design IDSs. In the method presented in this article, the pruning algorithm of C5.0 decision tree is being used to reduce the features of traffic data used and training IDS by the least square vector algorithm (LS-SVM). Then, the remaining features are arranged according to the predictor importance criterion. The least important features are eliminated in the order. The remaining features of this stage, which have created the highest level of accuracy in LS-SVM, are selected as the final features. The features obtained, compared to other similar articles which have examined the selected features in the least squared support vector machine model, are better in the accuracy, true positive rate, and false positive. The results are tested by the UNSW-NB15 dataset.

Keywords: decision tree, feature selection, intrusion detection system, support vector machine

Procedia PDF Downloads 175