Search results for: classifiers ensembles
106 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’
Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell
Abstract:
Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features, when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL), and then the trained models are evaluated on different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content independent ones when tested within the corpus and across corpus.Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML
Procedia PDF Downloads 139105 Wireless Sensor Anomaly Detection Using Soft Computing
Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh
Abstract:
We live in an era of rapid development as a result of significant scientific growth. Like other technologies, wireless sensor networks (WSNs) are playing one of the main roles. Based on WSNs, ZigBee adds many features to devices, such as minimum cost and power consumption, and increasing the range and connect ability of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering, and networks, and even in medicinal aspects of intelligence building. In this work, we generated two main datasets, the first being based on tree topology and the second on star topology. The datasets were evaluated by three machine learning (ML) algorithms: J48, meta.j48 and multilayer perceptron (MLP). Each topology was classified into normal and abnormal (attack) network traffic. The dataset used in our work contained simulated data from network simulation 2 (NS2). In each database, the Bayesian network meta.j48 classifier achieved the highest accuracy level among other classifiers, of 99.7% and 99.2% respectively.Keywords: IDS, Machine learning, WSN, ZigBee technology
Procedia PDF Downloads 544104 Infrared Photodetectors Based on Nanowire Arrays: Towards Far Infrared Region
Authors: Mohammad Karimi, Magnus Heurlin, Lars Samuelson, Magnus Borgstrom, Hakan Pettersson
Abstract:
Nanowire semiconductors are promising candidates for optoelectronic applications such as solar cells, photodetectors and lasers due to their quasi-1D geometry and large surface to volume ratio. The functional wavelength range of NW-based detectors is typically limited to the visible/near-infrared region. In this work, we present electrical and optical properties of IR photodetectors based on large square millimeter ensembles (>1million) of vertically processed semiconductor heterostructure nanowires (NWs) grown on InP substrates which operate in longer wavelengths. InP NWs comprising single or multiple (20) InAs/InAsP QDics axially embedded in an n-i-n geometry, have been grown on InP substrates using metal organic vapor phase epitaxy (MOVPE). The NWs are contacted in vertical direction by atomic layer deposition (ALD) deposition of 50 nm SiO2 as an insulating layer followed by sputtering of indium tin oxide (ITO) and evaporation of Ti and Au as top contact layer. In order to extend the sensitivity range to the mid-wavelength and long-wavelength regions, the intersubband transition within conduction band of InAsP QDisc is suggested. We present first experimental indications of intersubband photocurrent in NW geometry and discuss important design parameters for realization of intersubband detectors. Key advantages with the proposed design include large degree of freedom in choice of materials compositions, possible enhanced optical resonance effects due to periodically ordered NW arrays and the compatibility with silicon substrates. We believe that the proposed detector design offers the route towards monolithic integration of compact and sensitive III-V NW long wavelength detectors with Si technology.Keywords: intersubband photodetector, infrared, nanowire, quantum disc
Procedia PDF Downloads 388103 Incorporating Information Gain in Regular Expressions Based Classifiers
Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler
Abstract:
A regular expression consists of sequence characters which allow describing a text path. Usually, in clinical research, regular expressions are manually created by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm named REX was designed, and then, implemented as a simplified method to create regexes to classify Spanish text automatically. In order to classify ambiguous cases, such as, when multiple labels are assigned to a testing example, REX includes an information gain method Two sets of data were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statically better regarding accuracy and F-measure than Support Vector Machine and Naïve Bayes for both datasets.Keywords: information gain, regular expressions, smith-waterman algorithm, text classification
Procedia PDF Downloads 321102 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling
Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed
Abstract:
Pose estimation is an important task in computer vision. Though the majority of the existing solutions provide good accuracy results, they are often overly complex and computationally expensive. In this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. Firstly, a face image is converted into one-dimensional time series using Hilbert space filling curve, then the approach converts these time series data to a symbolic representation. Furthermore, a distance matrix is calculated between symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results have shown that our approach is able to achieve a correct classification rate exceeding 97% with K-NN algorithm.Keywords: machine learning, pattern recognition, facial pose classification, time series
Procedia PDF Downloads 351101 Sub-Pixel Level Classification Using Remote Sensing For Arecanut Crop
Authors: S. Athiralakshmi, B.E. Bhojaraja, U. Pruthviraj
Abstract:
In agriculture, remote sensing is applied for monitoring of plant development, evaluating of physiological processes and growth conditions. Especially valuable are the spatio-temporal aspects of the remotely sensed data in detecting crop state differences and stress situations. In this study, hyperion imagery is used for classifying arecanut crops based on their age so that these maps can be used in yield estimation of crops, irrigation purposes, applying fertilizers etc. Traditional hard classifiers assigns the mixed pixels to the dominant classes. The proposed method uses a sub pixel level classifier called linear spectral unmixing available in ENVI software. It provides the relative abundance of surface materials and the context within a pixel that may be a potential solution to effectively identifying the land-cover distribution. Validation is done referring to field spectra collected using spectroradiometer and the ground control points obtained from GPS.Keywords: FLAASH, Hyperspectral remote sensing, Linear Spectral Unmixing, Spectral Angle Mapper Classifier.
Procedia PDF Downloads 519100 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification
Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike
Abstract:
Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.Keywords: data mining, decision tree, classification, imbalance dataset
Procedia PDF Downloads 13999 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study
Authors: Salima Smiti, Ines Gasmi, Makram Soui
Abstract:
Credit risk is the most important issue for financial institutions. Its assessment becomes an important task used to predict defaulter customers and classify customers as good or bad payers. To this objective, numerous techniques have been applied for credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models such as neural networks, SVM, etc. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using rules classification method. Our output is a set of rules which describe and explain the decision. To this end, we will compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)) where the goal is to find the best rules satisfying many criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm for German and Australian datasets compared to other rule-based techniques to predict the credit risk.Keywords: credit risk assessment, classification algorithms, data mining, rule extraction
Procedia PDF Downloads 18398 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines
Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma
Abstract:
Useful information has been extracted from the road accident data in United Kingdom (UK), using data analytics method, for avoiding possible accidents in rural and urban areas. This analysis make use of several methodologies such as data integration, support vector machines (SVM), correlation machines and multinomial goodness. The entire datasets have been imported from the traffic department of UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since data is expected to grow continuously over a period of time, this work primarily proposes a new framework model which can be trained and adapt itself to new data and make accurate predictions. This work also throws some light on use of SVM’s methodology for text classifiers from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of SVMs methodology appropriate for this kind of research work.Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)
Procedia PDF Downloads 27697 Machine Learning-Enabled Classification of Climbing Using Small Data
Authors: Nicholas Milburn, Yu Liang, Dalei Wu
Abstract:
Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence
Procedia PDF Downloads 14496 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering
Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel
Abstract:
Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.Keywords: classification, data mining, spam filtering, naive bayes, decision tree
Procedia PDF Downloads 41395 A Reliable Multi-Type Vehicle Classification System
Authors: Ghada S. Moussa
Abstract:
Vehicle classification is an important task in traffic surveillance and intelligent transportation systems. Classification of vehicle images is facing several problems such as: high intra-class vehicle variations, occlusion, shadow, illumination. These problems and others must be considered to develop a reliable vehicle classification system. In this study, a reliable multi-type vehicle classification system based on Bag-of-Words (BoW) paradigm is developed. Our proposed system used and compared four well-known classifiers; Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), k-Nearest Neighbour (KNN), and Decision Tree to classify vehicles into four categories: motorcycles, small, medium and large. Experiments on a large dataset show that our approach is efficient and reliable in classifying vehicles with accuracy of 95.7%. The SVM outperforms other classification algorithms in terms of both accuracy and robustness alongside considerable reduction in execution time. The innovativeness of developed system is it can serve as a framework for many vehicle classification systems.Keywords: vehicle classification, bag-of-words technique, SVM classifier, LDA classifier, KNN classifier, decision tree classifier, SIFT algorithm
Procedia PDF Downloads 35994 Detecting Memory-Related Gene Modules in sc/snRNA-seq Data by Deep-Learning
Authors: Yong Chen
Abstract:
To understand the detailed molecular mechanisms of memory formation in engram cells is one of the most fundamental questions in neuroscience. Recent single-cell RNA-seq (scRNA-seq) and single-nucleus RNA-seq (snRNA-seq) techniques have allowed us to explore the sparsely activated engram ensembles, enabling access to the molecular mechanisms that underlie experience-dependent memory formation and consolidation. However, the absence of specific and powerful computational methods to detect memory-related genes (modules) and their regulatory relationships in the sc/snRNA-seq datasets has strictly limited the analysis of underlying mechanisms and memory coding principles in mammalian brains. Here, we present a deep-learning method named SCENTBOX, to detect memory-related gene modules and causal regulatory relationships among themfromsc/snRNA-seq datasets. SCENTBOX first constructs codifferential expression gene network (CEGN) from case versus control sc/snRNA-seq datasets. It then detects the highly correlated modules of differential expression genes (DEGs) in CEGN. The deep network embedding and attention-based convolutional neural network strategies are employed to precisely detect regulatory relationships among DEG genes in a module. We applied them on scRNA-seq datasets of TRAP; Ai14 mouse neurons with fear memory and detected not only known memory-related genes, but also the modules and potential causal regulations. Our results provided novel regulations within an interesting module, including Arc, Bdnf, Creb, Dusp1, Rgs4, and Btg2. Overall, our methods provide a general computational tool for processing sc/snRNA-seq data from case versus control studie and a systematic investigation of fear-memory-related gene modules.Keywords: sc/snRNA-seq, memory formation, deep learning, gene module, causal inference
Procedia PDF Downloads 12093 Prosodic Characteristics of Post Traumatic Stress Disorder Induced Speech Changes
Authors: Jarek Krajewski, Andre Wittenborn, Martin Sauerland
Abstract:
This abstract describes a promising approach for estimating post-traumatic stress disorder (PTSD) based on prosodic speech characteristics. It illustrates the validity of this method by briefly discussing results from an Arabic refugee sample (N= 47, 32 m, 15 f). A well-established standardized self-report scale “Reaction of Adolescents to Traumatic Stress” (RATS) was used to determine the ground truth level of PTSD. The speech material was prompted by telling about autobiographical related sadness inducing experiences (sampling rate 16 kHz, 8 bit resolution). In order to investigate PTSD-induced speech changes, a self-developed set of 136 prosodic speech features was extracted from the .wav files. This set was adapted to capture traumatization related speech phenomena. An artificial neural network (ANN) machine learning model was applied to determine the PTSD level and reached a correlation of r = .37. These results indicate that our classifiers can achieve similar results to those seen in speech-based stress research.Keywords: speech prosody, PTSD, machine learning, feature extraction
Procedia PDF Downloads 9192 Towards Integrating Statistical Color Features for Human Skin Detection
Authors: Mohd Zamri Osman, Mohd Aizaini Maarof, Mohd Foad Rohani
Abstract:
Human skin detection recognized as the primary step in most of the applications such as face detection, illicit image filtering, hand recognition and video surveillance. The performance of any skin detection applications greatly relies on the two components: feature extraction and classification method. Skin color is the most vital information used for skin detection purpose. However, color feature alone sometimes could not handle images with having same color distribution with skin color. A color feature of pixel-based does not eliminate the skin-like color due to the intensity of skin and skin-like color fall under the same distribution. Hence, the statistical color analysis will be exploited such mean and standard deviation as an additional feature to increase the reliability of skin detector. In this paper, we studied the effectiveness of statistical color feature for human skin detection. Furthermore, the paper analyzed the integrated color and texture using eight classifiers with three color spaces of RGB, YCbCr, and HSV. The experimental results show that the integrating statistical feature using Random Forest classifier achieved a significant performance with an F1-score 0.969.Keywords: color space, neural network, random forest, skin detection, statistical feature
Procedia PDF Downloads 46291 Multilabel Classification with Neural Network Ensemble Method
Authors: Sezin Ekşioğlu
Abstract:
Multilabel classification has a huge importance for several applications, it is also a challenging research topic. It is a kind of supervised learning that contains binary targets. The distance between multilabel and binary classification is having more than one class in multilabel classification problems. Features can belong to one class or many classes. There exists a wide range of applications for multi label prediction such as image labeling, text categorization, gene functionality. Even though features are classified in many classes, they may not always be properly classified. There are many ensemble methods for the classification. However, most of the researchers have been concerned about better multilabel methods. Especially little ones focus on both efficiency of classifiers and pairwise relationships at the same time in order to implement better multilabel classification. In this paper, we worked on modified ensemble methods by getting benefit from k-Nearest Neighbors and neural network structure to address issues within a beneficial way and to get better impacts from the multilabel classification. Publicly available datasets (yeast, emotion, scene and birds) are performed to demonstrate the developed algorithm efficiency and the technique is measured by accuracy, F1 score and hamming loss metrics. Our algorithm boosts benchmarks for each datasets with different metrics.Keywords: multilabel, classification, neural network, KNN
Procedia PDF Downloads 15590 A Topological Approach for Motion Track Discrimination
Authors: Tegan H. Emerson, Colin C. Olson, George Stantchev, Jason A. Edelberg, Michael Wilson
Abstract:
Detecting small targets at range is difficult because there is not enough spatial information present in an image sub-region containing the target to use correlation-based methods to differentiate it from dynamic confusers present in the scene. Moreover, this lack of spatial information also disqualifies the use of most state-of-the-art deep learning image-based classifiers. Here, we use characteristics of target tracks extracted from video sequences as data from which to derive distinguishing topological features that help robustly differentiate targets of interest from confusers. In particular, we calculate persistent homology from time-delayed embeddings of dynamic statistics calculated from motion tracks extracted from a wide field-of-view video stream. In short, we use topological methods to extract features related to target motion dynamics that are useful for classification and disambiguation and show that small targets can be detected at range with high probability.Keywords: motion tracks, persistence images, time-delay embedding, topological data analysis
Procedia PDF Downloads 11489 Classification of Political Affiliations by Reduced Number of Features
Authors: Vesile Evrim, Aliyu Awwal
Abstract:
By the evolvement in technology, the way of expressing opinions switched the direction to the digital world. The domain of politics as one of the hottest topics of opinion mining research merged together with the behavior analysis for affiliation determination in text which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 are constituted by Linguistic Inquiry and Word Count (LIWC) features are tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that Decision Tree, Rule Induction and M5 Rule classifiers when used with SVM and IGR feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “function” as an aggregate feature of the linguistic category, is obtained as the most differentiating feature among the 68 features with 81% accuracy by itself in classifying articles either as Republican or Democrat.Keywords: feature selection, LIWC, machine learning, politics
Procedia PDF Downloads 38388 Estimation of Fragility Curves Using Proposed Ground Motion Selection and Scaling Procedure
Authors: Esra Zengin, Sinan Akkar
Abstract:
Reliable and accurate prediction of nonlinear structural response requires specification of appropriate earthquake ground motions to be used in nonlinear time history analysis. The current research has mainly focused on selection and manipulation of real earthquake records that can be seen as the most critical step in the performance based seismic design and assessment of the structures. Utilizing amplitude scaled ground motions that matches with the target spectra is commonly used technique for the estimation of nonlinear structural response. Representative ground motion ensembles are selected to match target spectrum such as scenario-based spectrum derived from ground motion prediction equations, Uniform Hazard Spectrum (UHS), Conditional Mean Spectrum (CMS) or Conditional Spectrum (CS). Different sets of criteria exist among those developed methodologies to select and scale ground motions with the objective of obtaining robust estimation of the structural performance. This study presents ground motion selection and scaling procedure that considers the spectral variability at target demand with the level of ground motion dispersion. The proposed methodology provides a set of ground motions whose response spectra match target median and corresponding variance within a specified period interval. The efficient and simple algorithm is used to assemble the ground motion sets. The scaling stage is based on the minimization of the error between scaled median and the target spectra where the dispersion of the earthquake shaking is preserved along the period interval. The impact of the spectral variability on nonlinear response distribution is investigated at the level of inelastic single degree of freedom systems. In order to see the effect of different selection and scaling methodologies on fragility curve estimations, results are compared with those obtained by CMS-based scaling methodology. The variability in fragility curves due to the consideration of dispersion in ground motion selection process is also examined.Keywords: ground motion selection, scaling, uncertainty, fragility curve
Procedia PDF Downloads 58487 A Comparative Study of k-NN and MLP-NN Classifiers Using GA-kNN Based Feature Selection Method for Wood Recognition System
Authors: Uswah Khairuddin, Rubiyah Yusof, Nenny Ruthfalydia Rosli
Abstract:
This paper presents a comparative study between k-Nearest Neighbour (k-NN) and Multi-Layer Perceptron Neural Network (MLP-NN) classifier using Genetic Algorithm (GA) as feature selector for wood recognition system. The features have been extracted from the images using Grey Level Co-Occurrence Matrix (GLCM). The use of GA based feature selection is mainly to ensure that the database used for training the features for the wood species pattern classifier consists of only optimized features. The feature selection process is aimed at selecting only the most discriminating features of the wood species to reduce the confusion for the pattern classifier. This feature selection approach maintains the ‘good’ features that minimizes the inter-class distance and maximizes the intra-class distance. Wrapper GA is used with k-NN classifier as fitness evaluator (GA-kNN). The results shows that k-NN is the best choice of classifier because it uses a very simple distance calculation algorithm and classification tasks can be done in a short time with good classification accuracy.Keywords: feature selection, genetic algorithm, optimization, wood recognition system
Procedia PDF Downloads 54586 Deep Reinforcement Learning and Generative Adversarial Networks Approach to Thwart Intrusions and Adversarial Attacks
Authors: Fabrice Setephin Atedjio, Jean-Pierre Lienou, Frederica F. Nelson, Sachin S. Shetty, Charles A. Kamhoua
Abstract:
Malicious users exploit vulnerabilities in computer systems, significantly disrupting their performance and revealing the inadequacies of existing protective solutions. Even machine learning-based approaches, designed to ensure reliability, can be compromised by adversarial attacks that undermine their robustness. This paper addresses two critical aspects of enhancing model reliability. First, we focus on improving model performance and robustness against adversarial threats. To achieve this, we propose a strategy by harnessing deep reinforcement learning. Second, we introduce an approach leveraging generative adversarial networks to counter adversarial attacks effectively. Our results demonstrate substantial improvements over previous works in the literature, with classifiers exhibiting enhanced accuracy in classification tasks, even in the presence of adversarial perturbations. These findings underscore the efficacy of the proposed model in mitigating intrusions and adversarial attacks within the machine-learning landscape.Keywords: machine learning, reliability, adversarial attacks, deep-reinforcement learning, robustness
Procedia PDF Downloads 1485 A Genetic Algorithm Based Ensemble Method with Pairwise Consensus Score on Malware Cacophonous Labels
Authors: Shih-Yu Wang, Shun-Wen Hsiao
Abstract:
In the field of cybersecurity, there exists many vendors giving malware samples classified results, namely naming after the label that contains some important information which is also called AV label. Lots of researchers relay on AV labels for research. Unfortunately, AV labels are too cluttered. They do not have a fixed format and fixed naming rules because the naming results were based on each classifiers' viewpoints. A way to fix the problem is taking a majority vote. However, voting can sometimes create problems of bias. Thus, we create a novel ensemble approach which does not rely on the cacophonous naming result but depend on group identification to aggregate everyone's opinion. To achieve this purpose, we develop an scoring system called Pairwise Consensus Score (PCS) to calculate result similarity. The entire method architecture combine Genetic Algorithm and PCS to find maximum consensus in the group. Experimental results revealed that our method outperformed the majority voting by 10% in term of the score.Keywords: genetic algorithm, ensemble learning, malware family, malware labeling, AV labels
Procedia PDF Downloads 8784 Artificial Reproduction System and Imbalanced Dataset: A Mendelian Classification
Authors: Anita Kushwaha
Abstract:
We propose a new evolutionary computational model called Artificial Reproduction System which is based on the complex process of meiotic reproduction occurring between male and female cells of the living organisms. Artificial Reproduction System is an attempt towards a new computational intelligence approach inspired by the theoretical reproduction mechanism, observed reproduction functions, principles and mechanisms. A reproductive organism is programmed by genes and can be viewed as an automaton, mapping and reducing so as to create copies of those genes in its off springs. In Artificial Reproduction System, the binding mechanism between male and female cells is studied, parameters are chosen and a network is constructed also a feedback system for self regularization is established. The model then applies Mendel’s law of inheritance, allele-allele associations and can be used to perform data analysis of imbalanced data, multivariate, multiclass and big data. In the experimental study Artificial Reproduction System is compared with other state of the art classifiers like SVM, Radial Basis Function, neural networks, K-Nearest Neighbor for some benchmark datasets and comparison results indicates a good performance.Keywords: bio-inspired computation, nature- inspired computation, natural computing, data mining
Procedia PDF Downloads 27483 Design an Development of an Agorithm for Prioritizing the Test Cases Using Neural Network as Classifier
Authors: Amit Verma, Simranjeet Kaur, Sandeep Kaur
Abstract:
Test Case Prioritization (TCP) has gained wide spread acceptance as it often results in good quality software free from defects. Due to the increase in rate of faults in software traditional techniques for prioritization results in increased cost and time. Main challenge in TCP is difficulty in manually validate the priorities of different test cases due to large size of test suites and no more emphasis are made to make the TCP process automate. The objective of this paper is to detect the priorities of different test cases using an artificial neural network which helps to predict the correct priorities with the help of back propagation algorithm. In our proposed work one such method is implemented in which priorities are assigned to different test cases based on their frequency. After assigning the priorities ANN predicts whether correct priority is assigned to every test case or not otherwise it generates the interrupt when wrong priority is assigned. In order to classify the different priority test cases classifiers are used. Proposed algorithm is very effective as it reduces the complexity with robust efficiency and makes the process automated to prioritize the test cases.Keywords: test case prioritization, classification, artificial neural networks, TF-IDF
Procedia PDF Downloads 39882 Early Recognition and Grading of Cataract Using a Combined Log Gabor/Discrete Wavelet Transform with ANN and SVM
Authors: Hadeer R. M. Tawfik, Rania A. K. Birry, Amani A. Saad
Abstract:
Eyes are considered to be the most sensitive and important organ for human being. Thus, any eye disorder will affect the patient in all aspects of life. Cataract is one of those eye disorders that lead to blindness if not treated correctly and quickly. This paper demonstrates a model for automatic detection, classification, and grading of cataracts based on image processing techniques and artificial intelligence. The proposed system is developed to ease the cataract diagnosis process for both ophthalmologists and patients. The wavelet transform combined with 2D Log Gabor Wavelet transform was used as feature extraction techniques for a dataset of 120 eye images followed by a classification process that classified the image set into three classes; normal, early, and advanced stage. A comparison between the two used classifiers, the support vector machine SVM and the artificial neural network ANN were done for the same dataset of 120 eye images. It was concluded that SVM gave better results than ANN. SVM success rate result was 96.8% accuracy where ANN success rate result was 92.3% accuracy.Keywords: cataract, classification, detection, feature extraction, grading, log-gabor, neural networks, support vector machines, wavelet
Procedia PDF Downloads 33581 A Machine Learning Based Method to Detect System Failure in Resource Constrained Environment
Authors: Payel Datta, Abhishek Das, Abhishek Roychoudhury, Dhiman Chattopadhyay, Tanushyam Chattopadhyay
Abstract:
Machine learning (ML) and deep learning (DL) is most predominantly used in image/video processing, natural language processing (NLP), audio and speech recognition but not that much used in system performance evaluation. In this paper, authors are going to describe the architecture of an abstraction layer constructed using ML/DL to detect the system failure. This proposed system is used to detect the system failure by evaluating the performance metrics of an IoT service deployment under constrained infrastructure environment. This system has been tested on the manually annotated data set containing different metrics of the system, like number of threads, throughput, average response time, CPU usage, memory usage, network input/output captured in different hardware environments like edge (atom based gateway) and cloud (AWS EC2). The main challenge of developing such system is that the accuracy of classification should be 100% as the error in the system has an impact on the degradation of the service performance and thus consequently affect the reliability and high availability which is mandatory for an IoT system. Proposed ML/DL classifiers work with 100% accuracy for the data set of nearly 4,000 samples captured within the organization.Keywords: machine learning, system performance, performance metrics, IoT, edge
Procedia PDF Downloads 19580 Emotion Recognition with Occlusions Based on Facial Expression Reconstruction and Weber Local Descriptor
Authors: Jadisha Cornejo, Helio Pedrini
Abstract:
Recognition of emotions based on facial expressions has received increasing attention from the scientific community over the last years. Several fields of applications can benefit from facial emotion recognition, such as behavior prediction, interpersonal relations, human-computer interactions, recommendation systems. In this work, we develop and analyze an emotion recognition framework based on facial expressions robust to occlusions through the Weber Local Descriptor (WLD). Initially, the occluded facial expressions are reconstructed following an extension approach of Robust Principal Component Analysis (RPCA). Then, WLD features are extracted from the facial expression representation, as well as Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG). The feature vector space is reduced using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Finally, K-Nearest Neighbor (K-NN) and Support Vector Machine (SVM) classifiers are used to recognize the expressions. Experimental results on three public datasets demonstrated that the WLD representation achieved competitive accuracy rates for occluded and non-occluded facial expressions compared to other approaches available in the literature.Keywords: emotion recognition, facial expression, occlusion, fiducial landmarks
Procedia PDF Downloads 18379 The Relationship between Human Pose and Intention to Fire a Handgun
Authors: Joshua van Staden, Dane Brown, Karen Bradshaw
Abstract:
Gun violence is a significant problem in modern-day society. Early detection of carried handguns through closed-circuit television (CCTV) can aid in preventing potential gun violence. However, CCTV operators have a limited attention span. Machine learning approaches to automating the detection of dangerous gun carriers provide a way to aid CCTV operators in identifying these individuals. This study provides insight into the relationship between human key points extracted using human pose estimation (HPE) and their intention to fire a weapon. We examine the feature importance of each keypoint and their correlations. We use principal component analysis (PCA) to reduce the feature space and optimize detection. Finally, we run a set of classifiers to determine what form of classifier performs well on this data. We find that hips, shoulders, and knees tend to be crucial aspects of the human pose when making these predictions. Furthermore, the horizontal position plays a larger role than the vertical position. Of the 66 key points, nine principal components could be used to make nonlinear classifications with 86% accuracy. Furthermore, linear classifications could be done with 85% accuracy, showing that there is a degree of linearity in the data.Keywords: feature engineering, human pose, machine learning, security
Procedia PDF Downloads 9378 A Brain Controlled Robotic Gait Trainer for Neurorehabilitation
Authors: Qazi Umer Jamil, Abubakr Siddique, Mubeen Ur Rehman, Nida Aziz, Mohsin I. Tiwana
Abstract:
This paper discusses a brain controlled robotic gait trainer for neurorehabilitation of Spinal Cord Injury (SCI) patients. Patients suffering from Spinal Cord Injuries (SCI) become unable to execute motion control of their lower proximities due to degeneration of spinal cord neurons. The presented approach can help SCI patients in neuro-rehabilitation training by directly translating patient motor imagery into walkers motion commands and thus bypassing spinal cord neurons completely. A non-invasive EEG based brain-computer interface is used for capturing patient neural activity. For signal processing and classification, an open source software (OpenVibe) is used. Classifiers categorize the patient motor imagery (MI) into a specific set of commands that are further translated into walker motion commands. The robotic walker also employs fall detection for ensuring safety of patient during gait training and can act as a support for SCI patients. The gait trainer is tested with subjects, and satisfactory results were achieved.Keywords: brain computer interface (BCI), gait trainer, spinal cord injury (SCI), neurorehabilitation
Procedia PDF Downloads 16277 Ensemble of Deep CNN Architecture for Classifying the Source and Quality of Teff Cereal
Authors: Belayneh Matebie, Michael Melese
Abstract:
The study focuses on addressing the challenges in classifying and ensuring the quality of Eragrostis Teff, a small and round grain that is the smallest cereal grain. Employing a traditional classification method is challenging because of its small size and the similarity of its environmental characteristics. To overcome this, this study employs a machine learning approach to develop a source and quality classification system for Teff cereal. Data is collected from various production areas in the Amhara regions, considering two types of cereal (high and low quality) across eight classes. A total of 5,920 images are collected, with 740 images for each class. Image enhancement techniques, including scaling, data augmentation, histogram equalization, and noise removal, are applied to preprocess the data. Convolutional Neural Network (CNN) is then used to extract relevant features and reduce dimensionality. The dataset is split into 80% for training and 20% for testing. Different classifiers, including FVGG16, FINCV3, QSCTC, EMQSCTC, SVM, and RF, are employed for classification, achieving accuracy rates ranging from 86.91% to 97.72%. The ensemble of FVGG16, FINCV3, and QSCTC using the Max-Voting approach outperforms individual algorithms.Keywords: Teff, ensemble learning, max-voting, CNN, SVM, RF
Procedia PDF Downloads 57