Search results for: classifiers accuracy
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3831

Search results for: classifiers accuracy

3741 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN). 

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 258
3740 An Ensemble System of Classifiers for Computer-Aided Volcano Monitoring

Authors: Flavio Cannavo

Abstract:

Continuous evaluation of the status of potentially hazardous volcanos plays a key role for civil protection purposes. The importance of monitoring volcanic activity, especially for energetic paroxysms that usually come with tephra emissions, is crucial not only for exposures to the local population but also for airline traffic. Presently, real-time surveillance of most volcanoes worldwide is essentially delegated to one or more human experts in volcanology, who interpret data coming from different kind of monitoring networks. Unfavorably, the high nonlinearity of the complex and coupled volcanic dynamics leads to a large variety of different volcanic behaviors. Moreover, continuously measured parameters (e.g. seismic, deformation, infrasonic and geochemical signals) are often not able to fully explain the ongoing phenomenon, thus making the fast volcano state assessment a very puzzling task for the personnel on duty at the control rooms. With the aim of aiding the personnel on duty in volcano surveillance, here we introduce a system based on an ensemble of data-driven classifiers to infer automatically the ongoing volcano status from all the available different kind of measurements. The system consists of a heterogeneous set of independent classifiers, each one built with its own data and algorithm. Each classifier gives an output about the volcanic status. The ensemble technique allows weighting the single classifier output to combine all the classifications into a single status that maximizes the performance. We tested the model on the Mt. Etna (Italy) case study by considering a long record of multivariate data from 2011 to 2015 and cross-validated it. Results indicate that the proposed model is effective and of great power for decision-making purposes.

Keywords: Bayesian networks, expert system, mount Etna, volcano monitoring

Procedia PDF Downloads 247
3739 Semi-Automatic Method to Assist Expert for Association Rules Validation

Authors: Amdouni Hamida, Gammoudi Mohamed Mohsen

Abstract:

In order to help the expert to validate association rules extracted from data, some quality measures are proposed in the literature. We distinguish two categories: objective and subjective measures. The first one depends on a fixed threshold and on data quality from which the rules are extracted. The second one consists on providing to the expert some tools in the objective to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high. Thus, the manually mining rules task is very hard. To solve this problem, we propose, in this paper, a semi-automatic method to assist the expert during the association rule's validation. Our method uses rule-based classification as follow: (i) We transform association rules into classification rules (classifiers), (ii) We use the generated classifiers for data classification. (iii) We visualize association rules with their quality classification to give an idea to the expert and to assist him during validation process.

Keywords: association rules, rule-based classification, classification quality, validation

Procedia PDF Downloads 440
3738 Comparison of Linear Discriminant Analysis and Support Vector Machine Classifications for Electromyography Signals Acquired at Five Positions of Elbow Joint

Authors: Amna Khan, Zareena Kausar, Saad Malik

Abstract:

Bio Mechatronics has extended applications in the field of rehabilitation. It has been contributing since World War II in improving the applicability of prosthesis and assistive devices in real life scenarios. In this paper, classification accuracies have been compared for two classifiers against five positions of elbow. Electromyography (EMG) signals analysis have been acquired directly from skeletal muscles of human forearm for each of the three defined positions and at modified extreme positions of elbow flexion and extension using 8 electrode Myo armband sensor. Features were extracted from filtered EMG signals for each position. Performance of two classifiers, support vector machine (SVM) and linear discriminant analysis (LDA) has been compared by analyzing the classification accuracies. SVM illustrated classification accuracies between 90-96%, in contrast to 84-87% depicted by LDA for five defined positions of elbow keeping the number of samples and selected feature the same for both SVM and LDA.

Keywords: classification accuracies, electromyography, linear discriminant analysis (LDA), Myo armband sensor, support vector machine (SVM)

Procedia PDF Downloads 368
3737 Evaluation of Random Forest and Support Vector Machine Classification Performance for the Prediction of Early Multiple Sclerosis from Resting State FMRI Connectivity Data

Authors: V. Saccà, A. Sarica, F. Novellino, S. Barone, T. Tallarico, E. Filippelli, A. Granata, P. Valentino, A. Quattrone

Abstract:

The work aim was to evaluate how well Random Forest (RF) and Support Vector Machine (SVM) algorithms could support the early diagnosis of Multiple Sclerosis (MS) from resting-state functional connectivity data. In particular, we wanted to explore the ability in distinguishing between controls and patients of mean signals extracted from ICA components corresponding to 15 well-known networks. Eighteen patients with early-MS (mean-age 37.42±8.11, 9 females) were recruited according to McDonald and Polman, and matched for demographic variables with 19 healthy controls (mean-age 37.55±14.76, 10 females). MRI was acquired by a 3T scanner with 8-channel head coil: (a)whole-brain T1-weighted; (b)conventional T2-weighted; (c)resting-state functional MRI (rsFMRI), 200 volumes. Estimated total lesion load (ml) and number of lesions were calculated using LST-toolbox from the corrected T1 and FLAIR. All rsFMRIs were pre-processed using tools from the FMRIB's Software Library as follows: (1) discarding of the first 5 volumes to remove T1 equilibrium effects, (2) skull-stripping of images, (3) motion and slice-time correction, (4) denoising with high-pass temporal filter (128s), (5) spatial smoothing with a Gaussian kernel of FWHM 8mm. No statistical significant differences (t-test, p < 0.05) were found between the two groups in the mean Euclidian distance and the mean Euler angle. WM and CSF signal together with 6 motion parameters were regressed out from the time series. We applied an independent component analysis (ICA) with the GIFT-toolbox using the Infomax approach with number of components=21. Fifteen mean components were visually identified by two experts. The resulting z-score maps were thresholded and binarized to extract the mean signal of the 15 networks for each subject. Statistical and machine learning analysis were then conducted on this dataset composed of 37 rows (subjects) and 15 features (mean signal in the network) with R language. The dataset was randomly splitted into training (75%) and test sets and two different classifiers were trained: RF and RBF-SVM. We used the intrinsic feature selection of RF, based on the Gini index, and recursive feature elimination (rfe) for the SVM, to obtain a rank of the most predictive variables. Thus, we built two new classifiers only on the most important features and we evaluated the accuracies (with and without feature selection) on test-set. The classifiers, trained on all the features, showed very poor accuracies on training (RF:58.62%, SVM:65.52%) and test sets (RF:62.5%, SVM:50%). Interestingly, when feature selection by RF and rfe-SVM were performed, the most important variable was the sensori-motor network I in both cases. Indeed, with only this network, RF and SVM classifiers reached an accuracy of 87.5% on test-set. More interestingly, the only misclassified patient resulted to have the lowest value of lesion volume. We showed that, with two different classification algorithms and feature selection approaches, the best discriminant network between controls and early MS, was the sensori-motor I. Similar importance values were obtained for the sensori-motor II, cerebellum and working memory networks. These findings, in according to the early manifestation of motor/sensorial deficits in MS, could represent an encouraging step toward the translation to the clinical diagnosis and prognosis.

Keywords: feature selection, machine learning, multiple sclerosis, random forest, support vector machine

Procedia PDF Downloads 241
3736 Foot Recognition Using Deep Learning for Knee Rehabilitation

Authors: Rakkrit Duangsoithong, Jermphiphut Jaruenpunyasak, Alba Garcia

Abstract:

The use of foot recognition can be applied in many medical fields such as the gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system is intended to capture a patient image in a controlled room and background to recognize the foot in the limited views. However, this system can be inconvenient to monitor the knee exercises at home. In order to overcome these problems, this paper proposes to use the deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with the traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, deep learning method provides better accuracy but with higher complexity to recognize the foot images from online databases than the traditional classification method.

Keywords: foot recognition, deep learning, knee rehabilitation, convolutional neural network

Procedia PDF Downloads 163
3735 EEG-Based Screening Tool for School Student’s Brain Disorders Using Machine Learning Algorithms

Authors: Abdelrahman A. Ramzy, Bassel S. Abdallah, Mohamed E. Bahgat, Sarah M. Abdelkader, Sherif H. ElGohary

Abstract:

Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of which are undiagnosed despite the fact that all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to the late treatment and to the misconceptions and lack of awareness as a whole towards these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement will be utilized as a tool for use in the evaluation of patients who may have ADHD, epilepsy, and autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. The proposed classifiers used with epilepsy as a step taken for the work done so far, provided an accuracy of approximately 97% using SVM, Naïve Bayes and Decision tree, while 98% using KNN, which gives hope for the work yet to be conducted.

Keywords: ADHD, autism, epilepsy, EEG, SVM

Procedia PDF Downloads 192
3734 Does sustainability disclosure improve analysts’ forecast accuracy Evidence from European banks

Authors: Albert Acheampong, Tamer Elshandidy

Abstract:

We investigate the extent to which sustainability disclosure from the narrative section of European banks’ annual reports improves analyst forecast accuracy. We capture sustainability disclosure using a machine learning approach and use forecast error to proxy analyst forecast accuracy. Our results suggest that sustainability disclosure significantly improves analyst forecast accuracy by reducing the forecast error. In a further analysis, we also find that the induction of Directive 2014/95/European Union (EU) is associated with increased disclosure content, which then reduces forecast error. Collectively, our results suggest that sustainability disclosure improves forecast accuracy, and the induction of the new EU directive strengthens this improvement. These results hold after several further and robustness analyses. Our findings have implications for market participants and policymakers.

Keywords: sustainability disclosure, machine learning, analyst forecast accuracy, forecast error, European banks, EU directive

Procedia PDF Downloads 78
3733 Contributing to Accuracy of Bid Cost Estimate in Construction Projects

Authors: Abdullah Alhomidan

Abstract:

This study is conducted to identify the main factors affecting accuracy of pretender cost estimate in building construction projects in Saudi Arabia from owners’ perspective. 44 factors affecting pretender cost estimate were identified through literature review and discussion with some construction experts. The results show that the top important factors affecting pretender cost estimate accuracy are: level of competitors in the tendering, material price changes, communications with suppliers, communications with client, and estimating method used.

Keywords: cost estimate, accuracy, pretender, estimating, bid estimate

Procedia PDF Downloads 557
3732 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 518
3731 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 461
3730 Satellite Image Classification Using Firefly Algorithm

Authors: Paramjit Kaur, Harish Kundra

Abstract:

In the recent years, swarm intelligence based firefly algorithm has become a great focus for the researchers to solve the real time optimization problems. Here, firefly algorithm is used for the application of satellite image classification. For experimentation, Alwar area is considered to multiple land features like vegetation, barren, hilly, residential and water surface. Alwar dataset is considered with seven band satellite images. Firefly Algorithm is based on the attraction of less bright fireflies towards more brightener one. For the evaluation of proposed concept accuracy assessment parameters are calculated using error matrix. With the help of Error matrix, parameters of Kappa Coefficient, Overall Accuracy and feature wise accuracy parameters of user’s accuracy & producer’s accuracy can be calculated. Overall results are compared with BBO, PSO, Hybrid FPAB/BBO, Hybrid ACO/SOFM and Hybrid ACO/BBO based on the kappa coefficient and overall accuracy parameters.

Keywords: image classification, firefly algorithm, satellite image classification, terrain classification

Procedia PDF Downloads 401
3729 Wireless Sensor Anomaly Detection Using Soft Computing

Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh

Abstract:

We live in an era of rapid development as a result of significant scientific growth. Like other technologies, wireless sensor networks (WSNs) are playing one of the main roles. Based on WSNs, ZigBee adds many features to devices, such as minimum cost and power consumption, and increasing the range and connect ability of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering, and networks, and even in medicinal aspects of intelligence building. In this work, we generated two main datasets, the first being based on tree topology and the second on star topology. The datasets were evaluated by three machine learning (ML) algorithms: J48, meta.j48 and multilayer perceptron (MLP). Each topology was classified into normal and abnormal (attack) network traffic. The dataset used in our work contained simulated data from network simulation 2 (NS2). In each database, the Bayesian network meta.j48 classifier achieved the highest accuracy level among other classifiers, of 99.7% and 99.2% respectively.

Keywords: IDS, Machine learning, WSN, ZigBee technology

Procedia PDF Downloads 544
3728 Visual Thing Recognition with Binary Scale-Invariant Feature Transform and Support Vector Machine Classifiers Using Color Information

Authors: Wei-Jong Yang, Wei-Hau Du, Pau-Choo Chang, Jar-Ferr Yang, Pi-Hsia Hung

Abstract:

The demands of smart visual thing recognition in various devices have been increased rapidly for daily smart production, living and learning systems in recent years. This paper proposed a visual thing recognition system, which combines binary scale-invariant feature transform (SIFT), bag of words model (BoW), and support vector machine (SVM) by using color information. Since the traditional SIFT features and SVM classifiers only use the gray information, color information is still an important feature for visual thing recognition. With color-based SIFT features and SVM, we can discard unreliable matching pairs and increase the robustness of matching tasks. The experimental results show that the proposed object recognition system with color-assistant SIFT SVM classifier achieves higher recognition rate than that with the traditional gray SIFT and SVM classification in various situations.

Keywords: color moments, visual thing recognition system, SIFT, color SIFT

Procedia PDF Downloads 471
3727 Brain Computer Interface Implementation for Affective Computing Sensing: Classifiers Comparison

Authors: Ramón Aparicio-García, Gustavo Juárez Gracia, Jesús Álvarez Cedillo

Abstract:

A research line of the computer science that involve the study of the Human-Computer Interaction (HCI), which search to recognize and interpret the user intent by the storage and the subsequent analysis of the electrical signals of the brain, for using them in the control of electronic devices. On the other hand, the affective computing research applies the human emotions in the HCI process helping to reduce the user frustration. This paper shows the results obtained during the hardware and software development of a Brain Computer Interface (BCI) capable of recognizing the human emotions through the association of the brain electrical activity patterns. The hardware involves the sensing stage and analogical-digital conversion. The interface software involves algorithms for pre-processing of the signal in time and frequency analysis and the classification of patterns associated with the electrical brain activity. The methods used for the analysis and classification of the signal have been tested separately, by using a database that is accessible to the public, besides to a comparison among classifiers in order to know the best performing.

Keywords: affective computing, interface, brain, intelligent interaction

Procedia PDF Downloads 390
3726 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression consists of sequence characters which allow describing a text path. Usually, in clinical research, regular expressions are manually created by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm named REX was designed, and then, implemented as a simplified method to create regexes to classify Spanish text automatically. In order to classify ambiguous cases, such as, when multiple labels are assigned to a testing example, REX includes an information gain method Two sets of data were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statically better regarding accuracy and F-measure than Support Vector Machine and Naïve Bayes for both datasets.

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 321
3725 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in biofilm field has become very significant, as biofilm has shown high mechanical resilience and resistance to antibiotic treatment and constituted as a significant problem in both healthcare and other industry related to microorganisms. The massive information both stated and hidden in the biofilm literature are growing exponentially therefore it is not possible for researchers and practitioners to automatically extract and relate information from different written resources. So, the current work proposes and discusses the use of text mining techniques for the extraction of information from biofilm literature corpora containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature as the literature is unstructured i.e. free-text. Therefore, we considered unsupervised approach, where no annotated training is necessary and using this approach we developed a system that will classify the text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. For this, a two-step structure was used where the first step is to extract keywords from the biofilm literature using a metathesaurus and standard natural language processing tools like Rapid Miner_v5.3 and the second step is to discover relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. We used unsupervised approach, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, in the above-extracted datasets to develop classifiers using WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages which will automatically classify the text by using the mentioned sets. The developed classifiers were tested on a large data set of biofilm literature which showed that the unsupervised approach proposed is promising as well as suited for a semi-automatic labeling of the extracted relations. The entire information was stored in the relational database which was hosted locally on the server. The generated biofilm vocabulary and genes relations will be significant for researchers dealing with biofilm research, making their search easy and efficient as the keywords and genes could be directly mapped with the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 172
3724 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Though the majority of the existing solutions provide good accuracy results, they are often overly complex and computationally expensive. In this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. Firstly, a face image is converted into one-dimensional time series using Hilbert space filling curve, then the approach converts these time series data to a symbolic representation. Furthermore, a distance matrix is calculated between symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results have shown that our approach is able to achieve a correct classification rate exceeding 97% with K-NN algorithm.

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 351
3723 Comparative Study of Accuracy of Land Cover/Land Use Mapping Using Medium Resolution Satellite Imagery: A Case Study

Authors: M. C. Paliwal, A. K. Jain, S. K. Katiyar

Abstract:

Classification of satellite imagery is very important for the assessment of its accuracy. In order to determine the accuracy of the classified image, usually the assumed-true data are derived from ground truth data using Global Positioning System. The data collected from satellite imagery and ground truth data is then compared to find out the accuracy of data and error matrices are prepared. Overall and individual accuracies are calculated using different methods. The study illustrates advanced classification and accuracy assessment of land use/land cover mapping using satellite imagery. IRS-1C-LISS IV data were used for classification of satellite imagery. The satellite image was classified using the software in fourteen classes namely water bodies, agricultural fields, forest land, urban settlement, barren land and unclassified area etc. Classification of satellite imagery and calculation of accuracy was done by using ERDAS-Imagine software to find out the best method. This study is based on the data collected for Bhopal city boundaries of Madhya Pradesh State of India.

Keywords: resolution, accuracy assessment, land use mapping, satellite imagery, ground truth data, error matrices

Procedia PDF Downloads 508
3722 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment becomes an important task used to predict defaulter customers and classify customers as good or bad payers. To this objective, numerous techniques have been applied for credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models such as neural networks, SVM, etc. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using rules classification method. Our output is a set of rules which describe and explain the decision. To this end, we will compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)) where the goal is to find the best rules satisfying many criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm for German and Australian datasets compared to other rule-based techniques to predict the credit risk.

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 183
3721 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing do-main presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 144
3720 Comparison of Artificial Neural Networks and Statistical Classifiers in Olive Sorting Using Near-Infrared Spectroscopy

Authors: İsmail Kavdır, M. Burak Büyükcan, Ferhat Kurtulmuş

Abstract:

Table olive is a valuable product especially in Mediterranean countries. It is usually consumed after some fermentation process. Defects happened naturally or as a result of an impact while olives are still fresh may become more distinct after processing period. Defected olives are not desired both in table olive and olive oil industries as it will affect the final product quality and reduce market prices considerably. Therefore it is critical to sort table olives before processing or even after processing according to their quality and surface defects. However, doing manual sorting has many drawbacks such as high expenses, subjectivity, tediousness and inconsistency. Quality criterions for green olives were accepted as color and free of mechanical defects, wrinkling, surface blemishes and rotting. In this study, it was aimed to classify fresh table olives using different classifiers and NIR spectroscopy readings and also to compare the classifiers. For this purpose, green (Ayvalik variety) olives were classified based on their surface feature properties such as defect-free, with bruised defect and with fly defect using FT-NIR spectroscopy and classification algorithms such as artificial neural networks, ident and cluster. Bruker multi-purpose analyzer (MPA) FT-NIR spectrometer (Bruker Optik, GmbH, Ettlingen Germany) was used for spectral measurements. The spectrometer was equipped with InGaAs detectors (TE-InGaAs internal for reflectance and RT-InGaAs external for transmittance) and a 20-watt high intensity tungsten–halogen NIR light source. Reflectance measurements were performed with a fiber optic probe (type IN 261) which covered the wavelengths between 780–2500 nm, while transmittance measurements were performed between 800 and 1725 nm. Thirty-two scans were acquired for each reflectance spectrum in about 15.32 s while 128 scans were obtained for transmittance in about 62 s. Resolution was 8 cm⁻¹ for both spectral measurement modes. Instrument control was done using OPUS software (Bruker Optik, GmbH, Ettlingen Germany). Classification applications were performed using three classifiers; Backpropagation Neural Networks, ident and cluster classification algorithms. For these classification applications, Neural Network tool box in Matlab, ident and cluster modules in OPUS software were used. Classifications were performed considering different scenarios; two quality conditions at once (good vs bruised, good vs fly defect) and three quality conditions at once (good, bruised and fly defect). Two spectrometer readings were used in classification applications; reflectance and transmittance. Classification results obtained using artificial neural networks algorithm in discriminating good olives from bruised olives, from olives with fly defect and from the olive group including both bruised and fly defected olives with success rates respectively changing between 97 and 99%, 61 and 94% and between 58.67 and 92%. On the other hand, classification results obtained for discriminating good olives from bruised ones and also for discriminating good olives from fly defected olives using the ident method ranged between 75-97.5% and 32.5-57.5%, respectfully; results obtained for the same classification applications using the cluster method ranged between 52.5-97.5% and between 22.5-57.5%.

Keywords: artificial neural networks, statistical classifiers, NIR spectroscopy, reflectance, transmittance

Procedia PDF Downloads 248
3719 Gait Biometric for Person Re-Identification

Authors: Lavanya Srinivasan

Abstract:

Biometric identification is to identify unique features in a person like fingerprints, iris, ear, and voice recognition that need the subject's permission and physical contact. Gait biometric is used to identify the unique gait of the person by extracting moving features. The main advantage of gait biometric to identify the gait of a person at a distance, without any physical contact. In this work, the gait biometric is used for person re-identification. The person walking naturally compared with the same person walking with bag, coat, and case recorded using longwave infrared, short wave infrared, medium wave infrared, and visible cameras. The videos are recorded in rural and in urban environments. The pre-processing technique includes human identified using YOLO, background subtraction, silhouettes extraction, and synthesis Gait Entropy Image by averaging the silhouettes. The moving features are extracted from the Gait Entropy Energy Image. The extracted features are dimensionality reduced by the principal component analysis and recognised using different classifiers. The comparative results with the different classifier show that linear discriminant analysis outperforms other classifiers with 95.8% for visible in the rural dataset and 94.8% for longwave infrared in the urban dataset.

Keywords: biometric, gait, silhouettes, YOLO

Procedia PDF Downloads 173
3718 Comparing Deep Architectures for Selecting Optimal Machine Translation

Authors: Despoina Mouratidis, Katia Lida Kermanidis

Abstract:

Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system, and also helps improve its performance. Several methods have been proposed for the evaluation of (MT) systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical similarity or syntactic similarity between the MT outputs and the reference involving higher-level information like part of speech tagging (POS). This paper presents a language-independent machine learning framework for classifying pairwise translations. This framework uses vector representations of two machine-produced translations, one from a statistical machine translation model (SMT) and one from a neural machine translation model (NMT). The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations used as an input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to be reliably translated. They have tested three basic deep learning (DL) architectures to this schema: (i) fully-connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results when compared against those of some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM). Better accuracy results are obtained when LSTM layers are used in our schema. In terms of a balance between the results, better accuracy results are obtained when dense layers are used. The reason for this is that the model correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified about some figures of speech, as the metaphors, or about certain linguistic phenomena, such as per etymology: paronyms. It is quite interesting to find out why all the classifiers led to worse accuracy results in Italian as compared to Greek, taking into account that the linguistic features employed are language independent.

Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification

Procedia PDF Downloads 133
3717 Multilabel Classification with Neural Network Ensemble Method

Authors: Sezin Ekşioğlu

Abstract:

Multilabel classification has a huge importance for several applications, it is also a challenging research topic. It is a kind of supervised learning that contains binary targets. The distance between multilabel and binary classification is having more than one class in multilabel classification problems. Features can belong to one class or many classes. There exists a wide range of applications for multi label prediction such as image labeling, text categorization, gene functionality. Even though features are classified in many classes, they may not always be properly classified. There are many ensemble methods for the classification. However, most of the researchers have been concerned about better multilabel methods. Especially little ones focus on both efficiency of classifiers and pairwise relationships at the same time in order to implement better multilabel classification. In this paper, we worked on modified ensemble methods by getting benefit from k-Nearest Neighbors and neural network structure to address issues within a beneficial way and to get better impacts from the multilabel classification. Publicly available datasets (yeast, emotion, scene and birds) are performed to demonstrate the developed algorithm efficiency and the technique is measured by accuracy, F1 score and hamming loss metrics. Our algorithm boosts benchmarks for each datasets with different metrics.

Keywords: multilabel, classification, neural network, KNN

Procedia PDF Downloads 155
3716 Ribotaxa: Combined Approaches for Taxonomic Resolution Down to the Species Level from Metagenomics Data Revealing Novelties

Authors: Oshma Chakoory, Sophie Comtet-Marre, Pierre Peyret

Abstract:

Metagenomic classifiers are widely used for the taxonomic profiling of metagenomic data and estimation of taxa relative abundance. Small subunit rRNA genes are nowadays a gold standard for the phylogenetic resolution of complex microbial communities, although the power of this marker comes down to its use as full-length. We benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers and taxonomic classifiers. We then built a pipeline called RiboTaxa to generate a highly sensitive and specific metataxonomic approach. Using metagenomics data, RiboTaxa gave the best results compared to other tools (Kraken2, Centrifuge (1), METAXA2 (2), PhyloFlash (3)) with precise taxonomic identification and relative abundance description, giving no false positive detection. Using real datasets from various environments (ocean, soil, human gut) and from different approaches (metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not seen by current bioinformatics analysis opening new biological perspectives in human and environmental health. In a study focused on corals’ health involving 20 metagenomic samples (4), an affiliation of prokaryotes was limited to the family level with Endozoicomonadaceae characterising healthy octocoral tissue. RiboTaxa highlighted 2 species of uncultured Endozoicomonas which were dominant in the healthy tissue. Both species belonged to a genus not yet described, opening new research perspectives on corals’ health. Applied to metagenomics data from a study on human gut and extreme longevity (5), RiboTaxa detected the presence of an uncultured archaeon in semi-supercentenarians (aged 105 to 109 years) highlighting an archaeal genus, not yet described, and 3 uncultured species belonging to the Enorma genus that could be species of interest participating in the longevity process. RiboTaxa is user-friendly, rapid, allowing microbiota structure description from any environment and the results can be easily interpreted. This software is freely available at https://github.com/oschakoory/RiboTaxa under the GNU Affero General Public License 3.0.

Keywords: metagenomics profiling, microbial diversity, SSU rRNA genes, full-length phylogenetic marker

Procedia PDF Downloads 123
3715 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies

Procedia PDF Downloads 217
3714 A Comparative Study of k-NN and MLP-NN Classifiers Using GA-kNN Based Feature Selection Method for Wood Recognition System

Authors: Uswah Khairuddin, Rubiyah Yusof, Nenny Ruthfalydia Rosli

Abstract:

This paper presents a comparative study between k-Nearest Neighbour (k-NN) and Multi-Layer Perceptron Neural Network (MLP-NN) classifier using Genetic Algorithm (GA) as feature selector for wood recognition system. The features have been extracted from the images using Grey Level Co-Occurrence Matrix (GLCM). The use of GA based feature selection is mainly to ensure that the database used for training the features for the wood species pattern classifier consists of only optimized features. The feature selection process is aimed at selecting only the most discriminating features of the wood species to reduce the confusion for the pattern classifier. This feature selection approach maintains the ‘good’ features that minimizes the inter-class distance and maximizes the intra-class distance. Wrapper GA is used with k-NN classifier as fitness evaluator (GA-kNN). The results shows that k-NN is the best choice of classifier because it uses a very simple distance calculation algorithm and classification tasks can be done in a short time with good classification accuracy.

Keywords: feature selection, genetic algorithm, optimization, wood recognition system

Procedia PDF Downloads 545
3713 Deep Reinforcement Learning and Generative Adversarial Networks Approach to Thwart Intrusions and Adversarial Attacks

Authors: Fabrice Setephin Atedjio, Jean-Pierre Lienou, Frederica F. Nelson, Sachin S. Shetty, Charles A. Kamhoua

Abstract:

Malicious users exploit vulnerabilities in computer systems, significantly disrupting their performance and revealing the inadequacies of existing protective solutions. Even machine learning-based approaches, designed to ensure reliability, can be compromised by adversarial attacks that undermine their robustness. This paper addresses two critical aspects of enhancing model reliability. First, we focus on improving model performance and robustness against adversarial threats. To achieve this, we propose a strategy by harnessing deep reinforcement learning. Second, we introduce an approach leveraging generative adversarial networks to counter adversarial attacks effectively. Our results demonstrate substantial improvements over previous works in the literature, with classifiers exhibiting enhanced accuracy in classification tasks, even in the presence of adversarial perturbations. These findings underscore the efficacy of the proposed model in mitigating intrusions and adversarial attacks within the machine-learning landscape.

Keywords: machine learning, reliability, adversarial attacks, deep-reinforcement learning, robustness

Procedia PDF Downloads 14
3712 Social Media Data Analysis for Personality Modelling and Learning Styles Prediction Using Educational Data Mining

Authors: Srushti Patil, Preethi Baligar, Gopalkrishna Joshi, Gururaj N. Bhadri

Abstract:

In designing learning environments, the instructional strategies can be tailored to suit the learning style of an individual to ensure effective learning. In this study, the information shared on social media like Facebook is being used to predict learning style of a learner. Previous research studies have shown that Facebook data can be used to predict user personality. Users with a particular personality exhibit an inherent pattern in their digital footprint on Facebook. The proposed work aims to correlate the user's’ personality, predicted from Facebook data to the learning styles, predicted through questionnaires. For Millennial learners, Facebook has become a primary means for information sharing and interaction with peers. Thus, it can serve as a rich bed for research and direct the design of learning environments. The authors have conducted this study in an undergraduate freshman engineering course. Data from 320 freshmen Facebook users was collected. The same users also participated in the learning style and personality prediction survey. The Kolb’s Learning style questionnaires and Big 5 personality Inventory were adopted for the survey. The users have agreed to participate in this research and have signed individual consent forms. A specific page was created on Facebook to collect user data like personal details, status updates, comments, demographic characteristics and egocentric network parameters. This data was captured by an application created using Python program. The data captured from Facebook was subjected to text analysis process using the Linguistic Inquiry and Word Count dictionary. An analysis of the data collected from the questionnaires performed reveals individual student personality and learning style. The results obtained from analysis of Facebook, learning style and personality data were then fed into an automatic classifier that was trained by using the data mining techniques like Rule-based classifiers and Decision trees. This helps to predict the user personality and learning styles by analysing the common patterns. Rule-based classifiers applied for text analysis helps to categorize Facebook data into positive, negative and neutral. There were totally two models trained, one to predict the personality from Facebook data; another one to predict the learning styles from the personalities. The results show that the classifier model has high accuracy which makes the proposed method to be a reliable one for predicting the user personality and learning styles.

Keywords: educational data mining, Facebook, learning styles, personality traits

Procedia PDF Downloads 231