Search results for: classification and clustering.
879 1/Sigma Term Weighting Scheme for Sentiment Analysis
Authors: Hanan Alshaher, Jinsheng Xu
Abstract:
Large amounts of data on the web can provide valuable information. For example, product reviews help business owners measure customer satisfaction. Sentiment analysis classifies texts into two polarities: positive and negative. This paper examines movie reviews and tweets using a new term weighting scheme, called one-over-sigma (1/sigma), on benchmark datasets for sentiment classification. The proposed method aims to improve the performance of sentiment classification. The results show that 1/sigma is more accurate than the popular term weighting schemes. In order to verify if the entropy reflects the discriminating power of terms, we report a comparison of entropy values for different term weighting schemes.
Keywords: Sentiment analysis, term weighting scheme, 1/sigma.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 535878 The Use of Symbolic Signs in Modern Ukrainian Monumental Church Painting: Classification and Hidden Semantics
Authors: Khlystun Yuliia Igorivna
Abstract:
Monumental church paintings are often perceived either as the interior decoration of the temple or as the "Gospel for the illiterate," as the temple painting often contains scenes from Holy Scripture. In science the painting of the Orthodox Church is mainly the subject of study of art critics, but from the point of view of culturology and semiotics, it is insufficiently studied. The symbolism of monumental church painting is insufficiently revealed. The aim of this paper is to give a description of symbolic signs, to classify them, to give examples for each type of sign from the paintings of modern temples of Eastern Ukraine, on the basis of semiotic analysis of iconographic plots used in monumental church painting. We offer own classification of symbols of monumental church painting, using examples from the murals of modern Orthodox churches in Eastern Ukraine, mainly from the Donetsk region. When analyzing the semantics of symbolic signs, the following methods of the culturological approach were used: semiotic, iconological, iconographic, hermeneutic, culturological, descriptive, comparative-historical, visual-analytical. When interpreting the meanings of symbolic signs, scientific, cultural and theological literature were used. Photos taken by the author have been added to the article.
Keywords: Iconography, painting of Orthodox Church, Orthodox Church, semiotic signs in modern iconography, classification of symbols in painting of Orthodox Church.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 399877 TheAnalyzer: Clustering-Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human-Computer Interaction
Authors: D. S. A. Nanayakkara, K. J. P. G. Perera
Abstract:
E-commerce platforms have revolutionized the shopping experience, offering convenient ways for consumers to make purchases. To improve interactions with customers and optimize marketing strategies, it is essential for businesses to understand user behavior, preferences, and needs on these platforms. This paper focuses on recommending businesses to customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with user goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user-profiles and improving human-computer interaction. TheAnalyzer seamlessly integrates with business applications, collecting relevant data points based on users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating the successful distinction of users into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling TheAnalyzer to influence business applications and provide an enhanced personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that utilizing TheAnalyzer’s capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions in e-commerce platforms. By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction, loyalty and ultimately drive sales.
Keywords: Data clustering, data standardization, dimensionality reduction, human-computer interaction, user profiling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 229876 In Search of an SVD and QRcp Based Optimization Technique of ANN for Automatic Classification of Abnormal Heart Sounds
Authors: Samit Ari, Goutam Saha
Abstract:
Artificial Neural Network (ANN) has been extensively used for classification of heart sounds for its discriminative training ability and easy implementation. However, it suffers from overparameterization if the number of nodes is not chosen properly. In such cases, when the dataset has redundancy within it, ANN is trained along with this redundant information that results in poor validation. Also a larger network means more computational expense resulting more hardware and time related cost. Therefore, an optimum design of neural network is needed towards real-time detection of pathological patterns, if any from heart sound signal. The aims of this work are to (i) select a set of input features that are effective for identification of heart sound signals and (ii) make certain optimum selection of nodes in the hidden layer for a more effective ANN structure. Here, we present an optimization technique that involves Singular Value Decomposition (SVD) and QR factorization with column pivoting (QRcp) methodology to optimize empirically chosen over-parameterized ANN structure. Input nodes present in ANN structure is optimized by SVD followed by QRcp while only SVD is required to prune undesirable hidden nodes. The result is presented for classifying 12 common pathological cases and normal heart sound.Keywords: ANN, Classification of heart diseases, murmurs, optimization, Phonocardiogram, QRcp, SVD.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2071875 Tree Based Data Fusion Clustering Routing Algorithm for Illimitable Network Administration in Wireless Sensor Network
Authors: Y. Harold Robinson, M. Rajaram, E. Golden Julie, S. Balaji
Abstract:
In wireless sensor networks, locality and positioning information can be captured using Global Positioning System (GPS). This message can be congregated initially from spot to identify the system. Users can retrieve information of interest from a wireless sensor network (WSN) by injecting queries and gathering results from the mobile sink nodes. Routing is the progression of choosing optimal path in a mobile network. Intermediate node employs permutation of device nodes into teams and generating cluster heads that gather the data from entity cluster’s node and encourage the collective data to base station. WSNs are widely used for gathering data. Since sensors are power-constrained devices, it is quite vital for them to reduce the power utilization. A tree-based data fusion clustering routing algorithm (TBDFC) is used to reduce energy consumption in wireless device networks. Here, the nodes in a tree use the cluster formation, whereas the elevation of the tree is decided based on the distance of the member nodes to the cluster-head. Network simulation shows that this scheme improves the power utilization by the nodes, and thus considerably improves the lifetime.
Keywords: WSN, TBDFC, LEACH, PEGASIS, TREEPSI.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1116874 File Format of Flow Chart Simulation Software - CFlow
Authors: Syahanim Mohd Salleh, Zaihosnita Hood, Hairulliza Mohd Judi, Marini Abu Bakar
Abstract:
CFlow is a flow chart software, it contains facilities to draw and evaluate a flow chart. A flow chart evaluation applies a simulation method to enable presentation of work flow in a flow chart solution. Flow chart simulation of CFlow is executed by manipulating the CFlow data file which is saved in a graphical vector format. These text-based data are organised by using a data classification technic based on a Library classification-scheme. This paper describes the file format for flow chart simulation software of CFlow.Keywords: CFlow, flow chart, file format.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2553873 Improved Body Mass Index Classification for Football Code Masters Athletes, A Comparison to the Australian National Population
Authors: Joe Walsh, Mike Climstein, Ian Timothy Heazlewood, Stephen Burke, Jyrki Kettunen, Kent Adams, Mark DeBeliso
Abstract:
Thousands of masters athletes participate quadrennially in the World Masters Games (WMG), yet this cohort of athletes remains proportionately under-investigated. Due to a growing global obesity pandemic in context of benefits of physical activity across the lifespan, the prevalence of obesity in this unique population was of particular interest. Data gathered on a sub-sample of 535 football code athletes, aged 31-72 yrs ( =47.4, s =±7.1), competing at the Sydney World Masters Games (2009) demonstrated a significantly (p<0.001), reduced classification of obesity using Body Mass Index (BMI) when compared to data on the Australian national population. This evidence of improved classification in one index of health (BMI<30) implies there are either improved levels of this index of health due to adherence to sport or possibly the reduced BMI is advantageous and contributes to this cohort adhering (or being attracted) to masters sport. Given the worldwide focus on the obesity epidemic and the need for a multi-faceted solution to this problem, demonstration of these middle to older aged adults having improved BMI over the general population is of particular interest.Keywords: BMI, masters athlete, rugby union, soccer, touch football
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1678872 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups
Authors: Lily Ingsrisawang, Tasanee Nacharoen
Abstract:
The problems arising from unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors’ nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, the methods of discriminant analysis are of interest in investigating misclassification error rates for classimbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data including the variables of diabetes risk group, age, gender, blood glucose, and BMI were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed nonnormality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. Searching the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions of (0.90:0.05:0.05), (0.80: 0.10: 0.10) and (0.70, 0.15, 0.15). The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k=3 or k=4 and the defined prior probabilities of non-risk: risk: diabetic as 0.90: 0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of misclassification. The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.Keywords: Bootstrap, diabetes risk groups, error rate, k-nearest neighbors.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2008871 Two Class Motor Imagery Classification via Wave Atom Sub-Bants
Authors: Nebi Gedik
Abstract:
The goal of motor image brain computer interface research is to create a link between the central nervous system and a computer or device. The most important signal for brain-computer interface is the electroencephalogram. The aim of this research is to explore a set of effective features from EEG signals, separated into frequency bands, using wave atom sub-bands to discriminate right and left-hand motor imagery signals. Over the transform coefficients, feature vectors are constructed for each frequency range and each transform sub-band, and their classification performances are tested. The method is validated using EEG signals from the BCI competition III dataset IIIa and classifiers such as support vector machine and k-nearest neighbors.
Keywords: motor imagery, EEG, Wave atom transform sub-bands, SVM, k-NN
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 599870 Amelioration of Cardiac Arrythmias Classification Performance Using Artificial Neural Network, Adaptive Neuro-Fuzzy and Fuzzy Inference Systems Classifiers
Authors: Alexandre Boum, Salomon Madinatou
Abstract:
This paper aims at bringing a scientific contribution to the cardiac arrhythmia biomedical diagnosis systems; more precisely to the study of the amelioration of cardiac arrhythmia classification performance using artificial neural network, adaptive neuro-fuzzy and fuzzy inference systems classifiers. The purpose of this amelioration is to enable cardiologists to make reliable diagnosis through automatic cardiac arrhythmia analyzes and classifications based on high confidence classifiers. In this study, six classes of the most commonly encountered arrhythmias are considered: the Right Bundle Branch Block, the Left Bundle Branch Block, the Ventricular Extrasystole, the Auricular Extrasystole, the Atrial Fibrillation and the Normal Cardiac rate beat. From the electrocardiogram (ECG) extracted parameters, we constructed a matrix (360x360) serving as an input data sample for the classifiers based on neural networks and a matrix (1x6) for the classifier based on fuzzy logic. By varying three parameters (the quality of the neural network learning, the data size and the quality of the input parameters) the automatic classification permitted us to obtain the following performances: in terms of correct classification rate, 83.6% was obtained using the fuzzy logic based classifier, 99.7% using the neural network based classifier and 99.8% for the adaptive neuro-fuzzy based classifier. These results are based on signals containing at least 360 cardiac cycles. Based on the comparative analysis of the aforementioned three arrhythmia classifiers, the classifiers based on neural networks exhibit a better performance.
Keywords: Adaptive neuro-fuzzy, artificial neural network, cardiac arrythmias, fuzzy inference systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 710869 Prediction of Writer Using Tamil Handwritten Document Image Based on Pooled Features
Authors: T. Thendral, M. S. Vijaya, S. Karpagavalli
Abstract:
Tamil handwritten document is taken as a key source of data to identify the writer. Tamil is a classical language which has 247 characters include compound characters, consonants, vowels and special character. Most characters of Tamil are multifaceted in nature. Handwriting is a unique feature of an individual. Writer may change their handwritings according to their frame of mind and this place a risky challenge in identifying the writer. A new discriminative model with pooled features of handwriting is proposed and implemented using support vector machine. It has been reported on 100% of prediction accuracy by RBF and polynomial kernel based classification model.
Keywords: Classification, Feature extraction, Support vector machine, Training, Writer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2312868 Prediction of Writer Using Tamil Handwritten Document Image Based on Pooled Features
Authors: T. Thendral, M. S. Vijaya, S. Karpagavalli
Abstract:
Tamil handwritten document is taken as a key source of data to identify the writer. Tamil is a classical language which has 247 characters include compound characters, consonants, vowels and special character. Most characters of Tamil are multifaceted in nature. Handwriting is a unique feature of an individual. Writer may change their handwritings according to their frame of mind and this place a risky challenge in identifying the writer. A new discriminative model with pooled features of handwriting is proposed and implemented using support vector machine. It has been reported on 100% of prediction accuracy by RBF and polynomial kernel based classification model.Keywords: Classification, Feature extraction, Support vector machine, Training, Writer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1701867 Determining Senses for Word Sense Disambiguation in Turkish
Authors: Zeynep Orhan, Zeynep Altan
Abstract:
Word sense disambiguation is an important intermediate stage for many natural language processing applications. The senses of an ambiguous word are the classification of usages for that specific word. This paper deals with the methodologies of determining the senses for a given word if they can not be obtained from an already available resource like WordNet. We offer a method that helps us to determine the sense boundaries gradually. In this method, first we decide on some features that are thought to be effective on the senses and divide the instances first into two, then according to the results of evaluations we continue dividing instances gradually. In a second method we use the pseudo words. We devise artificial words depending on some criteria and evaluate classification algorithms on these previously classified words.
Keywords: Word sense disambiguation, sense determination, pseudo words, sense granularity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1410866 Hybrid Structure Learning Approach for Assessing the Phosphate Laundries Impact
Authors: Emna Benmohamed, Hela Ltifi, Mounir Ben Ayed
Abstract:
Bayesian Network (BN) is one of the most efficient classification methods. It is widely used in several fields (i.e., medical diagnostics, risk analysis, bioinformatics research). The BN is defined as a probabilistic graphical model that represents a formalism for reasoning under uncertainty. This classification method has a high-performance rate in the extraction of new knowledge from data. The construction of this model consists of two phases for structure learning and parameter learning. For solving this problem, the K2 algorithm is one of the representative data-driven algorithms, which is based on score and search approach. In addition, the integration of the expert's knowledge in the structure learning process allows the obtainment of the highest accuracy. In this paper, we propose a hybrid approach combining the improvement of the K2 algorithm called K2 algorithm for Parents and Children search (K2PC) and the expert-driven method for learning the structure of BN. The evaluation of the experimental results, using the well-known benchmarks, proves that our K2PC algorithm has better performance in terms of correct structure detection. The real application of our model shows its efficiency in the analysis of the phosphate laundry effluents' impact on the watershed in the Gafsa area (southwestern Tunisia).
Keywords: Classification, Bayesian network; structure learning, K2 algorithm, expert knowledge, surface water analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 513865 Extraction of Significant Phrases from Text
Authors: Yuan J. Lui
Abstract:
Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain independent keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000-s AutoSummarize feature. The domain independence of this algorithm has also been confirmed in our experiments.
Keywords: classification, keyphrase extraction, machine learning, summarization
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2051864 Energy Detection Based Sensing and Primary User Traffic Classification for Cognitive Radio
Authors: Urvee B. Trivedi, U. D. Dalal
Abstract:
As wireless communication services grow quickly; the seriousness of spectrum utilization has been on the rise gradually. An emerging technology, cognitive radio has come out to solve today’s spectrum scarcity problem. To support the spectrum reuse functionality, secondary users are required to sense the radio frequency environment, and once the primary users are found to be active, the secondary users are required to vacate the channel within a certain amount of time. Therefore, spectrum sensing is of significant importance. Once sensing is done, different prediction rules apply to classify the traffic pattern of primary user. Primary user follows two types of traffic patterns: periodic and stochastic ON-OFF patterns. A cognitive radio can learn the patterns in different channels over time. Two types of classification methods are discussed in this paper, by considering edge detection and by using autocorrelation function. Edge detection method has a high accuracy but it cannot tolerate sensing errors. Autocorrelation-based classification is applicable in the real environment as it can tolerate some amount of sensing errors.Keywords: Cognitive radio (CR), probability of detection (PD), probability of false alarm (PF), primary User (PU), secondary user (SU), Fast Fourier transform (FFT), signal to noise ratio (SNR).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1470863 Calcification Classification in Mammograms Using Decision Trees
Authors: S. Usha, S. Arumugam
Abstract:
Cancer affects people globally with breast cancer being a leading killer. Breast cancer is due to the uncontrollable multiplication of cells resulting in a tumour or neoplasm. Tumours are called ‘benign’ when cancerous cells do not ravage other body tissues and ‘malignant’ if they do so. As mammography is an effective breast cancer detection tool at an early stage which is the most treatable stage it is the primary imaging modality for screening and diagnosis of this cancer type. This paper presents an automatic mammogram classification technique using wavelet and Gabor filter. Correlation feature selection is used to reduce the feature set and selected features are classified using different decision trees.
Keywords: Breast Cancer, Mammogram, Symlet Wavelets, Gabor Filters, Decision Trees
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1751862 Classifier Based Text Mining for Neural Network
Authors: M. Govindarajan, R. M. Chandrasekaran
Abstract:
Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In Neural Network that address classification problems, training set, testing set, learning rate are considered as key tasks. That is collection of input/output patterns that are used to train the network and used to assess the network performance, set the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for original Neural Network. In order to reduce the optimization of classification accuracy, training time. The feasibility the benefits of the proposed approach are demonstrated by means of five data sets like contact-lenses, cpu, weather symbolic, Weather, labor-nega-data. It is shown that , compared to exiting neural network, the training time is reduced by more than 10 times faster when the dataset is larger than CPU or the network has many hidden units while accuracy ('percent correct') was the same for all datasets but contact-lences, which is the only one with missing attributes. For contact-lences the accuracy with Proposed Neural Network was in average around 0.3 % less than with the original Neural Network. This algorithm is independent of specify data sets so that many ideas and solutions can be transferred to other classifier paradigms.Keywords: Back propagation, classification accuracy, textmining, time complexity.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4218861 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
Authors: Ruchika Malhotra, Megha Khanna
Abstract:
The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.Keywords: Change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1520860 Predictive Analytics of Student Performance Determinants in Education
Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi
Abstract:
Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.
Keywords: Student performance, supervised machine learning, prediction, classification, cross-validation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 548859 Sentiment Analysis of Fake Health News Using Naive Bayes Classification Models
Authors: Danielle Shackley, Yetunde Folajimi
Abstract:
As more people turn to the internet seeking health related information, there is more risk of finding false, inaccurate, or dangerous information. Sentiment analysis is a natural language processing technique that assigns polarity scores of text, ranging from positive, neutral and negative. In this research, we evaluate the weight of a sentiment analysis feature added to fake health news classification models. The dataset consists of existing reliably labeled health article headlines that were supplemented with health information collected about COVID-19 from social media sources. We started with data preprocessing, tested out various vectorization methods such as Count and TFIDF vectorization. We implemented 3 Naive Bayes classifier models, including Bernoulli, Multinomial and Complement. To test the weight of the sentiment analysis feature on the dataset, we created benchmark Naive Bayes classification models without sentiment analysis, and those same models were reproduced and the feature was added. We evaluated using the precision and accuracy scores. The Bernoulli initial model performed with 90% precision and 75.2% accuracy, while the model supplemented with sentiment labels performed with 90.4% precision and stayed constant at 75.2% accuracy. Our results show that the addition of sentiment analysis did not improve model precision by a wide margin; while there was no evidence of improvement in accuracy, we had a 1.9% improvement margin of the precision score with the Complement model. Future expansion of this work could include replicating the experiment process, and substituting the Naive Bayes for a deep learning neural network model.
Keywords: Sentiment analysis, Naive Bayes model, natural language processing, topic analysis, fake health news classification model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 489858 Classification of Defects by the SVM Method and the Principal Component Analysis (PCA)
Authors: M. Khelil, M. Boudraa, A. Kechida, R. Drai
Abstract:
Analyses carried out on examples of detected defects echoes showed clearly that one can describe these detected forms according to a whole of characteristic parameters in order to be able to make discrimination between a planar defect and a volumic defect. This work answers to a problem of ultrasonics NDT like Identification of the defects. The problems as well as the objective of this realized work, are divided in three parts: Extractions of the parameters of wavelets from the ultrasonic echo of the detected defect - the second part is devoted to principal components analysis (PCA) for optimization of the attributes vector. And finally to establish the algorithm of classification (SVM, Support Vector Machine) which allows discrimination between a plane defect and a volumic defect. We have completed this work by a conclusion where we draw up a summary of the completed works, as well as the robustness of the various algorithms proposed in this study.Keywords: NDT, PCA, SVM, ultrasonics, wavelet
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2003857 A New Approach for Classifying Large Number of Mixed Variables
Authors: Hashibah Hamid
Abstract:
The issue of classifying objects into one of predefined groups when the measured variables are mixed with different types of variables has been part of interest among statisticians in many years. Some methods for dealing with such situation have been introduced that include parametric, semi-parametric and nonparametric approaches. This paper attempts to discuss on a problem in classifying a data when the number of measured mixed variables is larger than the size of the sample. A propose idea that integrates a dimensionality reduction technique via principal component analysis and a discriminant function based on the location model is discussed. The study aims in offering practitioners another potential tool in a classification problem that is possible to be considered when the observed variables are mixed and too large.Keywords: classification, location model, mixed variables, principal component analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1557856 Gradual Shot Boundary Detection and Classification Based on Fractal Analysis
Authors: Zeinab Zeinalpour-Tabrizi, Faeze Asdaghi, Mahmooh Fathy, Mohammad Reza Jahed-Motlagh
Abstract:
Shot boundary detection is a fundamental step for the organization of large video data. In this paper, we propose a new method for video gradual shots detection and classification, using advantages of fractal analysis and AIS-based classifier. Proposed features are “vertical intercept" and “fractal dimension" of each frame of videos which are computed using Fourier transform coefficients. We also used a classifier based on Clonal Selection Algorithm. We have carried out our solution and assessed it according to the TRECVID2006 benchmark dataset.
Keywords: shot boundary detection, gradual shots, fractal analysis, artificial immune system, choose Clooney.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1924855 A Fuzzy Classifier with Evolutionary Design of Ellipsoidal Decision Regions
Authors: Leehter Yao, Kuei-Song Weng, Cherng-Dir Huang
Abstract:
A fuzzy classifier using multiple ellipsoids approximating decision regions for classification is to be designed in this paper. An algorithm called Gustafson-Kessel algorithm (GKA) with an adaptive distance norm based on covariance matrices of prototype data points is adopted to learn the ellipsoids. GKA is able toadapt the distance norm to the underlying distribution of the prototypedata points except that the sizes of ellipsoids need to be determined a priori. To overcome GKA's inability to determine appropriate size ofellipsoid, the genetic algorithm (GA) is applied to learn the size ofellipsoid. With GA combined with GKA, it will be shown in this paper that the proposed method outperforms the benchmark algorithms as well as algorithms in the field.
Keywords: Ellipsoids, genetic algorithm, classification, fuzzyc-means (FCM)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1693854 Statistical Measures and Optimization Algorithms for Gene Selection in Lung and Ovarian Tumor
Authors: C. Gunavathi, K. Premalatha
Abstract:
Microarray technology is universally used in the study of disease diagnosis using gene expression levels. The main shortcoming of gene expression data is that it includes thousands of genes and a small number of samples. Abundant methods and techniques have been proposed for tumor classification using microarray gene expression data. Feature or gene selection methods can be used to mine the genes that directly involve in the classification and to eliminate irrelevant genes. In this paper statistical measures like T-Statistics, Signal-to-Noise Ratio (SNR) and F-Statistics are used to rank the genes. The ranked genes are used for further classification. Particle Swarm Optimization (PSO) algorithm and Shuffled Frog Leaping (SFL) algorithm are used to find the significant genes from the top-m ranked genes. The Naïve Bayes Classifier (NBC) is used to classify the samples based on the significant genes. The proposed work is applied on Lung and Ovarian datasets. The experimental results show that the proposed method achieves 100% accuracy in all the three datasets and the results are compared with previous works.
Keywords: Microarray, T-Statistics, Signal-to-Noise Ratio, FStatistics, Particle Swarm Optimization, Shuffled Frog Leaping, Naïve Bayes Classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1945853 Multi-Layer Perceptron and Radial Basis Function Neural Network Models for Classification of Diabetic Retinopathy Disease Using Video-Oculography Signals
Authors: Ceren Kaya, Okan Erkaymaz, Orhan Ayar, Mahmut Özer
Abstract:
Diabetes Mellitus (Diabetes) is a disease based on insulin hormone disorders and causes high blood glucose. Clinical findings determine that diabetes can be diagnosed by electrophysiological signals obtained from the vital organs. 'Diabetic Retinopathy' is one of the most common eye diseases resulting on diabetes and it is the leading cause of vision loss due to structural alteration of the retinal layer vessels. In this study, features of horizontal and vertical Video-Oculography (VOG) signals have been used to classify non-proliferative and proliferative diabetic retinopathy disease. Twenty-five features are acquired by using discrete wavelet transform with VOG signals which are taken from 21 subjects. Two models, based on multi-layer perceptron and radial basis function, are recommended in the diagnosis of Diabetic Retinopathy. The proposed models also can detect level of the disease. We show comparative classification performance of the proposed models. Our results show that proposed the RBF model (100%) results in better classification performance than the MLP model (94%).
Keywords: Diabetic retinopathy, discrete wavelet transform, multi-layer perceptron, radial basis function, video-oculography.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1346852 Visualization and Indexing of Spectral Databases
Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi
Abstract:
On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.
Keywords: indexing high dimensional databases, dimensional reduction, clustering, similarity, k-nn algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1769851 Curvelet Features with Mouth and Face Edge Ratios for Facial Expression Identification
Authors: S. Kherchaoui, A. Houacine
Abstract:
This paper presents a facial expression recognition system. It performs identification and classification of the seven basic expressions; happy, surprise, fear, disgust, sadness, anger, and neutral states. It consists of three main parts. The first one is the detection of a face and the corresponding facial features to extract the most expressive portion of the face, followed by a normalization of the region of interest. Then calculus of curvelet coefficients is performed with dimensionality reduction through principal component analysis. The resulting coefficients are combined with two ratios; mouth ratio and face edge ratio to constitute the whole feature vector. The third step is the classification of the emotional state using the SVM method in the feature space.
Keywords: Facial expression identification, curvelet coefficients, support vector machine (SVM).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1842850 6D Posture Estimation of Road Vehicles from Color Images
Authors: Yoshimoto Kurihara, Tad Gonsalves
Abstract:
Currently, in the field of object posture estimation, there is research on estimating the position and angle of an object by storing a 3D model of the object to be estimated in advance in a computer and matching it with the model. However, in this research, we have succeeded in creating a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two different networks – a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy of each camera position, i.e., the angle from which the object was captured. The highest accuracy was recorded when the camera position was 75°, the accuracy of the classification was about 87.3%, and that of regression was about 98.9%.
Keywords: AlexNet, Deep learning, image recognition, 6D posture estimation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 590