Search results for: short text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6213

Search results for: short text classification

6153 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

Abstract:

The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most of the works in Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we proposed a ViraPart framework that uses embedded ParsBERT in its core for text clarifications. First, used the BERT variant for Persian followed by a classifier layer for classification procedures. Next, we combined models outputs to output cleartext. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.

Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers

Procedia PDF Downloads 218
6152 Sensitive Analysis of the ZF Model for ABC Multi Criteria Inventory Classification

Authors: Makram Ben Jeddou

Abstract:

The ABC classification is widely used by managers for inventory control. The classical ABC classification is based on the Pareto principle and according to the criterion of the annual use value only. Single criterion classification is often insufficient for a closely inventory control. Multi-criteria inventory classification models have been proposed by researchers in order to take into account other important criteria. From these models, we will consider the ZF model in order to make a sensitive analysis on the composite score calculated for each item. In fact, this score based on a normalized average between a good and a bad optimized index can affect the ABC items classification. We will then focus on the weights assigned to each index and propose a classification compromise.

Keywords: ABC classification, multi criteria inventory classification models, ZF-model

Procedia PDF Downloads 508
6151 Improved Classification Procedure for Imbalanced and Overlapped Situations

Authors: Hankyu Lee, Seoung Bum Kim

Abstract:

The issue with imbalance and overlapping in the class distribution becomes important in various applications of data mining. The imbalanced dataset is a special case in classification problems in which the number of observations of one class (i.e., major class) heavily exceeds the number of observations of the other class (i.e., minor class). Overlapped dataset is the case where many observations are shared together between the two classes. Imbalanced and overlapped data can be frequently found in many real examples including fraud and abuse patients in healthcare, quality prediction in manufacturing, text classification, oil spill detection, remote sensing, and so on. The class imbalance and overlap problem is the challenging issue because this situation degrades the performance of most of the standard classification algorithms. In this study, we propose a classification procedure that can effectively handle imbalanced and overlapped datasets by splitting data space into three parts: nonoverlapping, light overlapping, and severe overlapping and applying the classification algorithm in each part. These three parts were determined based on the Hausdorff distance and the margin of the modified support vector machine. An experiments study was conducted to examine the properties of the proposed method and compared it with other classification algorithms. The results showed that the proposed method outperformed the competitors under various imbalanced and overlapped situations. Moreover, the applicability of the proposed method was demonstrated through the experiment with real data.

Keywords: classification, imbalanced data with class overlap, split data space, support vector machine

Procedia PDF Downloads 308
6150 A Method for Compression of Short Unicode Strings

Authors: Masoud Abedi, Abbas Malekpour, Peter Luksch, Mohammad Reza Mojtabaei

Abstract:

The use of short texts in communication has been greatly increasing in recent years. Applying different languages in short texts has led to compulsory use of Unicode strings. These strings need twice the space of common strings, hence, applying algorithms of compression for the purpose of accelerating transmission and reducing cost is worthwhile. Nevertheless, other compression methods like gzip, bzip2 or PAQ due to high overhead data size are not appropriate. The Huffman algorithm is one of the rare algorithms effective in reducing the size of short Unicode strings. In this paper, an algorithm is proposed for compression of very short Unicode strings. At first, every new character to be sent to a destination is inserted in the proposed mapping table. At the beginning, every character is new. In case the character is repeated for the same destination, it is not considered as a new character. Next, the new characters together with the mapping value of repeated characters are arranged through a specific technique and specially formatted to be transmitted. The results obtained from an assessment made on a set of short Persian and Arabic strings indicate that this proposed algorithm outperforms the Huffman algorithm in size reduction.

Keywords: Algorithms, Data Compression, Decoding, Encoding, Huffman Codes, Text Communication

Procedia PDF Downloads 349
6149 1/Sigma Term Weighting Scheme for Sentiment Analysis

Authors: Hanan Alshaher, Jinsheng Xu

Abstract:

Large amounts of data on the web can provide valuable information. For example, product reviews help business owners measure customer satisfaction. Sentiment analysis classifies texts into two polarities: positive and negative. This paper examines movie reviews and tweets using a new term weighting scheme, called one-over-sigma (1/sigma), on benchmark datasets for sentiment classification. The proposed method aims to improve the performance of sentiment classification. The results show that 1/sigma is more accurate than the popular term weighting schemes. In order to verify if the entropy reflects the discriminating power of terms, we report a comparison of entropy values for different term weighting schemes.

Keywords: 1/sigma, natural language processing, sentiment analysis, term weighting scheme, text classification

Procedia PDF Downloads 205
6148 Encryption and Decryption of Nucleic Acid Using Deoxyribonucleic Acid Algorithm

Authors: Iftikhar A. Tayubi, Aabdulrahman Alsubhi, Abdullah Althrwi

Abstract:

The deoxyribonucleic acid text provides a single source of high-quality Cryptography about Deoxyribonucleic acid sequence for structural biologists. We will provide an intuitive, well-organized and user-friendly web interface that allows users to encrypt and decrypt Deoxy Ribonucleic Acid sequence text. It includes complex, securing by using Algorithm to encrypt and decrypt Deoxy Ribonucleic Acid sequence. The utility of this Deoxy Ribonucleic Acid Sequence Text is that, it can provide a user-friendly interface for users to Encrypt and Decrypt store the information about Deoxy Ribonucleic Acid sequence. These interfaces created in this project will satisfy the demands of the scientific community by providing fully encrypt of Deoxy Ribonucleic Acid sequence during this website. We have adopted a methodology by using C# and Active Server Page.NET for programming which is smart and secure. Deoxy Ribonucleic Acid sequence text is a wonderful piece of equipment for encrypting large quantities of data, efficiently. The users can thus navigate from one encoding and store orange text, depending on the field for user’s interest. Algorithm classification allows a user to Protect the deoxy ribonucleic acid sequence from change, whether an alteration or error occurred during the Deoxy Ribonucleic Acid sequence data transfer. It will check the integrity of the Deoxy Ribonucleic Acid sequence data during the access.

Keywords: algorithm, ASP.NET, DNA, encrypt, decrypt

Procedia PDF Downloads 235
6147 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis

Procedia PDF Downloads 715
6146 Mask-Prompt-Rerank: An Unsupervised Method for Text Sentiment Transfer

Authors: Yufen Qin

Abstract:

Text sentiment transfer is an important branch of text style transfer. The goal is to generate text with another sentiment attribute based on a text with a specific sentiment attribute while maintaining the content and semantic information unrelated to sentiment unchanged in the process. There are currently two main challenges in this field: no parallel corpus and text attribute entanglement. In response to the above problems, this paper proposed a novel solution: Mask-Prompt-Rerank. Use the method of masking the sentiment words and then using prompt regeneration to transfer the sentence sentiment. Experiments on two sentiment benchmark datasets and one formality transfer benchmark dataset show that this approach makes the performance of small pre-trained language models comparable to that of the most advanced large models, while consuming two orders of magnitude less computing and memory.

Keywords: language model, natural language processing, prompt, text sentiment transfer

Procedia PDF Downloads 82
6145 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech

Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori

Abstract:

Native Speech to text synthesis has its own leverage for the purpose of mankind. The extensive nature of art to speaking different accents is common but the purpose of communication between two different accent types of people is quite difficult. This problem will be motivated by the extraction of the wrong perception of language meaning. Thus, many existing automatic speech recognition has been placed to detect text. Overall study of this paper mentions a review of NSTTR (Native Speech Text to Text Recognition) synthesis compared with Text to Text recognition. Review has exposed many text to text recognition systems that are at a very early stage to comply with the system by native speech recognition. Many discussions started about the progression of chatbots, linguistic theory another is rule based approach. In the Recent years Deep learning is an overwhelming chapter for text to text learning to detect language nature. To the best of our knowledge, In the sub continent a huge number of people speak in Bangla language but they have different accents in different regions therefore study has been elaborate contradictory discussion achievement of existing works and findings of future needs in Bangla language acoustic accent.

Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing

Procedia PDF Downloads 133
6144 Emotion Detection in Twitter Messages Using Combination of Long Short-Term Memory and Convolutional Deep Neural Networks

Authors: Bahareh Golchin, Nooshin Riahi

Abstract:

One of the most significant issues as attended a lot in recent years is that of recognizing the sentiments and emotions in social media texts. The analysis of sentiments and emotions is intended to recognize the conceptual information such as the opinions, feelings, attitudes and emotions of people towards the products, services, organizations, people, topics, events and features in the written text. These indicate the greatness of the problem space. In the real world, businesses and organizations are always looking for tools to gather ideas, emotions, and directions of people about their products, services, or events related to their own. This article uses the Twitter social network, one of the most popular social networks with about 420 million active users, to extract data. Using this social network, users can share their information and opinions about personal issues, policies, products, events, etc. It can be used with appropriate classification of emotional states due to the availability of its data. In this study, supervised learning and deep neural network algorithms are used to classify the emotional states of Twitter users. The use of deep learning methods to increase the learning capacity of the model is an advantage due to the large amount of available data. Tweets collected on various topics are classified into four classes using a combination of two Bidirectional Long Short Term Memory network and a Convolutional network. The results obtained from this study with an average accuracy of 93%, show good results extracted from the proposed framework and improved accuracy compared to previous work.

Keywords: emotion classification, sentiment analysis, social networks, deep neural networks

Procedia PDF Downloads 139
6143 Anatomical Survey for Text Pattern Detection

Authors: S. Tehsin, S. Kausar

Abstract:

The ultimate aim of machine intelligence is to explore and materialize the human capabilities, one of which is the ability to detect various text objects within one or more images displayed on any canvas including prints, videos or electronic displays. Multimedia data has increased rapidly in past years. Textual information present in multimedia contains important information about the image/video content. However, it needs to technologically testify the commonly used human intelligence of detecting and differentiating the text within an image, for computers. Hence in this paper feature set based on anatomical study of human text detection system is proposed. Subsequent examination bears testimony to the fact that the features extracted proved instrumental to text detection.

Keywords: biologically inspired vision, content based retrieval, document analysis, text extraction

Procedia PDF Downloads 446
6142 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched the direction to the digital world. The domain of politics as one of the hottest topics of opinion mining research merged together with the behavior analysis for affiliation determination in text which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 are constituted by Linguistic Inquiry and Word Count (LIWC) features are tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that Decision Tree, Rule Induction and M5 Rule classifiers when used with SVM and IGR feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “function” as an aggregate feature of the linguistic category, is obtained as the most differentiating feature among the 68 features with 81% accuracy by itself in classifying articles either as Republican or Democrat.

Keywords: feature selection, LIWC, machine learning, politics

Procedia PDF Downloads 383
6141 The Arabic Literary Text, between Proficiency and Pedagogy

Authors: Abdul Rahman M. Chamseddine, Mahmoud El-ashiri

Abstract:

In the field of language teaching, communication skills are essential for the learner to achieve, however, these skills, in general, might not support the comprehension of some texts of literary or artistic nature like poetry. Understanding sentences and expressions is not enough to understand a poem; other skills are needed in order to understand the special structure of a text which literary meaning is inapprehensible even when the lingual meaning is well comprehended. And then there is the need for many other components that surpass one text to other similar texts that can be understood through solid traditions, which do not form an obstacle in the face of change and progress. This is not exclusive to texts that are classified as a literary but it is also the same with some daily short phrases and indicatively charged expressions that can be classified as literary or bear a taste of literary nature.. it can be found in Newpapers’ titles, TV news reports, and maybe football commentaries… the need to understand this special lingual use – described as literary – is highly important to understand this discourse that can be generally classified as very far from literature. This work will try to explore the role of the literary text in the language class and the way it is being covered or dealt with throughout all levels of acquiring proficiency. It will also attempt to survery the position of the literary text in some of the most important books for teaching Arabic around the world. The same way grammar is needed to understand the language, another (literary) grammar is also needed for understanding literature.

Keywords: language teaching, Arabic, literature, pedagogy, language proficiency

Procedia PDF Downloads 272
6140 Classification of Attacks Over Cloud Environment

Authors: Karim Abouelmehdi, Loubna Dali, Elmoutaoukkil Abdelmajid, Hoda Elsayed, Eladnani Fatiha, Benihssane Abderahim

Abstract:

The security of cloud services is the concern of cloud service providers. In this paper, we will mention different classifications of cloud attacks referred by specialized organizations. Each agency has its classification of well-defined properties. The purpose is to present a high-level classification of current research in cloud computing security. This classification is organized around attack strategies and corresponding defenses.

Keywords: cloud computing, classification, risk, security

Procedia PDF Downloads 548
6139 Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao

Abstract:

Recently, the rapid development of deep learning makes artificial intelligence (AI) penetrate into many fields, replacing manual work there. In particular, AI systems also become a research focus in the field of automatic office. To meet real needs in automatic officiating, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data for training. We conduct a series of experiments to test my systems and the results show that our system can achieve better classification results.

Keywords: artificial intelligence and office, NLP, deep learning, text classification

Procedia PDF Downloads 202
6138 Grammatical and Lexical Cohesion in the Japan’s Prime Minister Shinzo Abe’s Speech Text ‘Nihon wa Modottekimashita’

Authors: Nadya Inda Syartanti

Abstract:

This research aims to identify, classify, and analyze descriptively the aspects of grammatical and lexical cohesion in the speech text of Japan’s Prime Minister Shinzo Abe entitled Nihon wa Modotte kimashita delivered in Washington DC, the United States on February 23, 2013, as a research data source. The method used is qualitative research, which uses descriptions through words that are applied by analyzing aspects of grammatical and lexical cohesion proposed by Halliday and Hasan (1976). The aspects of grammatical cohesion consist of references (personal, demonstrative, interrogative pronouns), substitution, ellipsis, and conjunction. In contrast, lexical cohesion consists of reiteration (repetition, synonym, antonym, hyponym, meronym) and collocation. Data classification is based on the 6 aspects of the cohesion. Through some aspects of cohesion, this research tries to find out the frequency of using grammatical and lexical cohesion in Shinzo Abe's speech text entitled Nihon wa Modotte kimashita. The results of this research are expected to help overcome the difficulty of understanding speech texts in Japanese. Therefore, this research can be a reference for learners, researchers, and anyone who is interested in the field of discourse analysis.

Keywords: cohesion, grammatical cohesion, lexical cohesion, speech text, Shinzo Abe

Procedia PDF Downloads 162
6137 Performance Analysis with the Combination of Visualization and Classification Technique for Medical Chatbot

Authors: Shajida M., Sakthiyadharshini N. P., Kamalesh S., Aswitha B.

Abstract:

Natural Language Processing (NLP) continues to play a strategic part in complaint discovery and medicine discovery during the current epidemic. This abstract provides an overview of performance analysis with a combination of visualization and classification techniques of NLP for a medical chatbot. Sentiment analysis is an important aspect of NLP that is used to determine the emotional tone behind a piece of text. This technique has been applied to various domains, including medical chatbots. In this, we have compared the combination of the decision tree with heatmap and Naïve Bayes with Word Cloud. The performance of the chatbot was evaluated using accuracy, and the results indicate that the combination of visualization and classification techniques significantly improves the chatbot's performance.

Keywords: sentimental analysis, NLP, medical chatbot, decision tree, heatmap, naïve bayes, word cloud

Procedia PDF Downloads 77
6136 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision-making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, we decided to conduct a systematic review of previous literature on sentiment classification and AI-based techniques that have been used in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that can correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy by assessing different artificial intelligence techniques. We evaluated over 250 articles from digital sources like ScienceDirect, ACM, Google Scholar, and IEEE Xplore and whittled down the number of research to 31. Findings revealed that Deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in terms of performance accuracy. A large dataset is also necessary for developing a robust sentiment classifier and can be obtained from places like Twitter, movie reviews, Kaggle, SST, and SemEval Task4. Hybrid Deep Learning techniques like CNN+LSTM, CNN+GRU, CNN+BERT outperformed single Deep Learning techniques and machine learning techniques. Python programming language outperformed Java programming language in terms of sentiment analyzer development due to its simplicity and AI-based library functionalities. Based on some of the important findings from this study, we made a recommendation for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 116
6135 Perceiving Text-Worlds as a Cognitive Mechanism to Understand Surah Al-Kahf

Authors: Awatef Boubakri, Khaled Jebahi

Abstract:

Using Text World Theory (TWT), we attempted to understand how mental representations (text worlds) and perceptions can be construed by readers of Quranic texts. To this end, Surah Al-Kahf was purposefully selected given the fact that while each of its stories is narrated, different levels of discourse intervene, which might result in a confused reader who might find it hard to keep track of which discourse he or she is processing. This surah was studied using specifically-designed text-world diagrams. The findings suggest that TWT can be used to help solve problems of ambiguity at the level of discourse in Quranic texts and to help construct a thinking reader whose cognitive constructs (text worlds / mental representations) are built through reflecting on the various and often changing components of discourse world, text world, and sub-worlds.

Keywords: Al-Kahf, Surah, cognitive, processing, discourse

Procedia PDF Downloads 90
6134 Investigation of Topic Modeling-Based Semi-Supervised Interpretable Document Classifier

Authors: Dasom Kim, William Xiu Shun Wong, Yoonjin Hyun, Donghoon Lee, Minji Paek, Sungho Byun, Namgyu Kim

Abstract:

There have been many researches on document classification for classifying voluminous documents automatically. Through document classification, we can assign a specific category to each unlabeled document on the basis of various machine learning algorithms. However, providing labeled documents manually requires considerable time and effort. To overcome the limitations, the semi-supervised learning which uses unlabeled document as well as labeled documents has been invented. However, traditional document classifiers, regardless of supervised or semi-supervised ones, cannot sufficiently explain the reason or the process of the classification. Thus, in this paper, we proposed a methodology to visualize major topics and class components of each document. We believe that our methodology for visualizing topics and classes of each document can enhance the reliability and explanatory power of document classifiers.

Keywords: data mining, document classifier, text mining, topic modeling

Procedia PDF Downloads 403
6133 Isolation and Classification of Red Blood Cells in Anemic Microscopic Images

Authors: Jameela Ali Alkrimi, Abdul Rahim Ahmad, Azizah Suliman, Loay E. George

Abstract:

Red blood cells (RBCs) are among the most commonly and intensively studied type of blood cells in cell biology. The lack of RBCs is a condition characterized by lower than normal hemoglobin level; this condition is referred to as 'anemia'. In this study, a software was developed to isolate RBCs by using a machine learning approach to classify anemic RBCs in microscopic images. Several features of RBCs were extracted using image processing algorithms, including principal component analysis (PCA). With the proposed method, RBCs were isolated in 34 second from an image containing 18 to 27 cells. We also proposed that PCA could be performed to increase the speed and efficiency of classification. Our classifier algorithm yielded accuracy rates of 100%, 99.99%, and 96.50% for K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and neural network ANN, respectively. Classification was evaluated in highly sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained for a short time period with more efficient when PCA was used.

Keywords: red blood cells, pre-processing image algorithms, classification algorithms, principal component analysis PCA, confusion matrix, kappa statistical parameters, ROC

Procedia PDF Downloads 405
6132 Polycode Texts in Communication of Antisocial Groups: Functional and Pragmatic Aspects

Authors: Ivan Potapov

Abstract:

Background: The aim of this paper is to investigate poly code texts in the communication of youth antisocial groups. Nowadays, the notion of a text has numerous interpretations. Besides all the approaches to defining a text, we must take into account semiotic and cultural-semiotic ones. Rapidly developing IT, world globalization, and new ways of coding of information increase the role of the cultural-semiotic approach. However, the development of computer technologies leads also to changes in the text itself. Polycode texts play a more and more important role in the everyday communication of the younger generation. Therefore, the research of functional and pragmatic aspects of both verbal and non-verbal content is actually quite important. Methods and Material: For this survey, we applied the combination of four methods of text investigation: not only intention and content analysis but also semantic and syntactic analysis. Using these methods provided us with information on general text properties, the content of transmitted messages, and each communicants’ intentions. Besides, during our research, we figured out the social background; therefore, we could distinguish intertextual connections between certain types of polycode texts. As the sources of the research material, we used 20 public channels in the popular messenger Telegram and data extracted from smartphones, which belonged to arrested members of antisocial groups. Findings: This investigation let us assert that polycode texts can be characterized as highly intertextual language unit. Moreover, we could outline the classification of these texts based on communicants’ intentions. The most common types of antisocial polycode texts are a call to illegal actions and agitation. What is more, each type has its own semantic core: it depends on the sphere of communication. However, syntactic structure is universal for most of the polycode texts. Conclusion: Polycode texts play important role in online communication. The results of this investigation demonstrate that in some social groups using these texts has a destructive influence on the younger generation and obviously needs further researches.

Keywords: text, polycode text, internet linguistics, text analysis, context, semiotics, sociolinguistics

Procedia PDF Downloads 134
6131 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation

Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves

Abstract:

Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.

Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP

Procedia PDF Downloads 101
6130 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 537
6129 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

In the aim of natural language processing applications improvement, such as information retrieval, machine translation, lexical disambiguation, we focus on statistical approach to semantic indexing for multilingual text documents based on conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through conceptual network formalism. This network is generated using the EWN resource and pass by a classification step based on association rules model (in attempt to discover the non-taxonomic relations or contextual relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We prove that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 141
6128 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

Procedia PDF Downloads 314
6127 Application of Support Vector Machines in Fault Detection and Diagnosis of Power Transmission Lines

Authors: I. A. Farhat, M. Bin Hasan

Abstract:

A developed approach for the protection of power transmission lines using Support Vector Machines (SVM) technique is presented. In this paper, the SVM technique is utilized for the classification and isolation of faults in power transmission lines. Accurate fault classification and location results are obtained for all possible types of short circuit faults. As in distance protection, the approach utilizes the voltage and current post-fault samples as inputs. The main advantage of the method introduced here is that the method could easily be extended to any power transmission line.

Keywords: fault detection, classification, diagnosis, power transmission line protection, support vector machines (SVM)

Procedia PDF Downloads 560
6126 The Untranslatability of the Qur’an

Authors: Mina Elhjouji

Abstract:

The aim of this paper is to raise awareness of the untranslatability of the Qur’an and to suggest some solutions that can help the translator in the process of transferring the meaning from the source text to the target text as much as possible. After the introduction, the miraculous character of the Qur’an shall be illustrated. Then, the difficulty of translating religious texts will be shown in terms of different causes; thematic, cultural, and linguistic. Some examples shall illustrate each type of these difficulties. Finally, some strategies that can help translate the Quran’s meanings will be suggested.

Keywords: translation, religious text, untranslatability, The Qur’an miracle, communicative theory

Procedia PDF Downloads 16
6125 Review on Effective Texture Classification Techniques

Authors: Sujata S. Kulkarni

Abstract:

Effective and efficient texture feature extraction and classification is an important problem in image understanding and recognition. This paper gives a review on effective texture classification method. The objective of the problem of texture representation is to reduce the amount of raw data presented by the image, while preserving the information needed for the task. Texture analysis is important in many applications of computer image analysis for classification include industrial and biomedical surface inspection, for example for defects and disease, ground classification of satellite or aerial imagery and content-based access to image databases.

Keywords: compressed sensing, feature extraction, image classification, texture analysis

Procedia PDF Downloads 437
6124 A Socio-Pragmatic Investigation of Gender Enactment in New Month Text Messages

Authors: Esther Robert, Romanus Aboh

Abstract:

This paper undertakes a socio-pragmatic investigation of gender enactment in new month text messages. This study employs Gumperz’s Interactional Sociolinguistics as its theoretical point of reference to investigate how people create meaning through social interaction. This theory attempts to analyse any social interaction based on contextualization cues and presuppositions. This study explores the appropriateness of language used in texting. The text messages are collected from different mobile phones from different genders, which form the data for this paper. The study observes remarkable differences between genders in the use of informal language. The study reveals that men and women differ remarkably in conversational interaction as well as in writing. While it is observed that women are emotional, orderly, and meticulous, detailed and observed certain grammatical rules, men are casual, brief and appear to show evidence that less attention is paid to grammatical rules. Also, the study shows women as relaxing, showing love, care, concern with their emotive, spirit-raising and touching language, while mean are direct, short, and straight to the point. It is discovered through the study that women behave this way because of their brain-wiring. That is why language and communication matter more to women than to men and this reflects in their new month text messages.

Keywords: difference, emotionalised expressions, gender, texting

Procedia PDF Downloads 256