Search results for: text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3221

Search results for: text classification

2951 The Use of Layered Neural Networks for Classifying Hierarchical Scientific Fields of Study

Authors: Colin Smith, Linsey S Passarella

Abstract:

Due to the proliferation and decentralized nature of academic publication, no widely accepted scheme exists for organizing papers by their scientific field of study (FoS) to the author’s best knowledge. While many academic journals require author provided keywords for papers, these keywords range wildly in scope and are not consistent across papers, journals, or field domains, necessitating alternative approaches to paper classification. Past attempts to perform field-of-study (FoS) classification on scientific texts have largely used a-hierarchical FoS schemas or ignored the schema’s inherently hierarchical structure, e.g. by compressing the structure into a single layer for multi-label classification. In this paper, we introduce an application of a Layered Neural Network (LNN) to the problem of performing supervised hierarchical classification of scientific fields of study (FoS) on research papers. In this approach, paper embeddings from a pretrained language model are fed into a top-down LNN. Beginning with a single neural network (NN) for the highest layer of the class hierarchy, each node uses a separate local NN to classify the subsequent subfield child node(s) for an input embedding of concatenated paper titles and abstracts. We compare our LNN-FOS method to other recent machine learning methods using the Microsoft Academic Graph (MAG) FoS hierarchy and find that the LNN-FOS offers increased classification accuracy at each FoS hierarchical level.

Keywords: hierarchical classification, layer neural network, scientific field of study, scientific taxonomy

Procedia PDF Downloads 92
2950 A Study on Information Structure in the Vajrachedika-Prajna-paramita Sutra and Translation Aspect

Authors: Yoon-Cheol Park

Abstract:

This research focuses on examining the information structures in the old Chinese character-Korean translation of the Vajrachedika-prajna-paramita sutra. The background of this research comes from the fact that there were no previous researches which looked into the information structures in the target text of the Vajrachedika-prajna-paramita sutra by now. The existing researches on the Buddhist scripture translation mainly put weight on message conveyance by literal and semantic translation methods. But the message conveyance from one language to another has a necessity to be delivered with equivalent information structure. Thus, this research is intended to investigate on the flow of old and new information in the target text of Buddhist scripture, compared with source text. The Vajrachedika-prajna-paramita sutra unlike other Buddhist scriptures is composed of conversational structures between Buddha and his disciple, Suboli. This implies that the information flow can be changed by utterance context and some propositions. So, this research tries to analyze the flow of old and new information within the source and target text. As a result of analysis, this research can discover the following facts; firstly, there are the differences of the information flow in the message conveyance between the old Chinese character and Korean by language features. The old Chinese character reveals that old-new information flow is developed, while Korean indicates new-old information flow because of word order. Secondly, the source text of the Vajrachedika-prajna-paramita sutra includes abstruse terminologies, jargon and abstract words. These make influence on the target text and cause the change of the information flow. But the repetitive expressions of these words provide the old information in the target text. Lastly, the Vajrachedika-prajna-paramita sutra offers the expository structure from conversations between Buddha and Suboli. It means that the information flow is developed in the way of explaining specific subjects and of paraphrasing unfamiliar phrases and expressions. From the results of analysis above, this research can verify that the information structures in the target text of the Vajrachedika-prajna-paramita sutra are changed by specific subjects and terminologies, developed with the new-old information flow by repetitive expressions or word order and reveal the information structures familiar to target culture. It also implies that the translation of the Vajrachedika-prajna-paramita sutra as a religious book needs the message conveyance to take into account the information structures of two languages.

Keywords: abstruse terminologies, the information structure, new and old information, old Chinese character-Korean translation

Procedia PDF Downloads 343
2949 Contextual Distribution for Textual Alignment

Authors: Yuri Bizzoni, Marianne Reboul

Abstract:

Our program compares French and Italian translations of Homer’s Odyssey, from the XVIth to the XXth century. We focus on the third point, showing how distributional semantics systems can be used both to improve alignment between different French translations as well as between the Greek text and a French translation. Although we focus on French examples, the techniques we display are completely language independent.

Keywords: classical receptions, computational linguistics, distributional semantics, Homeric poems, machine translation, translation studies, text alignment

Procedia PDF Downloads 401
2948 Translation Directionality: An Eye Tracking Study

Authors: Elahe Kamari

Abstract:

Research on translation process has been conducted for more than 20 years, investigating various issues and using different research methodologies. Most recently, researchers have started to use eye tracking to study translation processes. They believed that the observable, measurable data that can be gained from eye tracking are indicators of unobservable cognitive processes happening in the translators’ mind during translation tasks. The aim of this study was to investigate directionality in translation processes through using eye tracking. The following hypotheses were tested: 1) processing the target text requires more cognitive effort than processing the source text, in both directions of translation; 2) L2 translation tasks on the whole require more cognitive effort than L1 tasks; 3) cognitive resources allocated to the processing of the source text is higher in L1 translation than in L2 translation; 4) cognitive resources allocated to the processing of the target text is higher in L2 translation than in L1 translation; and 5) in both directions non-professional translators invest more cognitive effort in translation tasks than do professional translators. The performance of a group of 30 male professional translators was compared with that of a group of 30 male non-professional translators. All the participants translated two comparable texts one into their L1 (Persian) and the other into their L2 (English). The eye tracker measured gaze time, average fixation duration, total task length and pupil dilation. These variables are assumed to measure the cognitive effort allocated to the translation task. The data derived from eye tracking only confirmed the first hypothesis. This hypothesis was confirmed by all the relevant indicators: gaze time, average fixation duration and pupil dilation. The second hypothesis that L2 translation tasks requires allocation of more cognitive resources than L1 translation tasks has not been confirmed by all four indicators. The third hypothesis that source text processing requires more cognitive resources in L1 translation than in L2 translation and the fourth hypothesis that target text processing requires more cognitive effort in L2 translation than L1 translation were not confirmed. It seems that source text processing in L2 translation can be just as demanding as in L1 translation. The final hypothesis that non-professional translators allocate more cognitive resources for the same translation tasks than do the professionals was partially confirmed. One of the indicators, average fixation duration, indicated higher cognitive effort-related values for professionals.

Keywords: translation processes, eye tracking, cognitive resources, directionality

Procedia PDF Downloads 427
2947 Recontextualisation of Political Discourse: A Case Study of Translation of News Stories

Authors: Hossein Sabouri

Abstract:

News stories as one of the branches of political discourse has always been regarded a sensitive and challenging area. Political translators often encounter some struggles that are vitally important when it comes to deal with the political tension between the source culture and the target one. Translating news stories is of prime importance since it has widespread availability and power of defining or even changing the facts. News translation is usually more than straight transfer of source text. Like original text endeavoring to manipulate the readers’ minds with imposing their ideologies, translated text seeking to change these ideologies influenced by ideological power. In other words, translation product is not considered more than a recontextualisation of the source text. The present study examines possible criteria for occurring changes in the translation process of news stories based on the ideological and political stance of translator using theories of ‘critical discourse analysis’and ‘translation and power. Fairclough investigates the ideological issues in (political) discourse and Tymoczko studies the political and power-related engagement of the translator in the process of translation. Incorporation of Fairclough and Gentzler and Tymoczko’s theories paves the way for the researcher to looks at the ideological power position of the translator. Data collection and analysis have been accomplished using 17 political-text samples taken from online news agencies which are related to the ‘Iran’s Nuclear Program’. Based on the findings, recontextualisation is mainly observed in terms of the strategies of ‘substitution, omissions, and addition’ in the translation process. The results of the study suggest that there is a significant relationship between the translation of political texts and ideologies of target culture.

Keywords: news translation, recontextualisation, ideological power, political discourse

Procedia PDF Downloads 153
2946 A Study on the Performance of 2-PC-D Classification Model

Authors: Nurul Aini Abdul Wahab, Nor Syamim Halidin, Sayidatina Aisah Masnan, Nur Izzati Romli

Abstract:

There are many applications of principle component method for reducing the large set of variables in various fields. Fisher’s Discriminant function is also a popular tool for classification. In this research, the researcher focuses on studying the performance of Principle Component-Fisher’s Discriminant function in helping to classify rice kernels to their defined classes. The data were collected on the smells or odour of the rice kernel using odour-detection sensor, Cyranose. 32 variables were captured by this electronic nose (e-nose). The objective of this research is to measure how well a combination model, between principle component and linear discriminant, to be as a classification model. Principle component method was used to reduce all 32 variables to a smaller and manageable set of components. Then, the reduced components were used to develop the Fisher’s Discriminant function. In this research, there are 4 defined classes of rice kernel which are Aromatic, Brown, Ordinary and Others. Based on the output from principle component method, the 32 variables were reduced to only 2 components. Based on the output of classification table from the discriminant analysis, 40.76% from the total observations were correctly classified into their classes by the PC-Discriminant function. Indirectly, it gives an idea that the classification model developed has committed to more than 50% of misclassifying the observations. As a conclusion, the Fisher’s Discriminant function that was built on a 2-component from PCA (2-PC-D) is not satisfying to classify the rice kernels into its defined classes.

Keywords: classification model, discriminant function, principle component analysis, variable reduction

Procedia PDF Downloads 303
2945 Developing House’s Model to Assess the Translation of Key Cultural Texts

Authors: Raja Al-Ghamdi

Abstract:

This paper aims to systematically assess the translation of key cultural texts. The paper, therefore, proposes a modification of the discourse analysis model for translation quality assessment introduced by the linguist Juliane House (1977, 1997, 2015). The data for analysis has been chosen from a religious text that has never been investigated before. It is an overt translation of the biography of Prophet Mohammad. The book is written originally in Arabic and translated into English. A soft copy of the translation, entitled The Sealed Nectar, is posted on numerous websites including the Internet Archive library which offers a free access to everyone. The text abounds with linguistic and cultural phenomena relevant to Islamic and Arab lingua-cultural context which make its translation a challenge, as well as its assessment. Interesting findings show that (1) culturemes are rich points and both the translator’s subjectivity and intervention are apparent in mediating them, (2) given the nature of historical narration, the source text reflects the author’s positive shading, whereas the target text reflects the translator’s axiological orientation as neutrally shaded, and, (3) linguistic gaps, metaphorical expressions and intertextuality are major stimuli to compensation strategies.

Keywords: Arabic-English discourse analysis, key cultural texts, overt translation, quality assessment

Procedia PDF Downloads 249
2944 A Socio-Pragmatic Investigation of Gender Enactment in New Month Text Messages

Authors: Esther Robert, Romanus Aboh

Abstract:

This paper undertakes a socio-pragmatic investigation of gender enactment in new month text messages. This study employs Gumperz’s Interactional Sociolinguistics as its theoretical point of reference to investigate how people create meaning through social interaction. This theory attempts to analyse any social interaction based on contextualization cues and presuppositions. This study explores the appropriateness of language used in texting. The text messages are collected from different mobile phones from different genders, which form the data for this paper. The study observes remarkable differences between genders in the use of informal language. The study reveals that men and women differ remarkably in conversational interaction as well as in writing. While it is observed that women are emotional, orderly, and meticulous, detailed and observed certain grammatical rules, men are casual, brief and appear to show evidence that less attention is paid to grammatical rules. Also, the study shows women as relaxing, showing love, care, concern with their emotive, spirit-raising and touching language, while mean are direct, short, and straight to the point. It is discovered through the study that women behave this way because of their brain-wiring. That is why language and communication matter more to women than to men and this reflects in their new month text messages.

Keywords: difference, emotionalised expressions, gender, texting

Procedia PDF Downloads 216
2943 The Design of the Multi-Agent Classification System (MACS)

Authors: Mohamed R. Mhereeg

Abstract:

The paper discusses the design of a .NET Windows Service based agent system called MACS (Multi-Agent Classification System). MACS is a system aims to accurately classify spread-sheet developers competency over a network. It is designed to automatically and autonomously monitor spread-sheet users and gather their development activities based on the utilization of the software Multi-Agent Technology (MAS). This is accomplished in such a way that makes management capable to efficiently allow for precise tailor training activities for future spread-sheet development. The monitoring agents of MACS are intended to be distributed over the WWW in order to satisfy the monitoring and classification of the multiple developer aspect. The Prometheus methodology is used for the design of the agents of MACS. Prometheus has been used to undertake this phase of the system design because it is developed specifically for specifying and designing agent-oriented systems. Additionally, Prometheus specifies also the communication needed between the agents in order to coordinate to achieve their delegated tasks.

Keywords: classification, design, MACS, MAS, prometheus

Procedia PDF Downloads 368
2942 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 98
2941 Artificial Intelligence Assisted Sentiment Analysis of Hotel Reviews Using Topic Modeling

Authors: Sushma Ghogale

Abstract:

With a surge in user-generated content or feedback or reviews on the internet, it has become possible and important to know consumers' opinions about products and services. This data is important for both potential customers and businesses providing the services. Data from social media is attracting significant attention and has become the most prominent channel of expressing an unregulated opinion. Prospective customers look for reviews from experienced customers before deciding to buy a product or service. Several websites provide a platform for users to post their feedback for the provider and potential customers. However, the biggest challenge in analyzing such data is in extracting latent features and providing term-level analysis of the data. This paper proposes an approach to use topic modeling to classify the reviews into topics and conduct sentiment analysis to mine the opinions. This approach can analyse and classify latent topics mentioned by reviewers on business sites or review sites, or social media using topic modeling to identify the importance of each topic. It is followed by sentiment analysis to assess the satisfaction level of each topic. This approach provides a classification of hotel reviews using multiple machine learning techniques and comparing different classifiers to mine the opinions of user reviews through sentiment analysis. This experiment concludes that Multinomial Naïve Bayes classifier produces higher accuracy than other classifiers.

Keywords: latent Dirichlet allocation, topic modeling, text classification, sentiment analysis

Procedia PDF Downloads 72
2940 Evaluation of Robust Feature Descriptors for Texture Classification

Authors: Jia-Hong Lee, Mei-Yi Wu, Hsien-Tsung Kuo

Abstract:

Texture is an important characteristic in real and synthetic scenes. Texture analysis plays a critical role in inspecting surfaces and provides important techniques in a variety of applications. Although several descriptors have been presented to extract texture features, the development of object recognition is still a difficult task due to the complex aspects of texture. Recently, many robust and scaling-invariant image features such as SIFT, SURF and ORB have been successfully used in image retrieval and object recognition. In this paper, we have tried to compare the performance for texture classification using these feature descriptors with k-means clustering. Different classifiers including K-NN, Naive Bayes, Back Propagation Neural Network , Decision Tree and Kstar were applied in three texture image sets - UIUCTex, KTH-TIPS and Brodatz, respectively. Experimental results reveal SIFTS as the best average accuracy rate holder in UIUCTex, KTH-TIPS and SURF is advantaged in Brodatz texture set. BP neuro network works best in the test set classification among all used classifiers.

Keywords: texture classification, texture descriptor, SIFT, SURF, ORB

Procedia PDF Downloads 330
2939 A Hierarchical Method for Multi-Class Probabilistic Classification Vector Machines

Authors: P. Byrnes, F. A. DiazDelaO

Abstract:

The Support Vector Machine (SVM) has become widely recognised as one of the leading algorithms in machine learning for both regression and binary classification. It expresses predictions in terms of a linear combination of kernel functions, referred to as support vectors. Despite its popularity amongst practitioners, SVM has some limitations, with the most significant being the generation of point prediction as opposed to predictive distributions. Stemming from this issue, a probabilistic model namely, Probabilistic Classification Vector Machines (PCVM), has been proposed which respects the original functional form of SVM whilst also providing a predictive distribution. As physical system designs become more complex, an increasing number of classification tasks involving industrial applications consist of more than two classes. Consequently, this research proposes a framework which allows for the extension of PCVM to a multi class setting. Additionally, the original PCVM framework relies on the use of type II maximum likelihood to provide estimates for both the kernel hyperparameters and model evidence. In a high dimensional multi class setting, however, this approach has been shown to be ineffective due to bad scaling as the number of classes increases. Accordingly, we propose the application of Markov Chain Monte Carlo (MCMC) based methods to provide a posterior distribution over both parameters and hyperparameters. The proposed framework will be validated against current multi class classifiers through synthetic and real life implementations.

Keywords: probabilistic classification vector machines, multi class classification, MCMC, support vector machines

Procedia PDF Downloads 200
2938 Neuro-Fuzzy Based Model for Phrase Level Emotion Understanding

Authors: Vadivel Ayyasamy

Abstract:

The present approach deals with the identification of Emotions and classification of Emotional patterns at Phrase-level with respect to Positive and Negative Orientation. The proposed approach considers emotion triggered terms, its co-occurrence terms and also associated sentences for recognizing emotions. The proposed approach uses Part of Speech Tagging and Emotion Actifiers for classification. Here sentence patterns are broken into phrases and Neuro-Fuzzy model is used to classify which results in 16 patterns of emotional phrases. Suitable intensities are assigned for capturing the degree of emotion contents that exist in semantics of patterns. These emotional phrases are assigned weights which supports in deciding the Positive and Negative Orientation of emotions. The approach uses web documents for experimental purpose and the proposed classification approach performs well and achieves good F-Scores.

Keywords: emotions, sentences, phrases, classification, patterns, fuzzy, positive orientation, negative orientation

Procedia PDF Downloads 348
2937 Comparison of Different Methods to Produce Fuzzy Tolerance Relations for Rainfall Data Classification in the Region of Central Greece

Authors: N. Samarinas, C. Evangelides, C. Vrekos

Abstract:

The aim of this paper is the comparison of three different methods, in order to produce fuzzy tolerance relations for rainfall data classification. More specifically, the three methods are correlation coefficient, cosine amplitude and max-min method. The data were obtained from seven rainfall stations in the region of central Greece and refers to 20-year time series of monthly rainfall height average. Three methods were used to express these data as a fuzzy relation. This specific fuzzy tolerance relation is reformed into an equivalence relation with max-min composition for all three methods. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations. Stations with high similarity can be utilized in water resource management scenarios interchangeably or to augment data from one to another. Due to the complexity of calculations, it is important to find out which of the methods is computationally simpler and needs fewer compositions in order to give reliable results.

Keywords: classification, fuzzy logic, tolerance relations, rainfall data

Procedia PDF Downloads 276
2936 Efficient Schemes of Classifiers for Remote Sensing Satellite Imageries of Land Use Pattern Classifications

Authors: S. S. Patil, Sachidanand Kini

Abstract:

Classification of land use patterns is compelling in complexity and variability of remote sensing imageries data. An imperative research in remote sensing application exploited to mine some of the significant spatially variable factors as land cover and land use from satellite images for remote arid areas in Karnataka State, India. The diverse classification techniques, unsupervised and supervised consisting of maximum likelihood, Mahalanobis distance, and minimum distance are applied in Bellary District in Karnataka State, India for the classification of the raw satellite images. The accuracy evaluations of results are compared visually with the standard maps with ground-truths. We initiated with the maximum likelihood technique that gave the finest results and both minimum distance and Mahalanobis distance methods over valued agriculture land areas. In meanness of mislaid few irrelevant features due to the low resolution of the satellite images, high-quality accord between parameters extracted automatically from the developed maps and field observations was found.

Keywords: Mahalanobis distance, minimum distance, supervised, unsupervised, user classification accuracy, producer's classification accuracy, maximum likelihood, kappa coefficient

Procedia PDF Downloads 151
2935 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 85
2934 A Text-Oriented Study on Treatises and the End of the Struggles in Silius

Authors: Arianna Sacerdoti

Abstract:

This paper is original and fills, to our best Knowledge, a gap in secondary literature. It analyzes the presence of treatises in Silius Italicus’ Punica and what happens in the plot when a struggle ends. As a result, we will understand if treatises are stipulated or broken, and which narrative devices go with the presence of treatises and the end of the battles. Methodology will be text-oriented, and all the passages will be presented in the Latin language and discussed. In concluding, it is important to understand – in a poem based on war – the role of treatises and the end of battles in Silius Italicus.

Keywords: Flavian Epic, Silius Italicus, Punica, treatises

Procedia PDF Downloads 96
2933 Job Shop Scheduling: Classification, Constraints and Objective Functions

Authors: Majid Abdolrazzagh-Nezhad, Salwani Abdullah

Abstract:

The job-shop scheduling problem (JSSP) is an important decision facing those involved in the fields of industry, economics and management. This problem is a class of combinational optimization problem known as the NP-hard problem. JSSPs deal with a set of machines and a set of jobs with various predetermined routes through the machines, where the objective is to assemble a schedule of jobs that minimizes certain criteria such as makespan, maximum lateness, and total weighted tardiness. Over the past several decades, interest in meta-heuristic approaches to address JSSPs has increased due to the ability of these approaches to generate solutions which are better than those generated from heuristics alone. This article provides the classification, constraints and objective functions imposed on JSSPs that are available in the literature.

Keywords: job-shop scheduling, classification, constraints, objective functions

Procedia PDF Downloads 410
2932 Time and Cost Prediction Models for Language Classification Over a Large Corpus on Spark

Authors: Jairson Barbosa Rodrigues, Paulo Romero Martins Maciel, Germano Crispim Vasconcelos

Abstract:

This paper presents an investigation of the performance impacts regarding the variation of five factors (input data size, node number, cores, memory, and disks) when applying a distributed implementation of Naïve Bayes for text classification of a large Corpus on the Spark big data processing framework. Problem: The algorithm's performance depends on multiple factors, and knowing before-hand the effects of each factor becomes especially critical as hardware is priced by time slice in cloud environments. Objectives: To explain the functional relationship between factors and performance and to develop linear predictor models for time and cost. Methods: the solid statistical principles of Design of Experiments (DoE), particularly the randomized two-level fractional factorial design with replications. This research involved 48 real clusters with different hardware arrangements. The metrics were analyzed using linear models for screening, ranking, and measurement of each factor's impact. Results: Our findings include prediction models and show some non-intuitive results about the small influence of cores and the neutrality of memory and disks on total execution time, and the non-significant impact of data input scale on costs, although notably impacts the execution time.

Keywords: big data, design of experiments, distributed machine learning, natural language processing, spark

Procedia PDF Downloads 84
2931 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format indentification method employs file format classifier and associated configurations to support digital preservation experts with an estimation of required file format. Our goal is to make use of a format specification knowledge base aggregated from a different Web sources in order to select file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert, the file format for his institution. The proposed methods facilitate the selection of file format and the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises most common terms that are characteristic for all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results including probabilities are presented in the evaluation section.

Keywords: data mining, digital libraries, digital preservation, file format

Procedia PDF Downloads 465
2930 Math and Religion in Arvo Pärt's Out of the Depths

Authors: Ismael Lins Patriota

Abstract:

Arvo Pärt is an Estonian composer who started his musical career under the influence of twelve-tone music and dodecaphonism. From 1968 to 1976, he isolated himself to search for a new path as a composer. In this period, he converted to Russian orthodoxy and changed his composing to tintinnabuli, a musical technique combining triadic chords with simple melodies. The recent analysis of Pärt’s output demonstrates that mathematics remained an influence after the invention of tintinnabuli. The present discussion deals with the relationship between math and religion in his work Out of the Depths (1980), proposing a musical-text approach and examining the minimum elements of the piece, such as motives and sub-phrases, which is the main focus of this work, considering text patterns and the role of the organ, which also uses the tintinnabuli system. The analysis of these elements demonstrates that Pärt uses math as a formal element, and the composer combines musical parameters to execute a personal and innovative interpretation of the text.

Keywords: Arvo Pärt, Out of the Depths, math, religion, analysis

Procedia PDF Downloads 47
2929 Brain-Computer Interface Based Real-Time Control of Fixed Wing and Multi-Rotor Unmanned Aerial Vehicles

Authors: Ravi Vishwanath, Saumya Kumaar, S. N. Omkar

Abstract:

Brain-computer interfacing (BCI) is a technology that is almost four decades old, and it was developed solely for the purpose of developing and enhancing the impact of neuroprosthetics. However, in the recent times, with the commercialization of non-invasive electroencephalogram (EEG) headsets, the technology has seen a wide variety of applications like home automation, wheelchair control, vehicle steering, etc. One of the latest developed applications is the mind-controlled quadrotor unmanned aerial vehicle. These applications, however, do not require a very high-speed response and give satisfactory results when standard classification methods like Support Vector Machine (SVM) and Multi-Layer Perceptron (MLPC). Issues are faced when there is a requirement for high-speed control in the case of fixed-wing unmanned aerial vehicles where such methods are rendered unreliable due to the low speed of classification. Such an application requires the system to classify data at high speeds in order to retain the controllability of the vehicle. This paper proposes a novel method of classification which uses a combination of Common Spatial Paradigm and Linear Discriminant Analysis that provides an improved classification accuracy in real time. A non-linear SVM based classification technique has also been discussed. Further, this paper discusses the implementation of the proposed method on a fixed-wing and VTOL unmanned aerial vehicles.

Keywords: brain-computer interface, classification, machine learning, unmanned aerial vehicles

Procedia PDF Downloads 251
2928 Composite Approach to Extremism and Terrorism Web Content Classification

Authors: Kolade Olawande Owoeye, George Weir

Abstract:

Terrorism and extremism activities on the internet are becoming the most significant threats to national security because of their potential dangers. In response to this challenge, law enforcement and security authorities are actively implementing comprehensive measures by countering the use of the internet for terrorism. To achieve the measures, there is need for intelligence gathering via the internet. This includes real-time monitoring of potential websites that are used for recruitment and information dissemination among other operations by extremist groups. However, with billions of active webpages, real-time monitoring of all webpages become almost impossible. To narrow down the search domain, there is a need for efficient webpage classification techniques. This research proposed a new approach tagged: SentiPosit-based method. SentiPosit-based method combines features of the Posit-based method and the Sentistrenght-based method for classification of terrorism and extremism webpages. The experiment was carried out on 7500 webpages obtained through TENE-webcrawler by International Cyber Crime Research Centre (ICCRC). The webpages were manually grouped into three classes which include the ‘pro-extremist’, ‘anti-extremist’ and ‘neutral’ with 2500 webpages in each category. A supervised learning algorithm is then applied on the classified dataset in order to build the model. Results obtained was compared with existing classification method using the prediction accuracy and runtime. It was observed that our proposed hybrid approach produced a better classification accuracy compared to existing approaches within a reasonable runtime.

Keywords: sentiposit, classification, extremism, terrorism

Procedia PDF Downloads 249
2927 Classification of Hyperspectral Image Using Mathematical Morphological Operator-Based Distance Metric

Authors: Geetika Barman, B. S. Daya Sagar

Abstract:

In this article, we proposed a pixel-wise classification of hyperspectral images using a mathematical morphology operator-based distance metric called “dilation distance” and “erosion distance”. This method involves measuring the spatial distance between the spectral features of a hyperspectral image across the bands. The key concept of the proposed approach is that the “dilation distance” is the maximum distance a pixel can be moved without changing its classification, whereas the “erosion distance” is the maximum distance that a pixel can be moved before changing its classification. The spectral signature of the hyperspectral image carries unique class information and shape for each class. This article demonstrates how easily the dilation and erosion distance can measure spatial distance compared to other approaches. This property is used to calculate the spatial distance between hyperspectral image feature vectors across the bands. The dissimilarity matrix is then constructed using both measures extracted from the feature spaces. The measured distance metric is used to distinguish between the spectral features of various classes and precisely distinguish between each class. This is illustrated using both toy data and real datasets. Furthermore, we investigated the role of flat vs. non-flat structuring elements in capturing the spatial features of each class in the hyperspectral image. In order to validate, we compared the proposed approach to other existing methods and demonstrated empirically that mathematical operator-based distance metric classification provided competitive results and outperformed some of them.

Keywords: dilation distance, erosion distance, hyperspectral image classification, mathematical morphology

Procedia PDF Downloads 50
2926 A Novel Probabilistic Spatial Locality of Reference Technique for Automatic Cleansing of Digital Maps

Authors: A. Abdullah, S. Abushalmat, A. Bakshwain, A. Basuhail, A. Aslam

Abstract:

GIS (Geographic Information System) applications require geo-referenced data, this data could be available as databases or in the form of digital or hard-copy agro-meteorological maps. These parameter maps are color-coded with different regions corresponding to different parameter values, converting these maps into a database is not very difficult. However, text and different planimetric elements overlaid on these maps makes an accurate image to database conversion a challenging problem. The reason being, it is almost impossible to exactly replace what was underneath the text or icons; thus, pointing to the need for inpainting. In this paper, we propose a probabilistic inpainting approach that uses the probability of spatial locality of colors in the map for replacing overlaid elements with underlying color. We tested the limits of our proposed technique using non-textual simulated data and compared text removing results with a popular image editing tool using public domain data with promising results.

Keywords: noise, image, GIS, digital map, inpainting

Procedia PDF Downloads 318
2925 Valence and Arousal-Based Sentiment Analysis: A Comparative Study

Authors: Usama Shahid, Muhammad Zunnurain Hussain

Abstract:

This research paper presents a comprehensive analysis of a sentiment analysis approach that employs valence and arousal as its foundational pillars, in comparison to traditional techniques. Sentiment analysis is an indispensable task in natural language processing that involves the extraction of opinions and emotions from textual data. The valence and arousal dimensions, representing the intensity and positivity/negativity of emotions, respectively, enable the creation of four quadrants, each representing a specific emotional state. The study seeks to determine the impact of utilizing these quadrants to identify distinct emotional states on the accuracy and efficiency of sentiment analysis, in comparison to traditional techniques. The results reveal that the valence and arousal-based approach outperforms other approaches, particularly in identifying nuanced emotions that may be missed by conventional methods. The study's findings are crucial for applications such as social media monitoring and market research, where the accurate classification of emotions and opinions is paramount. Overall, this research highlights the potential of using valence and arousal as a framework for sentiment analysis and offers invaluable insights into the benefits of incorporating specific types of emotions into the analysis. These findings have significant implications for researchers and practitioners in the field of natural language processing, as they provide a basis for the development of more accurate and effective sentiment analysis tools.

Keywords: sentiment analysis, valence and arousal, emotional states, natural language processing, machine learning, text analysis, sentiment classification, opinion mining

Procedia PDF Downloads 52
2924 Classification of Red, Green and Blue Values from Face Images Using k-NN Classifier to Predict the Skin or Non-Skin

Authors: Kemal Polat

Abstract:

In this study, it has been estimated whether there is skin by using RBG values obtained from the camera and k-nearest neighbor (k-NN) classifier. The dataset used in this study has an unbalanced distribution and a linearly non-separable structure. This problem can also be called a big data problem. The Skin dataset was taken from UCI machine learning repository. As the classifier, we have used the k-NN method to handle this big data problem. For k value of k-NN classifier, we have used as 1. To train and test the k-NN classifier, 50-50% training-testing partition has been used. As the performance metrics, TP rate, FP Rate, Precision, recall, f-measure and AUC values have been used to evaluate the performance of k-NN classifier. These obtained results are as follows: 0.999, 0.001, 0.999, 0.999, 0.999, and 1,00. As can be seen from the obtained results, this proposed method could be used to predict whether the image is skin or not.

Keywords: k-NN classifier, skin or non-skin classification, RGB values, classification

Procedia PDF Downloads 215
2923 Comparison of Linear Discriminant Analysis and Support Vector Machine Classifications for Electromyography Signals Acquired at Five Positions of Elbow Joint

Authors: Amna Khan, Zareena Kausar, Saad Malik

Abstract:

Bio Mechatronics has extended applications in the field of rehabilitation. It has been contributing since World War II in improving the applicability of prosthesis and assistive devices in real life scenarios. In this paper, classification accuracies have been compared for two classifiers against five positions of elbow. Electromyography (EMG) signals analysis have been acquired directly from skeletal muscles of human forearm for each of the three defined positions and at modified extreme positions of elbow flexion and extension using 8 electrode Myo armband sensor. Features were extracted from filtered EMG signals for each position. Performance of two classifiers, support vector machine (SVM) and linear discriminant analysis (LDA) has been compared by analyzing the classification accuracies. SVM illustrated classification accuracies between 90-96%, in contrast to 84-87% depicted by LDA for five defined positions of elbow keeping the number of samples and selected feature the same for both SVM and LDA.

Keywords: classification accuracies, electromyography, linear discriminant analysis (LDA), Myo armband sensor, support vector machine (SVM)

Procedia PDF Downloads 320
2922 Neural Network Based Decision Trees Using Machine Learning for Alzheimer's Diagnosis

Authors: P. S. Jagadeesh Kumar, Tracy Lin Huan, S. Meenakshi Sundaram

Abstract:

Alzheimer’s disease is one of the prevalent kind of ailment, expected for impudent reconciliation or an effectual therapy is to be accredited hitherto. Probable detonation of patients in the upcoming years, and consequently an enormous deal of apprehension in early discovery of the disorder, this will conceivably chaperon to enhanced healing outcomes. Complex impetuosity of the brain is an observant symbolic of the disease and a unique recognition of genetic sign of the disease. Machine learning alongside deep learning and decision tree reinforces the aptitude to absorb characteristics from multi-dimensional data’s and thus simplifies automatic classification of Alzheimer’s disease. Susceptible testing was prophesied and realized in training the prospect of Alzheimer’s disease classification built on machine learning advances. It was shrewd that the decision trees trained with deep neural network fashioned the excellent results parallel to related pattern classification.

Keywords: Alzheimer's diagnosis, decision trees, deep neural network, machine learning, pattern classification

Procedia PDF Downloads 266