Search results for: text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3228

2808 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm

Authors: Kamel Belammi, Houria Fatrim

Abstract:

An imbalanced data set, a problem often found in real-world applications, can seriously degrade the classification performance of machine learning algorithms. There have been many attempts at dealing with the classification of imbalanced data sets. In medical diagnosis classification, we often face an imbalance in the number of samples between classes, with too few samples in the rare classes. In this paper, we propose a learning method based on a cost-sensitive extension of the Least Mean Square (LMS) algorithm that penalizes the errors of different samples with different weights, together with some rules of thumb for determining those weights. After the balancing phase, we apply different classifiers (support vector machine (SVM), k-nearest neighbor (KNN), and multilayer neural networks (MNN)) to the balanced data set. We also compare the results obtained before and after applying the balancing method.
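
The cost-sensitive LMS idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' rules of thumb for the weights, so the inverse-class-frequency rule in `class_costs` is an assumption, and all function names are hypothetical.

```python
# Sketch of a cost-sensitive LMS (Widrow-Hoff) update.
# Assumption: per-class costs are inversely proportional to class frequency;
# the paper's own rules of thumb are not stated in the abstract.
from collections import Counter


def class_costs(labels):
    """Give each class a cost inversely proportional to its frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


def train_cost_sensitive_lms(samples, labels, lr=0.01, epochs=200):
    """Train a linear unit whose error term is scaled by the sample's class cost.

    samples: list of feature lists; labels: list of +1 / -1 targets.
    Returns (weights, bias).
    """
    costs = class_costs(labels)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            output = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = y - output
            step = lr * costs[y] * error  # rare classes take larger steps
            weights = [w + step * xi for w, xi in zip(weights, x)]
            bias += step
    return weights, bias


def predict(weights, bias, x):
    """Threshold the linear output at zero."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1
```

With three samples of one class and one of the other, `class_costs` gives the rare class three times the cost of the frequent one, so a single rare sample pulls the weights as strongly as the three frequent ones combined.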

Keywords: multilayer neural networks, k-nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes

Procedia PDF Downloads 501
2807 Developing a Model of Teaching Writing Based On Reading Approach through Reflection Strategy for EFL Students of STKIP YPUP

Authors: Eny Syatriana, Ardiansyah

Abstract:

The purpose of this study was to develop a learning model for writing, based on reading texts that the students read using a reflection strategy. The strategy allows the students to read a text, write down its main idea, and then develop the text in their own sentences. The writing practice thus begins with reading an interesting text, which the students then develop into their own writing. The research questions are: (1) what kind of learning model can develop the students' writing ability? (2) what do the students of STKIP YPUP achieve through the reflection strategy? (3) is the strategy effective in developing the students' writing competence? (4) how interested are the students in the use of the strategy in the writing subject? This development research consisted of four steps: (1) needs analysis, (2) model design, (3) implementation, and (4) model evaluation. The needs analysis was carried out through discussion among the writing lecturers to create a learning model for the writing subject. To assess the effectiveness of the model, an experiment would be run with one class. The instruments and learning materials would be validated by experts, and each step of material development included a learning process that would also be validated by an expert. The research followed the principles and procedures of research and development design. In this study, the researcher would carry out a needs analysis, prototype creation, content validation, and a limited empirical trial on the sample. At each step, the drafts would be assessed and revised before continuing to the next step. In the second year, the prototype would be tested empirically in four classes of the English department at STKIP YPUP. The test was implemented largely through action research, followed by evaluation and validation by the experts.

Keywords: learning model, reflection, strategy, reading, writing, development

Procedia PDF Downloads 343
2806 Unsupervised Classification of DNA Barcodes Species Using Multi-Library Wavelet Networks

Authors: Abdesselem Dakhli, Wajdi Bellil, Chokri Ben Amar

Abstract:

A DNA barcode is a short mitochondrial DNA fragment made up of three subunits: a phosphate group, a sugar, and nucleic bases (A, T, C, and G). DNA barcodes provide a good source of the information needed to classify living species, and this intuition has been confirmed by many experimental results. Species classification with DNA barcode sequences has been studied by several researchers. The classification problem assigns unknown species to known ones by analyzing their barcodes. This task has to be supported by reliable methods and algorithms. To analyze species regions or entire genomes, it becomes necessary to use sequence similarity methods. A large set of sequences can be compared simultaneously using Multiple Sequence Alignment, which is known to be NP-complete. To make this type of analysis feasible, heuristics such as progressive alignment have been developed. Another tool for similarity search against a database of sequences is BLAST, which outputs shorter regions of high similarity between a query sequence and matched sequences in the database. However, all these methods are still computationally very expensive and require significant computational infrastructure. Our goal is to build predictive models that are highly accurate and interpretable. The proposed method avoids the complex problem of form and structure in different classes of organisms. It is evaluated on empirical data, and its classification performance is compared with that of other methods. Our system consists of three phases. The first, transformation, is composed of three steps: Electron-Ion Interaction Pseudopotential (EIIP) coding of the DNA barcodes, Fourier transform, and power spectrum signal processing. The second, approximation, relies on Multi-Library Wavelet Neural Networks (MLWNN). The third is the classification of the DNA barcodes, which is carried out by applying a hierarchical classification algorithm.
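
The transformation phase (EIIP coding, Fourier transform, power spectrum) can be sketched as follows. This is only an illustration under stated assumptions: the EIIP values are the commonly cited ones for the four nucleotides, the discrete Fourier transform is a naive textbook version, and the function names are hypothetical. The wavelet network and hierarchical classification phases are not covered.

```python
import cmath

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(sequence):
    """Map a DNA string to its numeric EIIP signal (raises KeyError on other symbols)."""
    return [EIIP[base] for base in sequence.upper()]


def power_spectrum(signal):
    """Return |X[k]|^2 for each frequency bin k of a naive O(n^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            value * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, value in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Usage: the power spectrum of an encoded barcode fragment is the feature
# vector that a later approximation or classification stage would consume.
features = power_spectrum(eiip_encode("ACGTACGTTAGC"))
```

For real barcode lengths, an FFT routine from a numerical library would replace the quadratic loop; the naive version is kept here so the sketch has no dependencies.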

Keywords: DNA barcode, electron-ion interaction pseudopotential, Multi Library Wavelet Neural Networks (MLWNN)

Procedia PDF Downloads 291
2805 Translation of Culture-Specific References in the Turkish Translation of Shakespeare's Macbeth

Authors: Feride Sumbul

Abstract:

Drama is a literary genre that mirrors people and society and conveys human nature and life to the reader or the audience within its own social-cultural structure. Each play takes on a new reality in the time and culture of the staging, and each performance actually brings a new interpretation to the play. Similarly, each translation adds a new meaning to the source text. In other words, the translated theatrical text transcends the boundaries of its language and culture and finds a new interpretation. Thus, the translation of drama takes place as a transfer from one culture to another, a form of cross-cultural communication. In this context, translating culture-specific references plays a key role in reflecting the cultural aspects of a target society. This study aims to explore the use of Venuti's translation principles of domestication and foreignization in the transfer of culture-specific references in the Turkish translation of Shakespeare's Macbeth. Macbeth is compared with its Turkish version in terms of the transference of culture-specific references, such as religious, witchcraft, and mythological references, which have no equivalent in the target language and culture. To evaluate these principles of Venuti, Davies’s translation strategies are also applied. The translator predominantly uses Davies’s method of ‘addition’, adding extra information in the notes. For instance, rather than finding Turkish renderings for witchcraft references, the translator mostly retains them in the target text and adds extra information about them in the notes. The translator, Nutku, therefore mostly follows Venuti’s translation principle of foreignization, preserving the foreignness of the theatrical text.

Keywords: drama translation, theatrical texts, culture specific references, Macbeth

Procedia PDF Downloads 133
2804 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To solve memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex tasks. The experiments show that generating mutually exclusive tasks can effectively solve memorization overfitting in the MAML algorithm.
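
One simple way to picture "one feature mapped to multiple labels" is to relabel the same examples differently in each task. The sketch below does this with a cyclic label shift. Both the shift and the random subset selection are assumptions standing in for the paper's augmentation and key data extraction steps, which the abstract does not specify; `make_mutex_tasks` is a hypothetical name.

```python
import random


def make_mutex_tasks(examples, num_classes, num_tasks, subset_size, seed=0):
    """Build tasks in which the same input carries a different label per task.

    examples: list of (feature, label) pairs with labels in range(num_classes).
    Assumption: a random subset stands in for the paper's key data extraction,
    and a cyclic label shift stands in for its relabeling scheme.
    """
    rng = random.Random(seed)
    key_examples = rng.sample(examples, min(subset_size, len(examples)))
    tasks = []
    for shift in range(num_tasks):
        # Task `shift` maps original label y to (y + shift) mod num_classes.
        task = [(x, (y + shift) % num_classes) for x, y in key_examples]
        tasks.append(task)
    return tasks


# Usage with toy text classification data (hypothetical examples).
data = [("good film", 1), ("bad film", 0), ("fine plot", 1), ("poor plot", 0)]
tasks = make_mutex_tasks(data, num_classes=2, num_tasks=2, subset_size=3)
```

As long as `num_tasks` does not exceed `num_classes`, every selected input receives a distinct label in each task, so a model cannot fit all tasks by memorizing a single input-to-label mapping.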

Keywords: data augmentation, mutex task generation, meta-learning, text classification

Procedia PDF Downloads 68
2803 Estimating Tree Height and Forest Classification from Multi Temporal Risat-1 HH and HV Polarized Satellite Aperture Radar Interferometric Phase Data

Authors: Saurav Kumar Suman, P. Karthigayani

Abstract:

In this paper, tree height is estimated and forest types are classified from multi-temporal RISAT-1 Horizontal-Horizontal (HH) and Horizontal-Vertical (HV) polarised Synthetic Aperture Radar (SAR) data. The novelty of the proposed project is the combined use of the backscattering coefficients (sigma naught) and the coherence. It uses the Water Cloud Model (WCM). The approach has three main steps: (a) extraction of the different forest parameter data from the Product.xml, BAND-META, and Grid-xxx.txt files that come with the HH and HV polarized data from ISRO (Indian Space Research Organisation); these files contain the parameters required for height estimation; (b) calculation of the vegetation and ground backscattering, the coherence, and other forest parameters; (c) classification of forest types using the ENVI 5.0 tool and ROI (Region of Interest) calculation.

Keywords: RISAT-1, classification, forest, SAR data

Procedia PDF Downloads 376
2802 An Experience of Translating an Excerpt from Sophie Adonon’s Echos de Femmes from French to English, Using Reverso.

Authors: Michael Ngongeh Mombe

Abstract:

This paper investigates an assertion made by some colleagues that there is no need to pay a human translator to translate their literary texts, since software such as Reverso can be used to do the translation. The main objective of this study is to examine the veracity of this assertion by using Reverso to translate a literary text without any post-editing by a human translator. The work is based on two theories: the Skopos and Communicative theories of translation. It is a documentary study in which data were collected from published documents in libraries, from the internet, and from the translation produced by Reverso. We carried out a comparative text analysis of the source and target texts in order to highlight the weaknesses and strengths of the software. The findings revealed that those who advocate the use of machine translation alone do so in ignorance of the translation mistakes usually made by the software. From the review of all 268 translation segments, we found that the translation produced by Reverso is fraught with errors. We therefore recommend that human translators either translate literary texts themselves or revise the machine-produced translation so that it conforms to the skopos of the work. This paper is based on Reverso translation. Similar work in the near future will examine other translation software to determine its weaknesses and strengths.

Keywords: machine translation, human translator, Reverso, literary text

Procedia PDF Downloads 67
2801 The Influence of Japanese Poetry in Spanish Piano Music: Benet Casablancas and Mercedes Zavala’s Haikus

Authors: Isabel Pérez Dobarro

Abstract:

In the mid-twentieth century, Spanish composers started looking beyond the national folkloric tradition (adopted by Albéniz, Granados, and Falla) and Rodrigo’s neoclassicism, and searched for other sources of inspiration. Japanese Haikus fascinated Spanish musicians, who found in their brevity and imagination a new avenue to develop their creativity. The goal of this research is to study how two renowned Spanish authors, Benet Casablancas and Mercedes Zavala, incorporated Haikus into their piano works. Based on Bruhn’s methodology on text and instrumental music relations, and developing a score and text analysis complemented by interviews with both composers, this study has revealed three possible interactions between the Haikus and these composers’ piano writing: inspiration, transmedialization, and mimesis. Findings also include specific technical gestures to support each of these approaches. Commonalities between their pieces and those by other non-Spanish composers such as Jonathan Harvey, John Cage, and Michael Berkeley have also been explored. According to the author's knowledge, this is the first study on the Japanese influence in Spanish piano music. Thus, it opens a new path for understanding musical exchanges between both countries as well as contemporary piano tools that support the interaction between text and music.

Keywords: Haiku, Spanish piano music, Benet Casablancas, Mercedes Zavala

Procedia PDF Downloads 122
2800 Animated Poetry-Film: Poetry in Action

Authors: Linette van der Merwe

Abstract:

It is known that visual artists, performing artists, and literary artists have inspired each other since time immemorial. The enduring, symbiotic relationship between the various art genres is evident where words, colours, lines, and sounds act as metaphors, a physical separation of the transcendental reality of art. Simonides of Keos (c. 556-468 BC) confirmed this, stating that a poem is a talking picture, or, in a more modern expression, a picture is worth a thousand words. It can be seen as an ancient relationship, originating from the epigram (tombstone or artefact inscriptions), the carmen figuratum (figure poem), and the ekphrasis (a description in the form of a poem of a work of art). Visual artists, including Michelangelo, Leonardo da Vinci, and Goethe, wrote poems and songs. Goya, Degas, and Picasso are famous for their works of art and for trying their hands at poetry. Afrikaans writers whose fine art is often published together with their writing, as in the case of Andries Bezuidenhout, Breyten Breytenbach, Sheila Cussons, Hennie Meyer, Carina Stander, and Johan van Wyk, among others, are not a strange phenomenon either. Imitating one art form into another art form is a form of translation, transposition, contemplation, and discovery of artistic impressions, showing parallel interpretations rather than physical comparison. It is especially about the harmony that exists between the different art genres, i.e., a poem that describes a painting or a visual text that portrays a poem that becomes a translation, interpretation, and rediscovery of the verbal text, or rather, from the word text to the image text. Poetry-film, as a form of such a translation of the word text into an image text, can be considered a hybrid, transdisciplinary art form that connects poetry and film. Poetry-film is regarded as an intertwined entity of word, sound, and visual image. 
It is an attempt to transpose and transform a poem into a new artwork that makes the poem more accessible to people who are not necessarily open to the written word and will, in effect, attract a larger audience to a genre that usually has a limited market. Poetry-film is considered a creative expression of an inverted ekphrastic inspiration, a visual description, interpretation, and expression of a poem. Research also emphasises that animated poetry-film is not widely regarded as a genre of anything and is thus severely under-theorized. This paper will focus on Afrikaans animated poetry-films as a multimodal transposition of a poem text to an animated poetry film, with specific reference to animated poetry-films in Filmverse I (2014) and Filmverse II (2016).

Keywords: poetry film, animated poetry film, poetic metaphor, conceptual metaphor, monomodal metaphor, multimodal metaphor, semiotic metaphor, multimodality, metaphor analysis, target domain, source domain

Procedia PDF Downloads 38
2799 Recognition of Spelling Problems during the Text in Progress: A Case Study on the Comments Made by Portuguese Students Newly Literate

Authors: E. Calil, L. A. Pereira

Abstract:

The acquisition of orthography is a complex process involving both lexical and grammatical questions. This learning occurs simultaneously with the mastery of multiple textual aspects (e.g., graphs, punctuation, etc.). However, most research on orthographic acquisition focuses on this acquisition from an autonomous point of view, separated from the process of textual production. This means that its object of analysis is the production of words selected by the researcher, or of requested sentences, in an experimental and controlled setting. In addition, the Spelling Problems (SPs) analyzed are identified by the researcher on the sheet of paper. From the perspective of Textual Genetics, within an enunciative approach, this study discusses the SPs recognized by dyads of newly literate students while they are writing a text collaboratively. Six textual production proposals, requested by a 2nd year teacher at a Portuguese primary school between January and March 2015, were recorded. In our case study, we discuss the SPs recognized by the dyad B and L (7 years old). As a methodological tool, we adopted the Ramos System audiovisual record. This system allows real-time capture of the text in process and of the face-to-face dialogue between both students and their teacher, and also captures the body movements and facial expressions of the participants during textual production proposals in the classroom. In these ecological conditions of multimodal registration of collaborative writing, we could identify the emergence of SPs in two dimensions: i. in the product (finished text): identification of SPs without recursive graphic marks (without erasures) and identification of SPs with erasures, indicating recognition of the SP by the student; ii. in the process (text in progress): identification of comments made by students about recognized SPs. Given this, we analyzed the comments on SPs identified during the text in progress.
These comments characterize a type of reformulation referred to as Commented Oral Erasure (COE). The COE has two enunciative forms: Simple Comment (SC), such as ' 'X' is written with 'Y' '; or Unfolded Comment (UC), such as ' 'X' is written with 'Y' because...'. The spelling COE may occur before or during the SP (Early Spelling Recognition - ESR) or after the SP has been entered (Later Spelling Recognition - LSR). There were 631 words entered in the 6 stories written by the B-L dyad, 145 of them containing some type of SP. During the text in progress, the students orally recognized 174 SPs, 46 of which were identified in advance (ESRs) and 128 of which were identified later (LSRs). If we consider that the 88 erasure SPs in the product indicate some form of SP recognition, we can observe that about twice as many SPs were recognized orally. The ESR was characterized by SC when students asked their colleague or teacher how to spell a given word. The LSR presented predominantly UC, verbalizing meta-orthographic arguments, mostly made by L. These results indicate that writing in dyads is an important didactic strategy for promoting metalinguistic reflection, favoring the learning of spelling.

Keywords: collaborative writing, erasure, learning, metalinguistic awareness, spelling, text production

Procedia PDF Downloads 141
2798 Syndromic Surveillance Framework Using Tweets Data Analytics

Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden

Abstract:

Syndromic surveillance aims to detect or predict disease outbreaks through the analysis of medical data sources. Using social media data such as tweets for syndromic surveillance is becoming more and more popular, helped by open platforms for collecting data and by the advantages of microblogging text and mobile geographic location features. In this paper, a Syndromic Surveillance Framework with a machine learning kernel based on tweet data analytics is presented. Influenza and three cities of the United Arab Emirates (Abu Dhabi, Al Ain, and Dubai) are used as the test disease and trial areas. Hospital case data provided by the Health Authority of Abu Dhabi (HAAD) are used for correlation purposes. In our model, a Latent Dirichlet allocation (LDA) engine is adapted to perform supervised classification, and N-fold cross-validation confusion matrices are given as the simulation results, with an overall system recall of 85.595% achieved.
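
The evaluation protocol mentioned here (N-fold cross-validation summarized by confusion matrices and recall) can be sketched without any particular classifier. The helper names below are hypothetical, the contiguous fold split is an assumption, and the LDA-based classifier itself is not reproduced.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def recall_per_class(matrix):
    """Recall of class i = correct predictions of i / all true samples of i."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
```

In use, a classifier would be fitted on each `train` split and its predictions on the matching `test` split accumulated into one confusion matrix, from which an overall recall figure like the one reported can be read.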

Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza

Procedia PDF Downloads 89
2797 Ensemble-Based SVM Classification Approach for miRNA Prediction

Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam

Abstract:

In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed for miRNA prediction. Three problems commonly associated with previous approaches are alleviated. These problems arise from imposing assumptions on the secondary structure of pre-miRNA, from the imbalance between the numbers of laboratory-checked miRNAs and pseudo-hairpins, and from using a training data set that does not cover all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers, namely Triplet-SVM, Virgo, and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used to aggregate the final output. The proposed approach is trained and then tested with balanced data sets. It outperforms the three base classifiers, achieving improved values of 88.88% F-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and an area under the ROC curve of 0.91.
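
The reported metrics are related by standard definitions, which the short sketch below spells out. The function names are hypothetical and the counts in the usage example are made up for illustration; only the precision and sensitivity figures passed to `f_score` come from the abstract.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
    }


# Consistency check on the abstract's figures: precision 90.64% and
# sensitivity 87.2% give an F-score of roughly 0.8889.
reported_f = f_score(0.9064, 0.872)

# Illustrative counts only (not from the paper).
example = metrics_from_counts(tp=8, fp=2, tn=18, fn=2)
```

The F-score computed from the reported precision and sensitivity agrees with the reported 88.88% to within rounding, which suggests the figures are internally consistent.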

Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data

Procedia PDF Downloads 312
2796 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To solve memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex tasks. The experiments show that generating mutually exclusive tasks can effectively solve memorization overfitting in the MAML algorithm.

Keywords: mutex task generation, data augmentation, meta-learning, text classification

Procedia PDF Downloads 103
2795 Applying Different Stenography Techniques in Cloud Computing Technology to Improve Cloud Data Privacy and Security Issues

Authors: Muhammad Muhammad Suleiman

Abstract:

Cloud Computing is a versatile concept that refers to a service that allows users to outsource their data without having to worry about local storage issues. However, the most pressing issue to be addressed is maintaining a secure and reliable data repository rather than relying on untrustworthy service providers. In this study, we look at how steganography approaches, in combination with digital watermarking, can greatly improve the system's effectiveness and data security when used for Cloud Computing. The main requirement of such frameworks, where data is transferred or exchanged between servers and users, is safe data management in cloud environments. Steganography in the cloud is among the most effective methods for safe communication. Steganography is a method of writing coded messages in such a way that only the sender and recipient can safely interpret and display the information hidden in the communication channel. This study presents a new text steganography method for hiding a loaded hidden English text file in a cover English text file to ensure data protection in cloud computing. Data protection, data hiding capability, and time were all improved using the proposed technique.
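
The abstract does not describe how its text steganography method embeds data, so the sketch below uses a different, well-known toy technique purely to illustrate the idea of hiding one text inside another: each gap between cover words carries one bit (single space for 0, double space for 1). The function names `hide` and `reveal` are hypothetical.

```python
import re


def hide(cover, secret):
    """Embed the UTF-8 bits of `secret` in the word gaps of `cover`.

    Toy scheme (not the paper's method): one space encodes 0, two spaces encode 1.
    """
    words = cover.split()
    if not words:
        raise ValueError("cover text is empty")
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    if len(bits) > len(words) - 1:
        raise ValueError("cover text too short for this secret")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        pieces.append(gap + word)
    return "".join(pieces)


def reveal(stego, n_bytes):
    """Recover `n_bytes` of hidden data from the word gaps of `stego`."""
    gaps = re.findall(r" +", stego)[: n_bytes * 8]
    bits = "".join("1" if len(gap) == 2 else "0" for gap in gaps)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


# Usage: the cover needs at least 8 gaps per hidden byte.
cover_text = " ".join(["cloud"] * 20)
stego_text = hide(cover_text, "hi")
```

A scheme like this is easy to detect and is destroyed by whitespace normalization, which is one reason practical methods trade off hiding capacity, robustness, and processing time, the three quantities the abstract reports improving.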

Keywords: cloud computing, steganography, information hiding, cloud storage, security

Procedia PDF Downloads 165
2794 Prosody of Text Communication: Inducing Synchronization and Coherence in Chat Conversations

Authors: Karolina Ziembowicz, Andrzej Nowak

Abstract:

In the current study, we examined the consequences of adding prosodic cues to text communication by allowing users to observe the process of message creation while engaged in dyadic conversations. In the first condition, users interacted through a traditional chat that requires pressing ‘enter’ to make a message visible to an interlocutor. In another, text appeared on the screen simultaneously as the sender was writing it, letter after letter (Synchat condition), so that users could observe the varying rhythm of message production, precise timing of message appearance, typos and their corrections. The results show that the ability to observe the dynamics of message production had a twofold effect on the social interaction process. First, it enhanced the relational aspect of communication – interlocutors synchronized their emotional states during the interaction, their communication included more statements on relationship building, and they evaluated the Synchat medium as more personal and emotionally engaging. Second, it increased the coherence of communication, reflected in greater continuity of the topics raised in Synchat conversations. The results are discussed from the interaction design (IxD) perspective.

Keywords: chat communication, online conversation, prosody, social synchronization, interaction incoherence, relationship building

Procedia PDF Downloads 120
2793 Optimizing the Readability of Orthopaedic Trauma Patient Education Materials Using ChatGPT-4

Authors: Oscar Covarrubias, Diane Ghanem, Christopher Murdock, Babar Shafiq

Abstract:

Introduction: ChatGPT is an advanced language AI tool designed to understand and generate human-like text. The aim of this study is to assess the ability of ChatGPT-4 to rewrite orthopaedic trauma patient education materials at the recommended 6th-grade level. Methods: Two independent reviewers accessed ChatGPT-4 (chat.openai.com) and gave identical instructions to simplify the readability of provided text to a 6th-grade level. All trauma-related articles by the Orthopaedic Trauma Association (OTA) and American Academy of Orthopaedic Surgeons (AAOS) were sequentially provided. The academic grade level was determined using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE). Paired t-tests and Wilcoxon rank-sum tests were used to compare the FKGL and FRE between the ChatGPT-4 revised and original text. The intraclass correlation coefficient (ICC) was used to assess variability in ChatGPT-4 generated text between the two reviewers. Results: ChatGPT-4 significantly reduced FKGL and increased FRE scores in the OTA (FKGL: 5.7±0.5 compared to the original 8.2±1.1, FRE: 76.4±5.7 compared to the original 65.5±6.6, p < 0.001) and AAOS articles (FKGL: 5.8±0.8 compared to the original 8.9±0.8, FRE: 76±5.5 compared to the original 56.7±5.9, p < 0.001). On average, 14.6% of OTA and 28.6% of AAOS articles required at least two revisions by ChatGPT-4 to achieve a 6th-grade reading level. ICC demonstrated poor reliability for FKGL (OTA 0.24, AAOS 0.45) and moderate reliability for FRE (OTA 0.61, AAOS 0.73). Conclusion: This study provides a novel, simple, and efficient method using language AI to optimize the readability of patient education content, which may only require the surgeon’s final proofreading. This method would likely be as effective for other medical specialties.
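
The two readability scores used in this study follow published formulas based on words per sentence and syllables per word. The sketch below applies them with a deliberately rough syllable counter (vowel groups), which is an assumption: dedicated readability tools count syllables more carefully, so their scores will differ somewhat. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Return (FKGL, FRE) for a non-empty English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Flesch-Kincaid Grade Level and Flesch Reading Ease formulas.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return fkgl, fre


# Usage: shorter sentences and shorter words lower FKGL and raise FRE.
simple = readability("The cat sat. The dog ran.")
harder = readability("Comminuted intraarticular fractures necessitate operative stabilization.")
```

Because both scores depend only on sentence length and syllable counts, a rewrite can reach a 6th-grade FKGL while still being unclear, which is consistent with the abstract's note that a surgeon's final proofreading may still be needed.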

Keywords: artificial intelligence, AI, chatGPT, patient education, readability, trauma education

Procedia PDF Downloads 48
2792 Architectural Experience of the Everyday in Phuket Old Town

Authors: Thirayu Jumsai na Ayudhya

Abstract:

Initial attempts, in my previous research, to understand what architecture means to people as they go about their everyday lives revealed that fields of knowledge such as environmental psychology, environmental perception, and environmental aesthetics did not adequately address a perceived need for a contextualized and holistic theoretical framework. That research found that people’s sense-making of their everyday architecture can be described in terms of four super‐ordinate themes: (1) building in urban (text), (2) building in (text), (3) building in human (text), and (4) building in time (text). To understand more comprehensively how people make sense of their everyday architectural experience, this ongoing research selected Phuket Old Town as the focal urban context, where the distinctive Chino-Portuguese character is remarkable. It is expected that in a unique urban context like Phuket Old Town, previously unseen super-ordinate themes will be unveiled through reflection on people’s everyday experiences. The ongoing research on people’s architectural experience, conducted on Phuket Island, Thailand, will be presented succinctly. The research addresses the question of how people make sense of their everyday architecture and buildings, especially in a unique urban context such as Phuket Old Town, and identifies the ways in which they do so. Participant-Produced-Photograph (PPP) and Interpretative Phenomenological Analysis (IPA) are adopted as the main methodologies. PPP allows people to express experiences of their everyday urban context freely, without any interference or forced data generation by researchers. With the IPA methodology, a small pool of participants is considered desirable, given the detailed level of analysis required and its potential to produce a meaningful outcome.

Keywords: architectural experience, the everyday architecture, Phuket, Thailand

Procedia PDF Downloads 273
2791 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis

Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel

Abstract:

Because only a small number of complex systems in the artificial immune system (AIS) field can solve nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. The Gaussian function is commonly used for similarity estimation in classification and pattern recognition problems. In this study, diagnosis of breast cancer, the second most widespread cancer in women, was performed with three different distance calculation functions (Euclidean, Gaussian, and a Gaussian-Euclidean hybrid) in the clonal selection model of classical AIS on the Wisconsin Breast Cancer Dataset (WBCD), taken from the University of California, Irvine Machine Learning Repository. We used 3-fold cross-validation to train and test on the dataset. According to the results, the maximum test classification accuracy, 97.35%, was obtained with the Gaussian-Euclidean hybrid function on fold 3. The mean test classification accuracies were 94.78%, 94.45%, and 95.31% with the Euclidean, Gaussian, and Gaussian-Euclidean functions, respectively. With these results, the Gaussian-Euclidean hybrid function seems to be a promising distance calculation method, and it may be considered as an alternative for hard nonlinear classification problems.
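
The three distance functions compared in this study can be pictured with a small sketch. The Euclidean and Gaussian forms are standard, but the abstract does not define the hybrid, so `hybrid_distance` below is a purely hypothetical weighted blend with made-up parameters `alpha` and `sigma`, not the authors' formula.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gaussian_distance(a, b, sigma=1.0):
    """One minus the Gaussian similarity: 0 for identical points, near 1 for distant ones."""
    d = euclidean_distance(a, b)
    return 1.0 - math.exp(-(d ** 2) / (2.0 * sigma ** 2))


def hybrid_distance(a, b, sigma=1.0, alpha=0.5):
    """Hypothetical blend of the two measures (an assumption, not the paper's definition)."""
    return alpha * euclidean_distance(a, b) + (1.0 - alpha) * gaussian_distance(a, b, sigma)


# Usage: in a clonal selection model, a distance like this would score how
# closely an antibody (candidate pattern) matches an antigen (data sample).
near = hybrid_distance([1.0, 2.0], [1.5, 2.0])
far = hybrid_distance([1.0, 2.0], [4.0, 6.0])
```

The Gaussian term saturates for distant points while the Euclidean term keeps growing, so any blend of the two behaves nonlinearly near a sample and linearly far from it.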

Keywords: artificial immune system, breast cancer diagnosis, Euclidean function, Gaussian function

Procedia PDF Downloads 414
2790 Text Analysis to Support Structuring and Modelling a Public Policy Problem-Outline of an Algorithm to Extract Inferences from Textual Data

Authors: Claudia Ehrentraut, Osama Ibrahim, Hercules Dalianis

Abstract:

Policy-making situations are real-world problems that exhibit complexity in that they are composed of many interrelated problems and issues. To be effective, policies must holistically address the complexity of the situation rather than propose solutions to single problems. Formulating and understanding the situation and its complex dynamics is therefore key to finding holistic solutions. Analysis of text-based information on the policy problem, using Natural Language Processing (NLP) and text analysis techniques, can support modelling of public policy problem situations in a more objective way, based on domain experts' knowledge and scientific evidence. The objective of this study is to support modelling of public policy problem situations using text analysis of verbal descriptions of the problem. We propose a formal methodology for analysing qualitative data from multiple information sources on a policy problem to construct a causal diagram of the problem. The analysis process aims at identifying key variables, linking them by cause-effect relationships, and mapping that structure into a graphical representation that is adequate for designing action alternatives, i.e., policy options. This study outlines an algorithm to automate the initial step of a larger methodological approach, a step that has so far been done manually. In this initial step, inferences about key variables and their interrelationships are extracted from textual data to support better problem structuring. A small prototype for this step is also presented.

Keywords: public policy, problem structuring, qualitative analysis, natural language processing, algorithm, inference extraction

Procedia PDF Downloads 562
2789 National Image in the Age of Mass Self-Communication: An Analysis of Internet Users' Perception of Portugal

Authors: L. Godinho, N. Teixeira

Abstract:

Nowadays, the massification of Internet access represents one of the major challenges to the traditional powers of the State, among them the power to control its external image. The virtual world has also sparked the interest of the social sciences, which consider it a new field of study: an immense open text where sense is expressed. In this paper, that immense text has been accessed in order to understand the perception that Internet users from all over the world have of Portugal. Ours is a quantitative and qualitative approach, as we have resorted to buzz, thematic and category analysis. The results confirm the predominance of the sea stereotype in others' vision of the Portuguese people, and show that national image has adapted to network communication through processes of individuation and paganization.

Keywords: national image, internet, self-communication, perception

Procedia PDF Downloads 237
2788 From Text to Data: Sentiment Analysis of Presidential Election Political Forums

Authors: Sergio V Davalos, Alison L. Watkins

Abstract:

User generated content (UGC), such as a website post, has data associated with it: time of the post, gender, location, type of device, and number of words. The text entered in UGC can provide a valuable dimension for analysis. In this research, each user post is treated as a collection of terms (words). In addition to the number of words per post, the frequency of each term is determined per post and as the sum of occurrences across all posts. This research focuses on one specific aspect of UGC: sentiment. Sentiment analysis (SA) was applied to the content (user posts) of two sets of political forums related to the US presidential elections of 2012 and 2016. Sentiment analysis derives data from the text, which enables the subsequent application of data analytic methods. The SASA (SAIL/SAI Sentiment Analyzer) model was used for sentiment analysis. Applying SASA produced a sentiment score for each post. Based on the sentiment scores for the posts, there are significant differences between the content and sentiment of the two sets of forums for the 2012 and 2016 presidential elections. In the 2012 forums, 38% of the forums started with positive sentiment and 16% with negative sentiment. In the 2016 forums, 29% started with positive sentiment and 15% with negative sentiment. There were also changes in sentiment over time. For both elections, as the election got closer, the cumulative sentiment score became negative. The candidate who won each election appeared in more posts than the losing candidates. In the case of Trump, there were more negative posts than Clinton’s highest number of posts, which were positive. KNIME topic modeling was used to derive topics from the posts. There were also changes in topics and keyword emphasis over time. Initially, the political parties were the most referenced, and as the election got closer the emphasis changed to the candidates. The SASA method predicted sentiment better than four other methods in Sentibench. The research resulted in deriving sentiment data from text. In combination with other data, the sentiment data provided insight and discovery about user sentiment in the US presidential elections of 2012 and 2016.
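
The term-counting step described in this abstract (words per post, term frequency per post, and total occurrences across all posts) can be sketched as follows. This is a minimal illustration only: the `tokenize` rule and the two sample posts are assumptions made up for the example, and the SASA sentiment scoring itself is not reproduced.

```python
from collections import Counter
import re

def tokenize(post):
    """Lowercase a post and split it into word terms (a simple, assumed rule)."""
    return re.findall(r"[a-z0-9']+", post.lower())

def term_statistics(posts):
    """Return words per post, term frequencies per post, and corpus-wide totals."""
    per_post = [Counter(tokenize(p)) for p in posts]
    word_counts = [sum(c.values()) for c in per_post]
    totals = Counter()
    for c in per_post:
        totals.update(c)  # add this post's counts to the running totals
    return word_counts, per_post, totals

# Made-up posts for illustration; not data from the study.
posts = ["The debate was great", "The debate was long, the crowd was loud"]
word_counts, per_post, totals = term_statistics(posts)
```

Here `word_counts` holds the number of words in each post, `per_post[i]` the frequency of each term in post `i`, and `totals` the sum of occurrences over all posts.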

Keywords: sentiment analysis, text mining, user generated content, US presidential elections

Procedia PDF Downloads 161
2787 Empowering a New Frontier in Heart Disease Detection: Unleashing Quantum Machine Learning

Authors: Sadia Nasrin Tisha, Mushfika Sharmin Rahman, Javier Orduz

Abstract:

Machine learning is applied in a variety of fields throughout the world, and the healthcare sector has benefited enormously from it. One of the most effective approaches for predicting human heart diseases is to use machine learning to classify data and predict the outcome as a class. With the rapid advancement of quantum technology, however, quantum computing has emerged as a potential game-changer for many applications. Quantum algorithms have the potential to execute substantially faster than their classical equivalents, which can lead to significant improvements in computational performance and efficiency. In this study, we applied quantum machine learning concepts to predict coronary heart diseases from text data. We ran three experiments, using three different features and three feature sets. The data set consisted of 100 data points. We aim to provide a comparative analysis of the classical and quantum approaches, highlighting the potential benefits of quantum machine learning for predicting heart diseases.

Keywords: quantum machine learning, SVM, QSVM, matrix product state

Procedia PDF Downloads 64
2786 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper is an attempt to work on a highly inflectional language, Assamese. It is also one of the national languages of India, and very little has been achieved for it in terms of computational research. Building a language processing tool for a natural language is not straightforward, as the standard and the language representation change at various levels. This paper presents the inflectional suffixes of Assamese verbs and shows how statistical tools, along with linguistic features, can improve tagging accuracy. A conditional random fields (CRF) tool was used to train on and automatically tag the text data; accuracy improved after linguistic features were fed into the training data. Because Assamese is highly inflectional, standardizing its morphology is challenging. Inflectional suffixes are used as a feature of the text data. In order to analyze the inflections of Assamese word forms, a list was prepared of all possible suffixes that the various categories can take. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and uninflected classes (adverb and particle). The corpus used for this morphological analysis has a large number of tokens. It is a mixed corpus, and it has given satisfactory accuracy. The accuracy rate of the tagger has gradually improved with the modified training data.
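
The idea of feeding suffix information to a CRF tagger as a feature can be sketched as below. This is a hedged illustration, not the authors' feature set: the function names are hypothetical, the dict-per-token layout is an assumption modelled on the input many CRF toolkits accept, and the suffix list uses placeholder Latin strings rather than real Assamese suffixes.

```python
def suffix_feature(word, suffixes):
    """Return the longest listed suffix that ends `word`, or "" if none matches."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Require a non-empty stem so a word is never treated as pure suffix.
        if len(word) > len(suffix) and word.endswith(suffix):
            return suffix
    return ""

def token_features(tokens, i, suffixes):
    """Build a feature dict for token i (word, suffix, and neighbouring words)."""
    word = tokens[i]
    return {
        "word": word,
        "suffix": suffix_feature(word, suffixes),
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }

# Placeholder strings for illustration only; NOT real Assamese suffixes or words.
TOY_SUFFIXES = ["ab", "cab", "xy"]
tokens = ["tocab", "mixy", "pen"]
features = [token_features(tokens, i, TOY_SUFFIXES) for i in range(len(tokens))]
```

Matching the longest suffix first matters for inflectional languages, where a short suffix is often the tail of a longer one.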

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 171
2785 Online Handwritten Character Recognition for South Indian Scripts Using Support Vector Machines

Authors: Steffy Maria Joseph, Abdu Rahiman V, Abdul Hameed K. M.

Abstract:

Online handwritten character recognition is a challenging field in Artificial Intelligence. The classification success rate of current techniques decreases when the dataset involves similarity and complexity in stroke styles, number of strokes and variations in stroke characteristics. Malayalam is a complex South Indian language spoken by about 35 million people, especially in Kerala and the Lakshadweep islands. In this paper, we consider significant feature extraction for the similar stroke styles of Malayalam. The extracted feature set is also suitable for the recognition of other handwritten South Indian languages such as Tamil, Telugu and Kannada. A classification scheme based on support vector machines (SVM) is proposed to improve the accuracy of classification and recognition of online Malayalam handwritten characters. SVM classifiers are well suited to real-world applications. The contribution of various features towards recognition accuracy is analysed. Performance for different SVM kernels is also studied. A graphical user interface has been developed for reading and displaying the character. Different writing styles are taken for each of the 44 alphabets. Various features are extracted and used for classification after preprocessing of the input data samples. The highest recognition accuracy, 97%, is obtained experimentally with the best feature combination and a polynomial kernel in the SVM.

Keywords: SVM, MATLAB, Malayalam, South Indian scripts, online handwritten character recognition

Procedia PDF Downloads 549
2784 A t-SNE and UMAP Based Neural Network Image Classification Algorithm

Authors: Shelby Simpson, William Stanley, Namir Naba, Xiaodi Wang

Abstract:

t-SNE and UMAP are both recent, state-of-the-art tools that predominantly preserve local structure, that is, they group neighboring data points together, which provides a very informative visualization of heterogeneity in the data. In this research, we develop a t-SNE and UMAP based neural network image classification algorithm that, as a preprocessing step, embeds the original dataset into a corresponding low-dimensional dataset, then uses this embedded dataset as input to our specially designed neural network classifier for image classification. In our experiments we use the Fashion-MNIST data set, a labeled data set of images of clothing objects. t-SNE and UMAP are used for dimensionality reduction of the data set and thus produce low-dimensional embeddings. We then feed the embeddings from t-SNE and UMAP into two neural networks. The accuracy of the models from the two neural networks is then compared to that of a dense neural network that does not use an embedding as input, to show which model classifies the images of clothing objects more accurately.

Keywords: t-SNE, UMAP, fashion MNIST, neural networks

Procedia PDF Downloads 167
2783 Against Language Disorder: A Way of Reading Dialects in Yan Lianke’s Novels

Authors: Thuy Hanh Nguyen Thi

Abstract:

Using deep reading and text analysis, this article analyzes the use and creation of dialects as a way of demonstrating Yan Lianke's creative stance. The article argues that this is the writer’s narrative strategy in a fight against aphasia, a language disorder of Chinese people and culture, demonstrating a sense of return to folklore and marking his own linguistic style. In terms of verbal text, dialect in Yan Lianke’s novels is manifested through the use of words, sentences and dialects. Two types of dialect exist in Yan Lianke’s novels: the current dialect system and the particular dialect system of the Pa Lou world, created by the writer himself in order to enrich the vocabulary of Han Chinese.

Keywords: Yan Lianke, aphasia, dialect, Pa Lou world

Procedia PDF Downloads 101
2782 Autonomous Vehicle Detection and Classification in High Resolution Satellite Imagery

Authors: Ali J. Ghandour, Houssam A. Krayem, Abedelkarim A. Jezzini

Abstract:

High-resolution satellite images and remote sensing can provide global information quickly compared to traditional methods of data collection. At such high resolution, a road is no longer a thin line, and objects such as cars and trees are easily identifiable. Automatic vehicle enumeration can be considered one of the most important applications in traffic management. In this paper, an autonomous vehicle detection and classification approach for highway environments is proposed. The approach consists of three main stages: (i) First, a set of preprocessing operations is applied, including soil, vegetation, and water suppression. (ii) Then, road network detection and delineation is implemented using a built-up area index, followed by several morphological operations. This step plays an important role in increasing the overall detection accuracy, since vehicle candidates are objects contained within the road networks only. (iii) Multi-level Otsu segmentation is implemented in the last stage, resulting in vehicle detection and classification, where detected vehicles are classified into cars and trucks. Accuracy assessment is conducted over different study areas to show the efficiency of the proposed method, especially in highway environments.
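
To make the segmentation stage more concrete, the sketch below implements the basic single-threshold Otsu method on a flat list of integer gray levels. It is a simplification under stated assumptions: the abstract uses multi-level Otsu, which searches for several thresholds at once with the same between-class variance criterion, and the function name and input layout here are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form class 0 (background), the rest class 1.
    Single-threshold case only; multi-level Otsu generalizes this search.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of intensities in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break     # class 1 empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # unnormalized variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Applied to the pixels inside a detected road mask, a threshold of this kind separates bright vehicle candidates from the darker road surface; additional thresholds would be needed to split the candidates into further classes.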

Keywords: remote sensing, object identification, vehicle and road extraction, vehicle and road features-based classification

Procedia PDF Downloads 206
2781 Dynamic Distribution Calibration for Improved Few-Shot Image Classification

Authors: Majid Habib Khan, Jinwei Zhao, Xinhong Hei, Liu Jiedong, Rana Shahzad Noor, Muhammad Imran

Abstract:

Deep learning is increasingly employed in image classification, yet the scarcity and high cost of labeled data for training remain a challenge. Limited samples often lead to overfitting due to biased sample distribution. This paper introduces a dynamic distribution calibration method for few-shot learning. Initially, base and new class samples undergo normalization to mitigate disparate feature magnitudes. A pre-trained model then extracts feature vectors from both classes. The method dynamically selects distribution characteristics from base classes (both adjacent and remote) in the embedding space, using a threshold value approach for new class samples. Given the propensity of similar classes to share feature distributions like mean and variance, this research assumes a Gaussian distribution for feature vectors. Subsequently, distributional features of new class samples are calibrated using a corrected hyperparameter, derived from the distribution features of both adjacent and distant base classes. This calibration augments the new class sample set. The technique demonstrates significant improvements, with up to 4% accuracy gains in few-shot classification challenges, as evidenced by tests on miniImagenet and CUB datasets.
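
The calibration idea in this abstract can be sketched roughly as follows. This is an illustrative approximation, not the authors' algorithm: the Euclidean distance, the diagonal (per-dimension) variance, the way `alpha` is added, and all names are assumptions, and only adjacent base classes (those within `threshold`) are used, since the abstract does not specify how distant classes enter the calculation.

```python
import math
import random

def calibrate(sample, base_stats, threshold, alpha=0.2):
    """Estimate a Gaussian for a new-class sample from nearby base classes.

    base_stats: list of (mean_vector, variance_vector), one per base class.
    ASSUMPTIONS: Euclidean distance to base means, diagonal variance,
    and alpha as an additive spread term.
    """
    selected = [
        (mean, var) for mean, var in base_stats
        if math.dist(sample, mean) <= threshold
    ]
    k = len(selected)
    dim = len(sample)
    # Calibrated mean: average of the sample and the selected base means.
    mean = [(sample[d] + sum(m[d] for m, _ in selected)) / (k + 1)
            for d in range(dim)]
    # Calibrated variance: average selected variance plus alpha
    # (alpha alone when no base class falls within the threshold).
    var = [(sum(v[d] for _, v in selected) / k if k else 0.0) + alpha
           for d in range(dim)]
    return mean, var

def augment(mean, var, n, rng):
    """Draw n synthetic feature vectors from the calibrated Gaussian."""
    return [[rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
            for _ in range(n)]

# Toy statistics for two base classes; values are made up for illustration.
base_stats = [([0.0, 0.0], [1.0, 1.0]), ([10.0, 10.0], [4.0, 4.0])]
mean, var = calibrate([1.0, 1.0], base_stats, threshold=3.0)
samples = augment(mean, var, 5, random.Random(0))
```

The synthetic vectors in `samples` would then be added to the few real new-class features before training a classifier, which is how the calibration augments the sample set.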

Keywords: deep learning, computer vision, image classification, few-shot learning, threshold

Procedia PDF Downloads 35
2780 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Although the majority of existing solutions provide good accuracy, they are often overly complex and computationally expensive. From this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. First, a face image is converted into a one-dimensional time series using a Hilbert space-filling curve; the approach then converts this time series to a symbolic representation. Next, a distance matrix is calculated between the symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated on three public datasets. Experimental results show that our approach achieves a correct classification rate exceeding 97% with the K-NN algorithm.
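
The first step of this approach, reading an image along a Hilbert space-filling curve to obtain a one-dimensional series, can be sketched as below. The index-to-coordinate mapping is the standard iterative Hilbert curve construction; the function names and the requirement of a square image with a power-of-two side are assumptions of this sketch, and the later symbolic conversion is not shown.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # rotate by swapping axes
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def image_to_series(image):
    """Flatten a square grayscale image (list of rows) along the Hilbert curve."""
    n = len(image)
    if n < 1 or n & (n - 1) or any(len(row) != n for row in image):
        raise ValueError("image must be square with a power-of-two side")
    order = n.bit_length() - 1
    return [image[y][x]
            for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]
```

Unlike row-by-row scanning, consecutive points on the curve are always neighbouring pixels, so nearby image regions stay close together in the resulting series.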

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 324
2779 COVID-19 Detection from Computed Tomography Images Using UNet Segmentation, Region Extraction, and Classification Pipeline

Authors: Kenan Morani, Esra Kaya Ayana

Abstract:

This study aimed to develop a novel pipeline for COVID-19 detection using a large and rigorously annotated database of computed tomography (CT) images. The pipeline consists of UNet-based segmentation, lung extraction, and a classification part, with optional slice removal techniques following the segmentation part. In this work, batch normalization was added to the original UNet model to produce a lighter model with better localization, which is then used to build a full pipeline for COVID-19 diagnosis. To evaluate the effectiveness of the proposed pipeline, various segmentation methods were compared in terms of their performance and complexity. The proposed segmentation method with batch normalization outperformed traditional methods and other alternatives, achieving a higher Dice score on a publicly available dataset. Moreover, at the slice level, the proposed pipeline demonstrated high validation accuracy, indicating that 2D slices are predicted efficiently. At the patient level, the full approach exhibited higher validation accuracy and macro F1 score than other alternatives, surpassing the baseline. The classification component of the proposed pipeline uses a convolutional neural network (CNN) to make final diagnosis decisions. The COV19-CT-DB dataset, which contains a large number of CT scans with various types of slices and is rigorously annotated for COVID-19 detection, was used for classification. The proposed pipeline outperformed many other alternatives on the dataset.
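
For reference, the Dice score used to compare segmentation methods measures the overlap between a predicted mask and a ground-truth mask as 2|A∩B| / (|A| + |B|). The sketch below computes it for binary masks given as lists of rows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here rather than something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient between two equal-shape binary masks (lists of rows)."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    size = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    # ASSUMED convention: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * intersection / size
```

A score of 1.0 means the predicted lung region matches the annotation exactly, while 0.0 means the two masks do not overlap at all.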

Keywords: classification, computed tomography, lung extraction, macro F1 score, UNet segmentation

Procedia PDF Downloads 103