Search results for: analyzing text data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26144

Search results for: analyzing text data

26024 Text Mining Past Medical History in Electrophysiological Studies

Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly

Abstract:

Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, so it has led to being especially relevant in Medicine. Neurological patient’s history allows the clinician to define the patient’s symptoms and along with the result of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 to February 2022. Age, gender, and all NCS attributes reports were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag of words approach was used to perform the analyses. Different visualizations which are common in text analysis, were created to easily grasp the results. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH. Of which 60.4% (n=1817) were males. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery. DM was observed in 16.3% of the patients, and surgery at 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery and, after that, DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMH were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain additional insights regarding sex and age differences. These insights might help to collect epidemiological demographical data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition, and therefore using a coding system can be beneficial.

Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis

Procedia PDF Downloads 63
26023 Recontextualisation of Political Discourse: A Case Study of Translation of News Stories

Authors: Hossein Sabouri

Abstract:

News stories as one of the branches of political discourse has always been regarded a sensitive and challenging area. Political translators often encounter some struggles that are vitally important when it comes to deal with the political tension between the source culture and the target one. Translating news stories is of prime importance since it has widespread availability and power of defining or even changing the facts. News translation is usually more than straight transfer of source text. Like original text endeavoring to manipulate the readers’ minds with imposing their ideologies, translated text seeking to change these ideologies influenced by ideological power. In other words, translation product is not considered more than a recontextualisation of the source text. The present study examines possible criteria for occurring changes in the translation process of news stories based on the ideological and political stance of translator using theories of ‘critical discourse analysis’and ‘translation and power. Fairclough investigates the ideological issues in (political) discourse and Tymoczko studies the political and power-related engagement of the translator in the process of translation. Incorporation of Fairclough and Gentzler and Tymoczko’s theories paves the way for the researcher to looks at the ideological power position of the translator. Data collection and analysis have been accomplished using 17 political-text samples taken from online news agencies which are related to the ‘Iran’s Nuclear Program’. Based on the findings, recontextualisation is mainly observed in terms of the strategies of ‘substitution, omissions, and addition’ in the translation process. The results of the study suggest that there is a significant relationship between the translation of political texts and ideologies of target culture.

Keywords: news translation, recontextualisation, ideological power, political discourse

Procedia PDF Downloads 157
26022 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah

Authors: Joy Aworo-Okoroh

Abstract:

Literary texts are mirrors of a real-life situation. Thus, authors choose the linguistic items that would best encode their intended meanings and messages. However, words mean more than they seem. The meaning of words is not static rather, it is dynamic as they constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun Van Djik's theory of local pragmatic coherence, it is established that words enter into relations in a text and these relations account for sequential speech acts in the texts. Comprehension of the text is dependent on the interpretation of these relations.To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected in Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed, identifying the speech act and epistemic relations inherent in them. A subtle analysis of the structure of the conversations was also carried out. It was discovered that justification is the most commonly used relation and the meaning of the text is dependent on the interpretation of these instances' pragmatic coherence. The study concludes that to effectively teach literature in English, pragmatic coherence should be incorporated as words mean more than they say.

Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah

Procedia PDF Downloads 107
26021 Poultry in Motion: Text Mining Social Media Data for Avian Influenza Surveillance in the UK

Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves

Abstract:

Background: Avian influenza, more commonly known as Bird flu, is a viral zoonotic respiratory disease stemming from various species of poultry, including pets and migratory birds. Researchers have purported that the accessibility of health information online, in addition to the low-cost data collection methods the internet provides, has revolutionized the methods in which epidemiological and disease surveillance data is utilized. This paper examines the feasibility of using internet data sources, such as Twitter and livestock forums, for the early detection of the avian flu outbreak, through the use of text mining algorithms and social network analysis. Methods: Social media mining was conducted on Twitter between the period of 01/01/2021 to 31/12/2021 via the Twitter API in Python. The results were filtered firstly by hashtags (#avianflu, #birdflu), word occurrences (avian flu, bird flu, H5N1), and then refined further by location to include only those results from within the UK. Analysis was conducted on this text in a time-series manner to determine keyword frequencies and topic modeling to uncover insights in the text prior to a confirmed outbreak. Further analysis was performed by examining clinical signs (e.g., swollen head, blue comb, dullness) within the time series prior to the confirmed avian flu outbreak by the Animal and Plant Health Agency (APHA). Results: The increased search results in Google and avian flu-related tweets showed a correlation in time with the confirmed cases. Topic modeling uncovered clusters of word occurrences relating to livestock biosecurity, disposal of dead birds, and prevention measures. Conclusions: Text mining social media data can prove to be useful in relation to analysing discussed topics for epidemiological surveillance purposes, especially given the lack of applied research in the veterinary domain. The small sample size of tweets for certain weekly time periods makes it difficult to provide statistically plausible results, in addition to a great amount of textual noise in the data.

Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, avian influenza, social media

Procedia PDF Downloads 67
26020 A Multivariate Exploratory Data Analysis of a Crisis Text Messaging Service in Order to Analyse the Impact of the COVID-19 Pandemic on Mental Health in Ireland

Authors: Hamda Ajmal, Karen Young, Ruth Melia, John Bogue, Mary O'Sullivan, Jim Duggan, Hannah Wood

Abstract:

The Covid-19 pandemic led to a range of public health mitigation strategies in order to suppress the SARS-CoV-2 virus. The drastic changes in everyday life due to lockdowns had the potential for a significant negative impact on public mental health, and a key public health goal is to now assess the evidence from available Irish datasets to provide useful insights on this issue. Text-50808 is an online text-based mental health support service, established in Ireland in 2020, and can provide a measure of revealed distress and mental health concerns across the population. The aim of this study is to explore statistical associations between public mental health in Ireland and the Covid-19 pandemic. Uniquely, this study combines two measures of emotional wellbeing in Ireland: (1) weekly text volume at Text-50808, and (2) emotional wellbeing indicators reported by respondents of the Amárach public opinion survey, carried out on behalf of the Department of Health, Ireland. For this analysis, a multivariate graphical exploratory data analysis (EDA) was performed on the Text-50808 dataset dated from 15th June 2020 to 30th June 2021. This was followed by time-series analysis of key mental health indicators including: (1) the percentage of daily/weekly texts at Text-50808 that mention Covid-19 related issues; (2) the weekly percentage of people experiencing anxiety, boredom, enjoyment, happiness, worry, fear and stress in Amárach survey; and Covid-19 related factors: (3) daily new Covid-19 case numbers; (4) daily stringency index capturing the effect of government non-pharmaceutical interventions (NPIs) in Ireland. The cross-correlation function was applied to measure the relationship between the different time series. EDA of the Text-50808 dataset reveals significant peaks in the volume of texts on days prior to level 3 lockdown and level 5 lockdown in October 2020, and full level 5 lockdown in December 2020. A significantly high positive correlation was observed between the percentage of texts at Text-50808 that reported Covid-19 related issues and the percentage of respondents experiencing anxiety, worry and boredom (at a lag of 1 week) in Amárach survey data. There is a significant negative correlation between percentage of texts with Covid-19 related issues and percentage of respondents experiencing happiness in Amárach survey. Daily percentage of texts at Text-50808 that reported Covid-19 related issues to have a weak positive correlation with daily new Covid-19 cases in Ireland at a lag of 10 days and with daily stringency index of NPIs in Ireland at a lag of 2 days. The sudden peaks in text volume at Text-50808 immediately prior to new restrictions in Ireland indicate an association between a rise in mental health concerns following the announcement of new restrictions. There is also a high correlation between emotional wellbeing variables in the Amárach dataset and the number of weekly texts at Text-50808, and this confirms that Text-50808 reflects overall public sentiment. This analysis confirms the benefits of the texting service as a community surveillance tool for mental health in the population. This initial EDA will be extended to use multivariate modeling to predict the effect of additional Covid-19 related factors on public mental health in Ireland.

Keywords: COVID-19 pandemic, data analysis, digital health, mental health, public health, digital health

Procedia PDF Downloads 108
26019 Machine Learning Automatic Detection on Twitter Cyberbullying

Authors: Raghad A. Altowairgi

Abstract:

With the wide spread of social media platforms, young people tend to use them extensively as the first means of communication due to their ease and modernity. But these platforms often create a fertile ground for bullies to practice their aggressive behavior against their victims. Platform usage cannot be reduced, but intelligent mechanisms can be implemented to reduce the abuse. This is where machine learning comes in. Understanding and classifying text can be helpful in order to minimize the act of cyberbullying. Artificial intelligence techniques have expanded to formulate an applied tool to address the phenomenon of cyberbullying. In this research, machine learning models are built to classify text into two classes; cyberbullying and non-cyberbullying. After preprocessing the data in 4 stages; removing characters that do not provide meaningful information to the models, tokenization, removing stop words, and lowering text. BoW and TF-IDF are used as the main features for the five classifiers, which are; logistic regression, Naïve Bayes, Random Forest, XGboost, and Catboost classifiers. Each of them scores 92%, 90%, 92%, 91%, 86% respectively.

Keywords: cyberbullying, machine learning, Bag-of-Words, term frequency-inverse document frequency, natural language processing, Catboost

Procedia PDF Downloads 96
26018 Toward Cloud E-learning System Based on Smart Tools

Authors: Mohsen Maraoui

Abstract:

In the face of the growth in the quantity of data produced, several methods and techniques appear to remedy the problems of processing and analyzing large amounts of information mainly in the field of teaching. In this paper, we propose an intelligent cloud-based teaching system for E-learning content services. This system makes easy the manipulation of various educational content forms, including text, images, videos, 3 dimensions objects and scenes of virtual reality and augmented reality. We discuss the integration of institutional and external services to provide personalized assistance to university members in their daily activities. The proposed system provides an intelligent solution for media services that can be accessed from smart devices cloud-based intelligent service environment with a fully integrated system.

Keywords: cloud computing, e-learning, indexation, IoT, learning in Arabic language, smart tools

Procedia PDF Downloads 103
26017 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Bankole Felix, Tomio Takara

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Particularly deriving phonological features which are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are very crucial for proper pronunciation, but neither is shown in orthography. In this paper, to proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions and prepared a duration modeling method. Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by sentence listening test, and we achieved an average Mean Opinion Score (MOS) 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowel, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.

Keywords: amharic, gemination, Speech synthesis, morphology, epenthesis

Procedia PDF Downloads 55
26016 3D Text Toys: Creative Approach to Experiential and Immersive Learning for World Literacy

Authors: Azyz Sharafy

Abstract:

3D Text Toys is an innovative and creative approach that utilizes 3D text objects to enhance creativity, literacy, and basic learning in an enjoyable and gamified manner. By using 3D Text Toys, children can develop their creativity, visually learn words and texts, and apply their artistic talents within their creative abilities. This process incorporates haptic engagement with 2D and 3D texts, word building, and mechanical construction of everyday objects, thereby facilitating better word and text retention. The concept involves constructing visual objects made entirely out of 3D text/words, where each component of the object represents a word or text element. For instance, a bird can be recreated using words or text shaped like its wings, beak, legs, head, and body, resulting in a 3D representation of the bird purely composed of text. This can serve as an art piece or a learning tool in the form of a 3D text toy. These 3D text objects or toys can be crafted using natural materials such as leaves, twigs, strings, or ropes, or they can be made from various physical materials using traditional crafting tools. Digital versions of these objects can be created using 2D or 3D software on devices like phones, laptops, iPads, or computers. To transform digital designs into physical objects, computerized machines such as CNC routers, laser cutters, and 3D printers can be utilized. Once the parts are printed or cut out, students can assemble the 3D texts by gluing them together, resulting in natural or everyday 3D text objects. These objects can be painted to create artistic pieces or text toys, and the addition of wheels can transform them into moving toys. One of the significant advantages of this visual and creative object-based learning process is that students not only learn words but also derive enjoyment from the process of creating, painting, and playing with these objects. The ownership and creation process further enhances comprehension and word retention. Moreover, for individuals with learning disabilities such as dyslexia, ADD (Attention Deficit Disorder), or other learning difficulties, the visual and haptic approach of 3D Text Toys can serve as an additional creative and personalized learning aid. The application of 3D Text Toys extends to both the English language and any other global written language. The adaptation and creative application may vary depending on the country, space, and native written language. Furthermore, the implementation of this visual and haptic learning tool can be tailored to teach foreign languages based on age level and comprehension requirements. In summary, this creative, haptic, and visual approach has the potential to serve as a global literacy tool.

Keywords: 3D text toys, creative, artistic, visual learning for world literacy

Procedia PDF Downloads 34
26015 Effect of Self-Questioning Strategy on the Improvement of Reading Comprehension of ESL Learners

Authors: Muhammad Hamza

Abstract:

This research is based on the effect of self-questioning strategy on reading comprehension of second language learners at medium level. This research is conducted to find out the effects of self-questioning strategy and how self-questioning strategy helps English learners to improve their reading comprehension. In this research study the researcher has analyzed that how much self-questioning is effective in the field of learning second language and how much it helps second language learners to improve their reading comprehension. For this purpose, the researcher has studied different reading strategies, analyzed, collected data from certificate level class at NUML, Peshawar campus and then found out the effects of self-questioning strategy on reading comprehension of ESL learners. The researcher has randomly selected the participants from certificate class. The data was analyzed through pre-test and post-test and then in the final stage the results of both tests were compared. After the pre-test and post-test, the result of both pre-test and post-test indicated that if the learners start to use self-questioning strategy before reading a text, while reading a text and after reading a particular text there’ll be improvement in comprehension level of ESL learners. The present research has addressed the benefits of self-questioning strategy by taking two tests (pre and post-test).After the result of post-test it is revealed that the use of the self-questioning strategy has a significant effect on the readers’ comprehension thus, they can improve their reading comprehension by using self-questioning strategy.

Keywords: strategy, self-questioning, comprehension, intermediate level ESL learner

Procedia PDF Downloads 36
26014 Motion Effects of Arabic Typography on Screen-Based Media

Authors: Ibrahim Hassan

Abstract:

Motion typography is one of the most important types of visual communication based on display. Through the digital display media, we can control the text properties (size, direction, thickness, color, etc.). The use of motion typography in visual communication made it have several images. We need to adjust the terminology and clarify the different differences between them, so relying on the word motion typography -considered a general term- is not enough to separate the different communicative functions of the moving text. In this paper, we discuss the different effects of motion typography on Arabic writing and how we can achieve harmony between the movement and the letterform, and we will, during our experiments, present a new type of text movement.

Keywords: Arabic typography, motion typography, kinetic typography, fluid typography, temporal typography

Procedia PDF Downloads 121
26013 Recognition of Grocery Products in Images Captured by Cellular Phones

Authors: Farshideh Einsele, Hassan Foroosh

Abstract:

In this paper, we present a robust algorithm to recognize extracted text from grocery product images captured by mobile phone cameras. Recognition of such text is challenging since text in grocery product images varies in its size, orientation, style, illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale and rotation invariant. Since text degradations can not be appropriately defined using wellknown geometric transformations such as translation, rotation, affine transformation and shearing, we use the whole character black pixels as our feature vector. Classification is performed with minimum distance classifier using the maximum likelihood criterion, which delivers very promising Character Recognition Rate (CRR) of 89%. We achieve considerably higher Word Recognition Rate (WRR) of 99% when using lower level linguistic knowledge about product words during the recognition process.

Keywords: camera-based OCR, feature extraction, document, image processing, grocery products

Procedia PDF Downloads 375
26012 “A Built-In, Shockproof, Shit Detector”: Major Challenges and Peculiarities of Translating Ernest Hemingway’s Short Stories Into Georgian

Authors: Natia Kvachakidze

Abstract:

Translating fiction is a complicated and multidimensional issue. However, studying and analyzing literary translations is not less challenging. This becomes even more complex due to the existence of several alternative translations of one and the same literary work. However, this also makes the research process more interesting at the same time. The aim of the given work is to distinguish major obstacles and challenges translators come across while working on Ernest Hemingway’s short fiction, as well as to analyze certain peculiarities and characteristic features of some existing Georgian translations of the writer’s work (especially in the context of various alternative versions of some well-known short stories). Consequently, the focus is on studying how close these translations come to the form and the context of the original text in order to see if the linguistic and stylistic characteristics of the original author are preserved. Moreover, it is interesting not only to study the relevance of each translation to the original text but also to present a comparative analysis of some major peculiarities of the given translations, which are naturally characterized by certain strengths and weaknesses. The latter is at times inevitable, but in certain cases, there is room for improvement. The given work also attempts to humbly suggest certain ways of possible improvements of some translation inadequacies, as this can provide even more opportunities for deeper and detailed studies in the future.

Keywords: Hemingway, short fiction, translation, Georgian

Procedia PDF Downloads 58
26011 Making Sense of Places: A Comparative Study of Three Contexts in Thailand

Authors: Thirayu Jumsai Na Ayudhya

Abstract:

The study of what architecture means to people in their everyday lives inadequately addresses the contextualized and holistic theoretical framework. This article succinctly presents theoretical framework obtained from the comparative study of how people experience the everyday architecture in three different contexts including 1) Bangkok CBD, 2) Phuket island old-town, and 3) Nan province old-town. The way people make sense of the everyday architecture can be addressed in four super-ordinate themes; (1) building in urban (text), (2) building in (text), (3) building in human (text), (4) and building in time (text). In this article, these super-ordinate themes were verified whether they recur in three studied-contexts. In each studied-context, the participants were divided into two groups, 1) local people, 2) visitors. Participants were asked to take photographs of the everyday architecture during the everyday routine and to participate the elicit-interview with photographs produced by themselves. Interpretative phenomenological analysis (IPA) was adopted to interpret elicit-interview data. Sub-themes emerging in each studied-context were brought into the cross-comparison among three studied- contexts. It is found that four super-ordinate themes recur with additional distinctive sub-themes. Further studies in other different contexts, such as socio-political, economic, cultural differences, are recommended to complete the theoretical framework.

Keywords: sense of place, the everyday architecture, architectural experience, the everyday

Procedia PDF Downloads 130
26010 High Secure Data Hiding Using Cropping Image and Least Significant Bit Steganography

Authors: Khalid A. Al-Afandy, El-Sayyed El-Rabaie, Osama Salah, Ahmed El-Mhalaway

Abstract:

This paper presents a high secure data hiding technique using image cropping and Least Significant Bit (LSB) steganography. The predefined certain secret coordinate crops will be extracted from the cover image. The secret text message will be divided into sections. These sections quantity is equal the image crops quantity. Each section from the secret text message will embed into an image crop with a secret sequence using LSB technique. The embedding is done using the cover image color channels. Stego image is given by reassembling the image and the stego crops. The results of the technique will be compared to the other state of art techniques. Evaluation is based on visualization to detect any degradation of stego image, the difficulty of extracting the embedded data by any unauthorized viewer, Peak Signal-to-Noise Ratio of stego image (PSNR), and the embedding algorithm CPU time. Experimental results ensure that the proposed technique is more secure compared with the other traditional techniques.

Keywords: steganography, stego, LSB, crop

Procedia PDF Downloads 240
26009 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Felix Bankole, Tomio Takara, Girma Mamo

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Particularly deriving phonological features which are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are very crucial for proper pronunciation but neither is shown in orthography. In this paper, we proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions, and prepared a duration modeling method. Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by sentence listening test and we achieved an average Mean Opinion Score (MOS) 3.4 (68%) which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowel, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.

Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis

Procedia PDF Downloads 49
26008 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts

Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel

Abstract:

We proposes in this paper a Deep Learning based approach to classify text in order to enrich an Arabic ontology based on the objects classes of Gaston Gross. Those object classes are defined by taking into account the syntactic and semantic features of the treated language. Thus, our proposed approach is a hybrid one. In fact, it is based on the one hand on the object classes that represents a knowledge based-approach on classification of text and in the other hand it uses the deep learning approach that use the word embedding-based-approach to classify text. We have applied our proposed approach on a corpus constructed from an Arabic dictionary. The obtained semantic classification of text will enrich the Arabic objects classes ontology. In fact, new classes can be added to the ontology or an expansion of the features that characterizes each object class can be updated. The obtained results are compared to a similar work that treats the same object with a classical linguistic approach for the semantic classification of text. This comparison highlight our hybrid proposed approach that can be ameliorated by broaden the dataset used in the deep learning process.

Keywords: deep-learning approach, object-classes, semantic classification, Arabic

Procedia PDF Downloads 40
26007 Artificial Intelligence Assisted Sentiment Analysis of Hotel Reviews Using Topic Modeling

Authors: Sushma Ghogale

Abstract:

With a surge in user-generated content or feedback or reviews on the internet, it has become possible and important to know consumers' opinions about products and services. This data is important for both potential customers and businesses providing the services. Data from social media is attracting significant attention and has become the most prominent channel of expressing an unregulated opinion. Prospective customers look for reviews from experienced customers before deciding to buy a product or service. Several websites provide a platform for users to post their feedback for the provider and potential customers. However, the biggest challenge in analyzing such data is in extracting latent features and providing term-level analysis of the data. This paper proposes an approach to use topic modeling to classify the reviews into topics and conduct sentiment analysis to mine the opinions. This approach can analyse and classify latent topics mentioned by reviewers on business sites or review sites, or social media using topic modeling to identify the importance of each topic. It is followed by sentiment analysis to assess the satisfaction level of each topic. This approach provides a classification of hotel reviews using multiple machine learning techniques and comparing different classifiers to mine the opinions of user reviews through sentiment analysis. This experiment concludes that Multinomial Naïve Bayes classifier produces higher accuracy than other classifiers.

Keywords: latent Dirichlet allocation, topic modeling, text classification, sentiment analysis

Procedia PDF Downloads 75
26006 Increasing the Ability of State Senior High School 12 Pekanbaru Students in Writing an Analytical Exposition Text through Comic Strips

Authors: Budiman Budiman

Abstract:

This research aimed at describing and testing whether the students’ ability in writing analytical exposition text is increased by using comic strips at SMAN 12 Pekanbaru. The respondents of this study were the second-grade students, especially XI Science 3 academic year 2011-2012. The total number of students in this class was forty-two (42) students. The quantitative and qualitative data was collected by using writing test and observation sheets. The research finding reveals that there is a significant increase of students’ writing ability in writing analytical exposition text through comic strips. It can be proved by the average score of pre-test was 43.7 and the average score of post-test was 65.37. Besides, the students’ interest and motivation in learning are also improved. These can be seen from the increasing of students’ awareness and activeness in learning process based on observation sheets. The findings draw attention to the use of comic strips in teaching and learning is beneficial for better learning outcome.

Keywords: analytical exposition, comic strips, secondary school students, writing ability

Procedia PDF Downloads 130
26005 A New Method to Reduce 5G Application Layer Payload Size

Authors: Gui Yang Wu, Bo Wang, Xin Wang

Abstract:

Nowadays, 5G service-based interface architecture uses text-based payload like JSON to transfer business data between network functions, which has obvious advantages as internet services but causes unnecessarily larger traffic. In this paper, a new 5G application payload size reduction method is presented to provides the mechanism to negotiate about new capability between network functions when network communication starts up and how 5G application data are reduced according to negotiated information with peer network function. Without losing the advantages of 5G text-based payload, this method demonstrates an excellent result on application payload size reduction and does not increase the usage quota of computing resource. Implementation of this method does not impact any standards or specifications and not change any encoding or decoding functionality too. In a real 5G network, this method will contribute to network efficiency and eventually save considerable computing resources.

Keywords: 5G, JSON, payload size, service-based interface

Procedia PDF Downloads 143
26004 Electronic Physical Activity Record (EPAR): Key for Data Driven Physical Activity Healthcare Services

Authors: Rishi Kanth Saripalle

Abstract:

Medical experts highly recommend to include physical activity in everyone’s daily routine irrespective of gender or age as it helps to improve various medical issues or curb potential issues. Simultaneously, experts are also diligently trying to provide various healthcare services (interventions, plans, exercise routines, etc.) for promoting healthy living and increasing physical activity in one’s ever increasing hectic schedules. With the introduction of wearables, individuals are able to keep track, analyze, and visualize their daily physical activities. However, there seems to be no common agreed standard for representing, gathering, aggregating and analyzing an individual’s physical activity data from disparate multiple sources (exercise pans, multiple wearables, etc.). This issue makes it highly impractical to develop any data-driven physical activity applications and healthcare programs. Further, the inability to integrate the physical activity data into an individual’s Electronic Health Record to provide a wholistic image of that individual’s health is still eluding the experts. This article has identified three primary reasons for this potential issue. First, there is no agreed standard, both structure and semantic, for representing and sharing physical activity data across disparate systems. Second, various organizations (e.g., LA fitness, Gold’s Gym, etc.) and research backed interventions and programs still primarily rely on paper or unstructured format (such as text or notes) to keep track of the data generated from physical activities. Finally, most of the wearable devices operate in silos. This article identifies the underlying problem, explores the idea of reusing existing standards, and identifies the essential modules required to move forward.

Keywords: electronic physical activity record, physical activity in EHR EIM, tracking physical activity data, physical activity data standards

Procedia PDF Downloads 259
26003 The Platform for Digitization of Georgian Documents

Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili

Abstract:

Since the beginning of active publishing activity in Georgia, voluminous printed material has been accumulated, the digitization of which is an important task. Digitized materials will be available to the audience, and it will be possible to find text in them and conduct various factual research. Digitizing scanned documents means scanning documents, extracting text from the scanned documents, and processing the text into a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform, where the digital service developed for each stage will perform the task assigned to it; at the same time, it will be possible to develop these services dynamically so that there is no interruption in the work of the platform.

Keywords: NLP, OCR, BERT, Kubernetes, transformers

Procedia PDF Downloads 114
26002 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in biofilm field has become very significant, as biofilm has shown high mechanical resilience and resistance to antibiotic treatment and constituted as a significant problem in both healthcare and other industry related to microorganisms. The massive information both stated and hidden in the biofilm literature are growing exponentially therefore it is not possible for researchers and practitioners to automatically extract and relate information from different written resources. So, the current work proposes and discusses the use of text mining techniques for the extraction of information from biofilm literature corpora containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature as the literature is unstructured i.e. free-text. Therefore, we considered unsupervised approach, where no annotated training is necessary and using this approach we developed a system that will classify the text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. For this, a two-step structure was used where the first step is to extract keywords from the biofilm literature using a metathesaurus and standard natural language processing tools like Rapid Miner_v5.3 and the second step is to discover relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. We used unsupervised approach, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, in the above-extracted datasets to develop classifiers using WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages which will automatically classify the text by using the mentioned sets. The developed classifiers were tested on a large data set of biofilm literature which showed that the unsupervised approach proposed is promising as well as suited for a semi-automatic labeling of the extracted relations. The entire information was stored in the relational database which was hosted locally on the server. The generated biofilm vocabulary and genes relations will be significant for researchers dealing with biofilm research, making their search easy and efficient as the keywords and genes could be directly mapped with the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 141
26001 Examination of Forged Signatures Printed by Means of Fabrication in Terms of Their Relation to the Perpetrator

Authors: Salim Yaren, Nergis Canturk

Abstract:

Signatures are signs that are handwritten by person in order to confirm values such as information, amount, meaning, time and undertaking that bear on a document. It is understood that the signature of a document and the accuracy of the information on the signature is accepted and approved. Forged signatures are formed by forger without knowing and seeing original signature of person that forger will imitate and as a result of his/her effort for hiding typical characteristics of his/her own signatures. Forged signatures are often signed by starting with the initials of the first and last name or persons of the persons whose fake signature will be signed. The similarities in the signatures are completely random. Within the scope of the study, forged signatures are collected from 100 people both their original signatures and forged signatures signed referring to 5 imaginary people. These signatures are compared for 14 signature analyzing criteria by 2 signature analyzing experts except the researcher. 1 numbered analyzing expert who is 9 year experience in his/her field evaluated signatures of 39 (39%) people right and of 25 (25%) people wrong and he /she made any evaluations for signatures of 36 (36%) people. 2 numbered analyzing expert who is 16 year experienced in his/her field evaluated signatures of 49 (49%) people right and 28 (28%) people wrong and he /she made any evaluations for signatures of 23 (23%) people. Forged signatures that are signed by 24 (24%) people are matched by two analyzing experts properly, forged signatures that are signed by 8 (8%) people are matched wrongfully and made up signatures that are signed by 12 (12%) people couldn't be decided by both analyzing experts. Signatures analyzing is a subjective topic so that analyzing and comparisons take form according to education, knowledge and experience of the expert. Consequently, due to the fact that 39% success is achieved by analyzing expert who has 9 year professional experience and 49% success is achieved by analyzing expert who has 16 year professional experience, it is seen that success rate is directly proportionate to knowledge and experience of the expert.

Keywords: forensic signature, forensic signature analysis, signature analysis criteria, forged signature

Procedia PDF Downloads 101
26000 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders

Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh

Abstract:

Among the highly significant duties for any language most effective is the sentiment analysis, which is also a key area of NLP, that recently made impressive strides. There are several models and datasets available for those tasks in popular and commonly used languages like English, Russian, and Spanish. While sentiment analysis research is performed extensively, however it is lagging behind for the regional languages having few resources such as Hindi, Marathi. Marathi is one of the languages that included in the Indian Constitution’s 8th schedule and is the third most widely spoken language in the country and primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There isn’t sufficient study on sentiment analysis methods based on Marathi text due to lack of available resources, information. Therefore, this project proposes the use of different ML/NLP models for the analysis of Marathi data from the comments below YouTube content, tweets or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (Dates, names, root words) and lexicons to locate exact information.

Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches

Procedia PDF Downloads 37
25999 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 130
25998 Emotional Analysis for Text Search Queries on Internet

Authors: Gemma García López

Abstract:

The goal of this study is to analyze if search queries carried out in search engines such as Google, can offer emotional information about the user that performs them. Knowing the emotional state in which the Internet user is located can be a key to achieve the maximum personalization of content and the detection of worrying behaviors. For this, two studies were carried out using tools with advanced natural language processing techniques. The first study determines if a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.

Keywords: emotion classification, text search queries, emotional analysis, sentiment analysis in text, natural language processing

Procedia PDF Downloads 115
25997 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

Abstract:

The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most of the works in Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we proposed a ViraPart framework that uses embedded ParsBERT in its core for text clarifications. First, used the BERT variant for Persian followed by a classifier layer for classification procedures. Next, we combined models outputs to output cleartext. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.

Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers

Procedia PDF Downloads 171
25996 Improved Processing Speed for Text Watermarking Algorithm in Color Images

Authors: Hamza A. Al-Sewadi, Akram N. A. Aldakari

Abstract:

Copyright protection and ownership proof of digital multimedia are achieved nowadays by digital watermarking techniques. A text watermarking algorithm for protecting the property rights and ownership judgment of color images is proposed in this paper. Embedding is achieved by inserting texts elements randomly into the color image as noise. The YIQ image processing model is found to be faster than other image processing methods, and hence, it is adopted for the embedding process. An optional choice of encrypting the text watermark before embedding is also suggested (in case required by some applications), where, the text can is encrypted using any enciphering technique adding more difficulty to hackers. Experiments resulted in embedding speed improvement of more than double the speed of other considered systems (such as least significant bit method, and separate color code methods), and a fairly acceptable level of peak signal to noise ratio (PSNR) with low mean square error values for watermarking purposes.

Keywords: steganography, watermarking, time complexity measurements, private keys

Procedia PDF Downloads 118
25995 Analyzing Conflict Text; ‘Akunyili Memo: State of the Nation’: an Approach from CDA

Authors: Nengi A. H. Ejiobih

Abstract:

Conflict is one of the defining features of human societies. Often, the use or misuse of language in interaction is the genesis of conflict. As such, it is expected that when people use language they do so in socially determined ways and with almost predictable social effects. The objective of this paper was to examine the interest at work as manifested in language choice and collocations in conflict discourse. It also scrutinized the implications of linguistic features in conflict discourse as it concerns ideology and power relations in political discourse in Nigeria. The methodology used for this paper is an approach from Critical discourse analysis because of its multidisciplinary model of analysis, linguistic features and its implications were analysed. The datum used is a text from the Sunday Sun Newspaper in Nigeria, West Africa titled Akunyili Memo: State of the Nation. Some of the findings include; different ideologies are inherent in conflict discourse, there is the presence of power relations being produced, exercised, maintained and produced throughout the discourse and the use of pronouns in conflict discourse is valuable because it is used to initiate and maintain relationships in social context. This paper has provided evidence that, taking into consideration the nature of the social actions and the way these activities are translated into languages, the meanings people convey by their words are identified by their immediate social, political and historical conditions.

Keywords: conflicts, discourse, language, linguistic features, social context

Procedia PDF Downloads 451