Search results for: text preprocessing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1410


990 PaSA: A Dataset for Patent Sentiment Analysis to Highlight Patent Paragraphs

Authors: Renukswamy Chikkamath, Vishvapalsinhji Ramsinh Parmar, Christoph Hewel, Markus Endres

Abstract:

Given a patent document, identifying distinct semantic annotations is an interesting research problem. Text annotation helps patent practitioners, such as examiners and patent attorneys, to quickly identify the key arguments of an invention, thereby enabling timely marking of a patent text. In manual patent analysis, it is common practice to mark paragraphs by their semantic role in order to improve readability. This semantic annotation process is laborious and time-consuming. To alleviate this problem, we propose a dataset for training machine learning algorithms to automate the highlighting process. The contributions of this work are: i) a multi-class dataset of 150k samples built by traversing a decade of USPTO patents, ii) statistics and distributions of the data articulated through exploratory data analysis, iii) baseline machine learning models that use the dataset to address the patent paragraph highlighting task, and iv) a future path for extending this work with deep learning and domain-specific pre-trained language models to develop a highlighting tool. This work assists patent practitioners in highlighting semantic information automatically and supports sustainable and efficient patent analysis using machine learning.
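The abstract does not specify the baseline models in detail; as a hedged illustration, the paragraph-highlighting task can be framed as multi-class text classification. The sketch below uses a simple bag-of-words nearest-centroid classifier in Python; the labels and example paragraphs are hypothetical and not drawn from the PaSA dataset.

```python
from collections import Counter
import math

def vectorize(text):
    # Bag-of-words term frequencies for a single paragraph.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(samples):
    # samples: list of (paragraph_text, label); one summed vector per label.
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids

def classify(text, centroids):
    vec = vectorize(text)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

# Hypothetical labeled paragraphs standing in for dataset samples.
train = [
    ("the invention solves the problem of battery drain", "technical-problem"),
    ("prior art devices suffer from excessive battery drain", "technical-problem"),
    ("the advantage is reduced cost and longer battery life", "advantage"),
    ("a further advantage is improved reliability at lower cost", "advantage"),
]
centroids = train_centroids(train)
print(classify("this invention solves a problem of drain", centroids))
```

A real baseline would use richer features and a trained classifier, but the flow (vectorize, fit per-label statistics, score a new paragraph) is the same.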

Keywords: machine learning, patents, patent sentiment analysis, patent information retrieval

Procedia PDF Downloads 66
989 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated content (UGC) significantly changes the way customers behave (e.g., shop, travel), and handling the overwhelming amount and variety of UGC is a paramount issue for management. However, current approaches (e.g., sentiment analysis) are often ineffective at leveraging textual information to detect the problems or issues a given business suffers from. In this paper, we apply text mining with Latent Dirichlet Allocation (LDA) to a popular online review site dedicated to user complaints. We find that LDA efficiently detects customer complaints, and that further inspection with visualization techniques is effective for categorizing the problems or issues. Management can thus identify the issues at stake and prioritize them in a timely manner given limited resources. The findings provide managerial insights into how social media analytics can help maintain and improve reputation management. Our interdisciplinary approach also yields several insights from applying machine learning techniques in the marketing research domain. On a broader technical note, this paper illustrates how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since such instruction is still largely undocumented. In this regard, it helps lower the barrier for interdisciplinary researchers conducting related research.
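The paper's workflow is in R; as a language-neutral illustration of the underlying technique, here is a minimal collapsed Gibbs sampler for LDA in Python. The toy complaint vocabulary is invented for the example; a real analysis would use a mature implementation such as the R packages the paper documents.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    # Collapsed Gibbs sampling for LDA on tokenized documents.
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    # Count tables: topic-word, doc-topic, per-topic totals, assignments z.
    nkw = defaultdict(int); ndk = defaultdict(int); nk = [0] * n_topics
    z = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            nkw[(k, w)] += 1; ndk[(d, k)] += 1; nk[k] += 1
            zs.append(k)
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                nkw[(k, w)] -= 1; ndk[(d, k)] -= 1; nk[k] -= 1
                # Resample the topic of this token from its conditional.
                weights = [(nkw[(t, w)] + beta) / (nk[t] + V * beta)
                           * (ndk[(d, t)] + alpha) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights)[0]
                nkw[(k, w)] += 1; ndk[(d, k)] += 1; nk[k] += 1
                z[d][i] = k
    return z, nk

# Invented complaint snippets, pre-tokenized.
docs = [["refund", "refund", "delay"], ["delay", "refund"],
        ["staff", "rude", "staff"], ["rude", "staff"]]
z, nk = lda_gibbs(docs, n_topics=2)
print(sum(nk))  # every token carries exactly one topic assignment
```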

Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization

Procedia PDF Downloads 164
988 Hybrid Fuzzy Weighted K-Nearest Neighbor to Predict Hospital Readmission for Diabetic Patients

Authors: Soha A. Bahanshal, Byung G. Kim

Abstract:

Identification of patients at high risk of hospital readmission is of crucial importance for quality health care and cost reduction. Predicting hospital readmissions among diabetic patients has been of great interest to many researchers and health decision makers. We build a model to predict hospital readmission of diabetic patients within 30 days of discharge. The core of the model is a modified k-nearest neighbor called the Hybrid Fuzzy Weighted k-Nearest Neighbor algorithm. The prediction is performed on a dataset of more than 70,000 patients with 50 attributes. We applied several data preprocessing techniques to handle class imbalance and to fuzzify the data to suit the prediction algorithm. The model so far achieves a classification accuracy of 80%, compared to models that use only the standard k-nearest neighbor.
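The abstract does not give the exact formulation of the hybrid algorithm; the sketch below shows the general idea of a fuzzy, distance-weighted kNN, in which each neighbor contributes an inverse-distance weight and the output is a membership degree per class rather than a hard label. The feature values and labels are invented for illustration.

```python
import math

def fuzzy_weighted_knn(train, query, k=3, m=2):
    # train: list of (feature_vector, class); returns fuzzy class memberships.
    nearest = sorted(
        (math.dist(x, query), label) for x, label in train
    )[:k]
    memberships = {}
    total = 0.0
    for d, label in nearest:
        # Inverse-distance weight with fuzzifier m, as in fuzzy kNN variants.
        w = 1.0 / (d ** (2 / (m - 1)) + 1e-9)
        memberships[label] = memberships.get(label, 0.0) + w
        total += w
    return {lab: w / total for lab, w in memberships.items()}

# Hypothetical normalized patient features.
train = [([0.1, 0.2], "readmitted"), ([0.2, 0.1], "readmitted"),
         ([0.9, 0.8], "not-readmitted"), ([0.8, 0.9], "not-readmitted")]
scores = fuzzy_weighted_knn(train, [0.15, 0.15], k=3)
print(max(scores, key=scores.get))  # class with the highest membership
```

The fuzzy memberships make the classifier's confidence explicit, which is useful when the readmitted class is rare after imbalance handling.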

Keywords: machine learning, prediction, classification, hybrid fuzzy weighted k-nearest neighbor, diabetic hospital readmission

Procedia PDF Downloads 160
987 On the Relationship between the Concepts of "[New] Social Democracy" and "Democratic Socialism"

Authors: Gintaras Mitrulevičius

Abstract:

This text, based on a conference report, briefly examines the relationship between the concepts of social democracy and democratic socialism, drawing attention to the essential aspects of its development and, in particular, discussing the contradictions in the relationship between these concepts in the modern period. In preparing this text, historical and historical-comparative research methods were used, as well as methods of text analysis, synthesis, and generalization. The history of the two terms shows that they were long used alternately and almost synonymously. At the end of the 20th century, traditional social democracy was transformed into the so-called "new social democracy." Many of the new social democrats do not consider themselves democratic socialists and avoid the historically characteristic identification of social democracy with democratic socialism. It has become quite popular to believe that social democracy is an ideology separate from democratic socialism, or that it has become a variant of the ideology of liberalism. This testifies to a crisis of ideological self-awareness within social democracy. Since the beginning of the 21st century, social democracy has also experienced a growing crisis of electoral support, which, among other things, has led to its slight shift to the left. In this context, some social democrats are once again talking about democratic socialism. The rise of the ideas of democratic socialism in the United States was catalyzed by Bernie Sanders, but its American proponents hold differing concepts of democratic socialism. In modern Europe, democratic socialism is also invoked by leftists of non-social-democratic origin, whose understanding differs from the democratic socialism inherent in classical social democracy. Some political scientists likewise distinguish between the concepts in question.
Analysis of the problem shows that there are currently several concepts of democratic socialism on the spectrum of the political left, both social-democratic and non-social-democratic.

Keywords: democratic socialism, socialism, social democracy, new social democracy, political ideologies

Procedia PDF Downloads 91
986 Examining Reading Comprehension Skills Based on Different Reading Comprehension Frameworks and Taxonomies

Authors: Seval Kula-Kartal

Abstract:

Developing students' reading comprehension skills is an aim that is difficult to accomplish and requires long-term, systematic teaching and assessment. In these processes, teachers need tools that provide guidance on what reading comprehension is and which comprehension skills they should develop. Due to a lack of clear, evidence-based frameworks defining reading comprehension skills, especially in Turkiye, teachers and students mostly follow classroom processes without a clear idea of what their comprehension goals are or what those goals mean. Since teachers and students lack a clear view of comprehension targets and of the strengths and weaknesses in students' comprehension skills, formative feedback processes cannot be managed effectively. Detecting and defining influential comprehension skills may provide guidance to both teachers and students during the feedback process. Therefore, the current study examined reading comprehension frameworks that define comprehension skills operationally. The aim of the study is to develop a simple, clear framework that teachers and students can use during teaching, learning, assessment, and feedback. The study is qualitative research in which documents related to reading comprehension skills were analyzed. The study group therefore consisted of resources and frameworks that have made major contributions to theoretical and operational definitions of reading comprehension. A content analysis was conducted on these resources. To determine the validity of the themes and sub-categories revealed by the content analysis, three educational assessment experts were asked to examine the results. Fleiss' kappa coefficient revealed consistency among the themes and categories defined by the three experts.
The content analysis of the reading comprehension frameworks revealed that comprehension skills can be examined under four themes. The first and second themes focus on understanding information given explicitly or implicitly within a text. The third theme includes skills readers use to make connections between their personal knowledge and the information given in the text. Lastly, the fourth theme focuses on skills readers use to examine the text with a critical view. The results suggest that fundamental reading comprehension skills can be examined under these four themes, and teachers are recommended to use them in their reading comprehension teaching and assessment. Acknowledgment: This research is supported by the Pamukkale University Scientific Research Unit within the project "Developing a Reading Comprehension Rubric."
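Fleiss' kappa, used above to check expert agreement, can be computed directly from a rating table. The sketch below assumes every item is rated by the same number of raters; the example table (three experts sorting four items into two themes) is hypothetical.

```python
def fleiss_kappa(ratings):
    # ratings: rows are items, columns are categories; each cell is the
    # number of raters who placed that item in that category.
    N = len(ratings)
    n = sum(ratings[0])                      # raters per item (constant)
    total = N * n
    # Per-item agreement P_i, averaged into P_bar.
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in ratings) / N
    # Chance agreement P_e from the marginal category proportions.
    p = [sum(row[j] for row in ratings) / total
         for j in range(len(ratings[0]))]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Three raters, four items, two categories; all raters always agree.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
print(round(fleiss_kappa(perfect), 3))  # -> 1.0, complete agreement
```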

Keywords: reading comprehension, assessing reading comprehension, comprehension taxonomies, educational assessment

Procedia PDF Downloads 60
985 Translation as a Cultural Medium: Understanding the Mauritian Culture and History through an English Translation

Authors: Pooja Booluck

Abstract:

This project seeks to translate a chapter of Le Silence des Chagos by Shenaz Patel, a Mauritian author whose work has never been translated before. The chapter discusses the protagonist's attempt to return to her home country, Diego Garcia, after her deportation. The English translation will offer the target audience a historical account of the deportation of Chagossians to Mauritius during the 1970s. The target audience comprises English-speaking translation scholars, translation students, and African literature scholars. To make the elements of Mauritian culture accessible, the translation will keep cultural items such as food and oral discourses in Creole, so as to preserve the authenticity of the source culture. To help the target reader comprehend these cultural elements, detailed footnotes will explain the cultural and historical references. The translation will also address the importance of folkloric songs in Mauritius and their intergenerational function in Mauritian communities; these will likewise remain in Creole. While this approach helps preserve the meaning of the source text, the borrowing technique and the foreignizing method will be employed, which will in turn help the reader become more familiar with the Mauritian community. Translating a text from French to English while keeping certain words or discourses in a minority language such as Creole raises certain challenges: How does the translator ensure the comprehensibility of the reader? Are there any translation losses? What are the choices of the translator?

Keywords: Chagos archipelagos in Exile, English translation, Le Silence des Chagos, Mauritian culture and history

Procedia PDF Downloads 296
984 Performance Evaluation of Various Segmentation Techniques on MRI of Brain Tissue

Authors: U.V. Suryawanshi, S.S. Chowhan, U.V. Kulkarni

Abstract:

Accuracy of segmentation methods is of great importance in brain image analysis. Tissue classification in magnetic resonance imaging (MRI) of the brain is an important issue in the analysis of several brain dementias. This paper portrays the performance of segmentation techniques used on brain MRI. A large variety of algorithms for segmentation of brain MRI has been developed. The objective of this paper is to perform segmentation on MR images of the human brain using Fuzzy c-means (FCM), Kernel-based Fuzzy c-means (KFCM), Spatial Fuzzy c-means (SFCM), and Improved Fuzzy c-means (IFCM). The review covers imaging modalities, MRI, methods for noise reduction, and segmentation approaches. All methods are applied to MRI brain images degraded by salt-and-pepper noise, and the results demonstrate that the IFCM algorithm is more robust to noise than the standard FCM algorithm. We conclude with a discussion on the direction of future research in brain segmentation and on modifications to IFCM for better results.
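As a point of reference for the FCM family compared above, here is a minimal standard FCM on scalar intensities; the kernel, spatial, and improved variants modify the distance or membership terms. The intensity values are synthetic, not real MRI data.

```python
def fuzzy_c_means(xs, c=2, m=2.0, iters=50):
    # Standard FCM: alternate membership and center updates.
    # Deterministic spread initialization over the sorted data.
    srt = sorted(xs)
    centers = [srt[i * (len(xs) - 1) // (c - 1)] for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = []
        for x in xs:
            d = [abs(x - ck) + 1e-9 for ck in centers]
            row = [1.0 / sum((d[k] / d[j]) ** (2 / (m - 1))
                             for j in range(c))
                   for k in range(c)]
            u.append(row)
        # Center update: membership-weighted mean.
        centers = [
            sum(u[i][k] ** m * xs[i] for i in range(len(xs)))
            / sum(u[i][k] ** m for i in range(len(xs)))
            for k in range(c)
        ]
    return centers, u

# Two synthetic "tissue" intensity groups around 10 and 100.
xs = [9.0, 10.0, 11.0, 99.0, 100.0, 101.0]
centers, u = fuzzy_c_means(xs, c=2)
print(sorted(round(ck) for ck in centers))
```

Spatial and kernel variants replace `abs(x - ck)` with a kernel-induced distance or add a neighborhood term to the memberships, which is what gives them their noise robustness.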

Keywords: image segmentation, preprocessing, MRI, FCM, KFCM, SFCM, IFCM

Procedia PDF Downloads 300
983 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents, including unstructured data and text, have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, this categorization was performed manually. Manual categorization, however, not only cannot guarantee accuracy but also requires a large amount of time and considerable cost. Many studies have been conducted on the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics, because they assume that each document can be assigned to only one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories. However, these are also limited in that their learning process requires training on a multi-categorized document set; they therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To remove this requirement of traditional multi-categorization algorithms, we previously proposed a methodology that extends the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.
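The abstract does not give the algorithmic details of the category-extension step. One plausible sketch, assuming each document and each category has a topic distribution from a topic model, assigns every category whose topic profile is sufficiently similar to the document's, rather than only the single best match. All vectors and the threshold below are hypothetical.

```python
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def extend_categories(doc_topics, category_topics, threshold=0.6):
    # Multi-categorization: every category close enough in topic space.
    return [cat for cat, vec in category_topics.items()
            if cosine(doc_topics, vec) >= threshold]

# Hypothetical per-category topic distributions (e.g., from LDA).
category_topics = {
    "economy":  [0.8, 0.1, 0.1],
    "politics": [0.1, 0.8, 0.1],
    "sports":   [0.1, 0.1, 0.8],
}
doc = [0.5, 0.45, 0.05]   # a document mixing economy and politics
print(extend_categories(doc, category_topics, threshold=0.6))
```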

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 247
982 Cognitive Translation and Conceptual Wine Tasting Metaphors: A Corpus-Based Research

Authors: Christine Demaecker

Abstract:

Many researchers have underlined the importance of metaphors in specialised language. Their use in specific domains helps us understand the conceptualisations used to communicate new ideas or difficult topics. Within the wide area of specialised discourse, wine tasting is a very specific example because it is almost exclusively metaphoric. Wine tasting metaphors express various conceptualisations. They are not linguistic but conceptual, as defined by Lakoff & Johnson: they correspond to the linguistic expression of a mental projection from a well-known or more concrete source domain onto the target domain, which is the taste of wine. But unlike most specialised terminologies, the vocabulary is never clearly defined. When metaphorical terms are listed in dictionaries, their definitions remain vague, unclear, and circular, and they cannot be replaced by literal linguistic expressions. This makes it impossible to transfer them into another language with traditional linguistic translation methods. This qualitative research investigates whether wine tasting metaphors can instead be translated with the cognitive translation process described by Nili Mandelblit (1995). The research is based on a corpus compiled from two high-profile wine guides: Parker's Wine Buyer's Guide and its translation into French, and the Guide Hachette des Vins and its translation into English. In this small corpus, with a total of 68,826 words, 170 metaphoric expressions have been identified in the original English text and 180 in the original French text. They were selected with the MIPVU metaphor identification procedure developed at the Vrije Universiteit Amsterdam. The selection demonstrates that both languages use the same set of conceptualisations, which are often combined in wine tasting notes, creating conceptual integrations or blends. The comparison of expressions in the source and target texts also demonstrates the use of the cognitive translation approach.
In accordance with the principle of relevance, the translation always uses target language conceptualisations, but compared to the original, the highlighting of the projection is often different. Also, when original metaphors are complex with a combination of conceptualisations, at least one element of the original metaphor underlies the target expression. This approach perfectly integrates into Lederer’s interpretative model of translation (2006). In this triangular model, the transfer of conceptualisation could be included at the level of ‘deverbalisation/reverbalisation’, the crucial stage of the model, where the extraction of meaning combines with the encyclopedic background to generate the target text.

Keywords: cognitive translation, conceptual integration, conceptual metaphor, interpretative model of translation, wine tasting metaphor

Procedia PDF Downloads 107
981 Determination of Water Pollution and Water Quality with Decision Trees

Authors: Çiğdem Bakır, Mecit Yüzkat

Abstract:

With the increasing emphasis on water quality worldwide, the search for new and intelligent monitoring systems, and the market for them, has grown. The current method is a laboratory process in which samples are taken from bodies of water and tests are carried out in laboratories. This method is time-consuming, wasteful of manpower, and uneconomical. To address this problem, we used machine learning methods to detect water pollution. We created decision trees with the Orange3 software and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed from model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen. The proposed approach consists of three stages: preprocessing of the data, feature detection, and classification. We evaluated the study with different accuracy metrics and present the results comparatively. The decision tree achieved approximately 98% accuracy.
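As an illustration of the decision-tree stage (the paper itself builds its trees in Orange3 rather than by hand), here is a minimal Gini-based tree on toy water-quality features; the thresholds, samples, and labels are invented.

```python
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    # Exhaustive search over features and thresholds for the lowest
    # weighted Gini impurity.
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels):
    if len(set(labels)) == 1:
        return labels[0]                      # pure leaf
    _, f, t = best_split(rows, labels)
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li]),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri]))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Toy samples: (pH, dissolved oxygen in mg/L) -> water quality label.
rows = [(7.0, 8.5), (6.8, 9.0), (5.2, 3.0), (9.1, 2.5)]
labels = ["clean", "clean", "polluted", "polluted"]
tree = build_tree(rows, labels)
print(predict(tree, (7.1, 8.0)))
```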

Keywords: decision tree, water quality, water pollution, machine learning

Procedia PDF Downloads 63
980 The Effect of Metacognitive Think-Aloud Strategy on Form 1 Pupils’ Reading Comprehension Skills via DELIMa Platform

Authors: Fatin Khairani Khairul 'Azam

Abstract:

Reading comprehension requires the formation of an articulate mental representation of the information in a text. It involves three interdependent elements (the reader, the text, and the activity), all situated within an extensive sociocultural context. Incorporating the metacognitive think-aloud strategy into teaching reading comprehension can improve learners' reading comprehension skills, as it helps them monitor their thinking as they read. Furthermore, integrating the Digital Educational Learning Initiative Malaysia (DELIMa) platform into teaching reading comprehension can make the process interactive and fun. A quasi-experimental one-group pre-test post-test design was used to identify the effectiveness of the metacognitive think-aloud strategy via the DELIMa platform in improving pupils' reading comprehension performance and their perceptions of reading comprehension. The participants comprised 82 Form 1 pupils from a secondary school in Pasir Gudang, Johor, Malaysia. All participants sat pre- and post-tests to track their reading comprehension performance and perceptions. The findings revealed that the metacognitive think-aloud strategy is effective, as pupils' reading comprehension performance and perceptions improved in the post-tests. It is hoped that the findings will be useful to teachers incorporating the same strategy to improve pupils' reading skills. Future studies should also consider participants' motivation when incorporating the think-aloud strategy into teaching reading comprehension.

Keywords: DELIMa Platform, ESL Learners, Metacognitive Strategy, Pupils' Perceptions, Reading Comprehension, Think-Aloud Strategy

Procedia PDF Downloads 178
979 Evaluation of Diagnosis Performance Based on Pairwise Model Construction and Filtered Data

Authors: Hyun-Woo Cho

Abstract:

Timely and intelligent production monitoring and diagnosis of industrial processes are quite important for quality and safety. Compared with monitoring, fault diagnosis is the task of finding the process variables responsible for a specific fault in the process. It can help process operators investigate and eliminate root causes more effectively and efficiently. This work focused on combining a nonlinear statistical technique with a preprocessing method to implement practical real-time fault identification schemes for data-rich cases. To compare its performance with existing identification schemes, a case study on a benchmark process was performed in several scenarios. The results showed that the proposed fault identification scheme produces more reliable diagnosis results than linear methods. In addition, the filtering step improved the identification results for complicated processes with massive data sets.

Keywords: diagnosis, filtering, nonlinear statistical techniques, process monitoring

Procedia PDF Downloads 217
978 Methodologies for Deriving Semantic Technical Information Using an Unstructured Patent Text Data

Authors: Jaehyung An, Sungjoo Lee

Abstract:

Patent documents constitute an up-to-date and reliable source of knowledge reflecting technological advances, so patent analysis has been widely used to identify technological trends and formulate technology strategies. However, identifying technological information from patent data entails limitations such as high cost, complexity, and inconsistency, because it relies on expert knowledge. To overcome these limitations, researchers have applied quantitative analysis based on keyword techniques, which can capture technological implications in patent documents by extracting keywords that indicate important content. However, simple keyword-frequency counting cannot capture the semantic relationships among keywords, or semantic information such as how technologies are used within their technology area and how they affect other technologies. To automatically analyze the unstructured technological information in patents and extract its semantic content, the text should be transformed into an abstracted form that captures the key technological concepts. The sentence structure 'SAO' (subject, action, object) represents such key concepts and can be extracted with natural language processing (NLP). An SAO structure can be organized in a problem-solution format if the action-object (AO) states the problem and the subject (S) forms the solution. In this paper, we propose a new methodology that extracts SAO structures through rules for extracting technical elements. Although sentences in patent texts have a unique format, prior studies have depended on general NLP tools designed for common documents such as newspapers, research papers, and Twitter mentions, and so cannot account for the specific sentence structures of patent documents.
To overcome this limitation, we identified the unique form of patent sentences and defined the SAO structures in patent text data. There are four types of technical elements: technology adoption purpose, application area, tool for technology, and technical components. Each of these four sentence types has its own specific word structure, determined by the location or sequence of the parts of speech in the sentence. Finally, we developed algorithms for extracting SAOs; the result offers insight into the technology innovation process by providing different perspectives on technology.
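The paper's extraction rules are defined over the four technical-element types above; as a simplified stand-in, the sketch below applies a single naive rule to pre-tagged tokens (subject = last noun before the main verb, action = the verb, object = first noun after it). A real pipeline would use a proper NLP parser; the tags here are supplied by hand for illustration.

```python
def extract_sao(tagged):
    # tagged: list of (word, pos) pairs, e.g. from any POS tagger.
    subject = action = obj = None
    for i, (word, pos) in enumerate(tagged):
        if pos.startswith("V"):               # first verb = the action
            action = word
            before = [w for w, p in tagged[:i] if p.startswith("N")]
            after = [w for w, p in tagged[i + 1:] if p.startswith("N")]
            subject = before[-1] if before else None
            obj = after[0] if after else None
            break
    return subject, action, obj

# A patent-style sentence, tagged by hand for illustration.
sentence = [("the", "DT"), ("sensor", "NN"), ("measures", "VBZ"),
            ("the", "DT"), ("temperature", "NN"), ("of", "IN"),
            ("the", "DT"), ("coolant", "NN")]
print(extract_sao(sentence))
```

For the sample sentence this yields the triple (sensor, measures, temperature); the paper's rules additionally dispatch on which of the four sentence types is being parsed.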

Keywords: NLP, patent analysis, SAO, semantic-analysis

Procedia PDF Downloads 245
977 Sparse Coding Based Classification of Electrocardiography Signals Using Data-Driven Complete Dictionary Learning

Authors: Fuad Noman, Sh-Hussain Salleh, Chee-Ming Ting, Hadri Hussain, Syed Rasul

Abstract:

In this paper, a data-driven dictionary approach is proposed for the automatic detection and classification of cardiovascular abnormalities. The electrocardiography (ECG) signal is represented by trained complete dictionaries that contain prototypes, or atoms, avoiding the limitations of pre-defined dictionaries. The data-driven trained dictionaries simply take the ECG signal as input, rather than extracted features, to study the set of parameters that yield the most descriptive dictionary. The approach inherently learns the complicated morphological changes in the ECG waveform, which are then used to improve classification. Classification performance was evaluated on ECG data under two preprocessing regimes. In the first, the QT database is baseline-drift corrected and a notch filter removes the 60 Hz power-line noise. In the second, the data are further filtered with a fast moving-average smoother. The experimental results on the QT database confirm that the proposed algorithm achieves a classification accuracy of 92%.
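Sparse coding over a learned dictionary can be illustrated with greedy matching pursuit: repeatedly pick the atom most correlated with the residual and subtract its contribution. The toy unit-norm atoms below stand in for learned ECG prototypes and are not from the QT database.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_atoms=2):
    # Greedy sparse coding against a (unit-norm) dictionary.
    residual = list(signal)
    code = {}
    for _ in range(n_atoms):
        scores = [abs(dot(residual, atom)) for atom in dictionary]
        k = scores.index(max(scores))          # best-matching atom
        coef = dot(residual, dictionary[k])    # atoms assumed unit-norm
        code[k] = code.get(k, 0.0) + coef
        residual = [r - coef * a for r, a in zip(residual, dictionary[k])]
    return code, residual

# Unit-norm toy "waveform" atoms standing in for learned prototypes.
dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.7071, 0.7071],
]
signal = [0.0, 3.0, 0.0, 0.0]   # exactly atom 1 scaled by 3
code, residual = matching_pursuit(signal, dictionary, n_atoms=1)
print(code)
```

In the classification setting, the sparse codes (which atoms fire, and how strongly) become the features that discriminate between normal and abnormal beats.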

Keywords: electrocardiogram, dictionary learning, sparse coding, classification

Procedia PDF Downloads 354
976 Analyzing On-Line Process Data for Industrial Production Quality Control

Authors: Hyun-Woo Cho

Abstract:

Monitoring of industrial production quality has to be implemented to provide early warning of unusual operating conditions. Furthermore, identification of their assignable causes is necessary for quality control. For such tasks, many multivariate statistical techniques have been applied and shown to be quite effective tools. This work presents a process-data-based monitoring scheme for production processes. For more reliable results, additional steps of noise filtering and preprocessing are considered; these may enhance performance by eliminating unwanted variation in the data. The performance evaluation is executed using data sets from test processes. The proposed method is shown to provide reliable quality control results, and is thus more effective for quality monitoring in the example. For practical implementation, an on-line data system must be available to gather historical and on-line data. Large amounts of data are now collected on-line in most processes, so implementation of the current scheme is feasible and imposes no additional burden on users.

Keywords: detection, filtering, monitoring, process data

Procedia PDF Downloads 525
975 Sentiment Analysis of Chinese Microblog Comments: Comparison between Support Vector Machine and Long Short-Term Memory

Authors: Xu Jiaqiao

Abstract:

Text sentiment analysis is an important branch of natural language processing, widely used in public opinion analysis and web recommendations. The mainstream sentiment analysis methods fall into three groups: methods based on a sentiment dictionary, on traditional machine learning, and on deep learning. This paper analyzes and compares the advantages and disadvantages of the support vector machine (SVM) method from traditional machine learning and the long short-term memory (LSTM) method from deep learning for Chinese sentiment analysis, using Chinese comments on Sina Microblog as the data set. First, the original comment dataset obtained by a web crawler is labeled; Jieba word segmentation is then used to segment the dataset and remove stop words. Next, text feature vectors are extracted and document word vectors are built to facilitate model training. Finally, SVM and LSTM models are trained. The LSTM model achieves an accuracy of 85.80%, while the SVM achieves 91.07%; however, the LSTM needs only 2.57 seconds, while the SVM model needs 6.06 seconds. We therefore conclude that, compared with the SVM model, the LSTM model is less accurate but faster.
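The preprocessing steps described above (segmentation, stop-word removal, vectorization) can be sketched as follows. The stop-word list and comments are toy examples, and real segmentation would be done with Jieba on raw strings rather than on pre-split tokens as here.

```python
def preprocess(tokens, stopwords):
    # Remove stop words; in the paper's pipeline the raw comments are
    # first segmented with Jieba before this step.
    return [t for t in tokens if t not in stopwords]

def build_vocab(docs):
    # Stable word-to-index mapping over the cleaned corpus.
    return {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}

def vectorize(tokens, vocab):
    # Bag-of-words count vector, the input format a linear SVM expects.
    vec = [0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1
    return vec

stopwords = {"的", "了", "是"}
# Toy pre-segmented microblog comments (positive, then negative).
raw = [["电影", "是", "好看", "的"], ["服务", "太", "差", "了"]]
docs = [preprocess(d, stopwords) for d in raw]
vocab = build_vocab(docs)
print(vectorize(docs[0], vocab))
```

An LSTM would instead consume the token sequence directly (as embedding indices), which is the main pipeline difference between the two models compared in the paper.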

Keywords: sentiment analysis, support vector machine, long short-term memory, Chinese microblog comments

Procedia PDF Downloads 62
974 The Translation of Code-Switching in African Literature: Comparing the Two German Translations of Ngugi wa Thiong'o's "Petals of Blood"

Authors: Omotayo Olalere

Abstract:

The relevance of code-switching for intercultural communication through literary translation cannot be overemphasized. The translation of code-switching and its implications for translation studies have been studied in the context of African literature, but in these cases code-switching was examined in general terms of its usage in source texts, not particularly in Ngugi's novels and their translations. In addition, the functions of translation and code-switching in the lyrics of some popular African songs have been studied, but that work relates more to oral performance than to written literature. As such, little has been done on the German translation of code-switching in African works. This study intends to fill this lacuna by examining code-switching in the German translations of Ngugi's Petals of Blood. The aim is to highlight the significance of code-switching as a phenomenon in this African novel written in English and to examine its representation in the two German translations, Verbrannte Blueten and Land der flammenden Blueten. The concept of "abrogation" plays an important role in the analysis of the data. Findings show that a translator's ideology plays a large role in representing abrogation in the translation of code-switching in the selected source text. The study contributes to translation studies by highlighting the need to foreground aspects of language contact in translation theory and practice, particularly in the African context. The translation theories adopted for the study include Bandia's (2008) postcolonial theory of translation and Snell-Hornby's (1988) cultural translation theory.

Keywords: code switching, german translation, ngugi wa thiong’o, petals of blood

Procedia PDF Downloads 54
973 Predicting Personality and Psychological Distress Using Natural Language Processing

Authors: Jihee Jang, Seowon Yoon, Gaeun Son, Minjung Kang, Joon Yeon Choeh, Kee-Hong Choi

Abstract:

Background: Self-report multiple-choice questionnaires have been widely utilized to quantitatively measure one’s personality and psychological constructs. Despite several strengths (e.g., brevity and utility), self-report multiple-choice questionnaires have considerable limitations in nature. With the rise of machine learning (ML) and natural language processing (NLP), researchers in the field of psychology are widely adopting NLP to assess psychological constructs and predict human behaviors. However, there is a lack of connection between the work being performed in computer science and that in psychology, due to small data sets and unvalidated modeling practices. Aims: The current article introduces the study method and procedure of phase II, which includes the interview questions for the five-factor model (FFM) of personality developed in phase I. This study aims to develop the interview (semi-structured) and open-ended questions for the FFM-based personality assessments, specifically designed with experts in the field of clinical and personality psychology (phase 1), and to collect the personality-related text data using the interview questions and self-report measures on personality and psychological distress (phase 2). The purpose of the study includes examining the relationship between natural language data obtained from the interview questions, measuring the FFM personality constructs, and psychological distress, to demonstrate the validity of natural language-based personality prediction. Methods: The phase I (pilot) study was conducted on fifty-nine native Korean adults to acquire the personality-related text data from the interview (semi-structured) and open-ended questions based on the FFM of personality. The interview questions were revised and finalized with feedback from an external expert committee consisting of personality and clinical psychologists.
Based on the established interview questions, a total of 425 Korean adults were recruited using a convenience sampling method via an online survey. The text data collected from interviews were analyzed using natural language processing. The results of the online survey, including demographic data, depression, anxiety, and personality inventories, were analyzed together in the model to predict individuals’ FFM of personality and the level of psychological distress (phase 2).

Keywords: personality prediction, psychological distress prediction, natural language processing, machine learning, the five-factor model of personality

Procedia PDF Downloads 58
972 Detection and Classification of Rubber Tree Leaf Diseases Using Machine Learning

Authors: Kavyadevi N., Kaviya G., Gowsalya P., Janani M., Mohanraj S.

Abstract:

Hevea brasiliensis, also known as the rubber tree, is one of the world’s foremost crop assets. One significant advantage of the rubber plant in terms of air oxygenation is its capacity to reduce the likelihood of an individual developing respiratory allergies such as asthma. To build a system that can properly identify crop diseases and pests, create a database of insecticides for each pest and disease, and then recommend treatment for the illness detected, reliable disease identification must come first. This article primarily examines three major leaf diseases because of their economic impact: bird’s eye spot, algal spot, and powdery mildew. The proposed work focuses on disease identification on rubber tree leaves, accomplished by employing a convolutional neural network (CNN) within a processing pipeline of input, preprocessing, image segmentation, feature extraction, and classification, replacing the time-consuming manual procedures currently used to detect these diseases. The main ailments, underlying causes, and signs and symptoms of diseases that harm the rubber tree are also covered in this study.
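A minimal sketch of the pipeline described above (preprocess, segment, extract features, classify), using synthetic grayscale arrays in place of real leaf photographs and a nearest-centroid classifier as a stand-in for the CNN; all thresholds and feature choices here are illustrative assumptions, not the authors’ implementation:

```python
import numpy as np

def preprocess(img):
    # Normalize pixel values to [0, 1] (stand-in for resizing/denoising).
    return img.astype(float) / 255.0

def segment(img):
    # Crude lesion segmentation: pixels darker than the leaf's mean intensity.
    return img < img.mean()

def extract_features(img, mask):
    # Feature vector: lesion area fraction and mean intensity inside lesions.
    area = mask.mean()
    mean_int = img[mask].mean() if mask.any() else 0.0
    return np.array([area, mean_int])

def nearest_centroid(train_X, train_y, x):
    # Classify by the closest class centroid in feature space.
    labels = sorted(set(train_y))
    cents = {c: train_X[[y == c for y in train_y]].mean(axis=0) for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(x - cents[c]))
```

A real system would replace the hand-crafted features with CNN-learned ones, but the stage boundaries stay the same.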

Keywords: image processing, python, convolution neural network (CNN), machine learning

Procedia PDF Downloads 52
971 Transferring Cultural Meanings: A Case of Translation Classroom

Authors: Ramune Kasperaviciene, Jurgita Motiejuniene, Dalia Venckiene

Abstract:

Familiarising students with strategies for transferring cultural meanings (intertextual units, culture-specific idioms, culture-specific items, etc.) should be part of a comprehensive translator training programme. The present paper focuses on strategies for transferring such meanings into other languages and explores possibilities for introducing these methods and practices to translation students. The authors (university translation teachers) analyse the means of transferring cultural meanings from English into Lithuanian in a specific travel book, attribute these means to theoretically grounded strategies, and calculate the frequency with which specific strategies are adopted. Translation students are familiarised with concepts and methods related to transferring cultural meanings and asked to put their theoretical knowledge into practice, i.e., to interpret and translate certain culture-specific items from the same source text and to ground their decisions in theory. The strategies employed by the professional translator of the source text (as identified by the authors of this study) are then compared with those of the students. As a result, both students and teachers gain valuable experience, and new practices of conducting translation classes for a specific purpose evolve. The conclusions highlight the differences and similarities between non-professional and professional choices, summarise the possibilities for introducing methods of transferring cultural meanings to students, and close with specific considerations of the impact of theoretical knowledge and the degree of experience on decisions made in the translation process.

Keywords: cultural meanings, culture-specific items, strategies for transferring cultural meanings, translator training

Procedia PDF Downloads 319
970 A Clustering-Based Approach for Weblog Data Cleaning

Authors: Amine Ganibardi, Cherif Arab Ali

Abstract:

This paper addresses the data cleaning issue as a part of web usage data preprocessing within the scope of Web Usage Mining. Weblog data recorded by web servers within log files reflect usage activity, i.e., end-users’ clicks and underlying user-agents’ hits. As Web Usage Mining is interested in end-users’ behavior, user-agents’ hits are regarded as noise to be cleaned off before mining. Filtering hits from clicks is not trivial for two reasons: (i) a server records requests interlaced in sequential order regardless of their source or type, and (ii) website resources may be set up as requestable interchangeably by end-users and user-agents. The current methods are content-centric, based on filtering heuristics of relevant/irrelevant items in terms of some cleaning attributes, i.e., website resources’ filetype extensions, resources pointed to by hyperlinks/URIs, HTTP methods, user-agents, etc. These methods need exhaustive extra-weblog data and prior knowledge of the relevant and/or irrelevant items to be assumed as clicks or hits within the filtering heuristics. Such methods are not appropriate for the dynamic/responsive Web for three reasons: (i) resources may be set up as clickable by end-users regardless of their type, (ii) website resources are indexed by frame names without filetype extensions, and (iii) web contents are generated and cancelled differently from one end-user to another. In order to overcome these constraints, a clustering-based cleaning method centered on the logging structure is proposed. This method focuses on the statistical properties of the logging structure at the level of the requested and referring resources attributes. It is insensitive to logging content and does not need extra-weblog data. The statistical property used captures the structure of the logging generated by webpage requests in terms of clicks and hits.
Since a webpage consists of a single URI and several components, this structure results in a single-click-to-multiple-hits ratio in terms of the requested and referring resources. Thus, the clustering-based method is meant to identify two clusters based on the application of an appropriate distance to the frequency matrix at the requested and referring resources levels. As the clicks-to-hits ratio is single to multiple, the clicks cluster is the smaller one in number of requests. Hierarchical agglomerative clustering based on a pairwise distance (Gower) and average linkage has been applied to four logfiles of dynamic/responsive websites whose click-to-hits ratios range from 1/2 to 1/15. The optimal clustering, set on the basis of average linkage and maximum inter-cluster inertia, always results in two clusters. The evaluation of the smaller cluster, referred to as the clicks cluster, in terms of confusion matrix indicators results in a 97% true positive rate. The content-centric cleaning methods, i.e., conventional and advanced cleaning, resulted in a lower rate of 91%. Thus, the proposed clustering-based cleaning outperforms the content-centric methods for dynamic and responsive web designs without the need for any extra-weblog data. Such an improvement in cleaning quality is likely to refine dependent analyses.
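A toy sketch of the core idea: each distinct resource is described by how often it is requested and how often it appears as a referrer, the rows are grouped by a hand-rolled average-linkage agglomerative clustering, and the cluster with the smaller total request count is taken as the clicks cluster. The synthetic features and Euclidean distance (in place of Gower) are simplifying assumptions for illustration:

```python
import math

def average_linkage_two_clusters(points):
    # Agglomerative clustering: repeatedly merge the pair of clusters with
    # the smallest average pairwise distance until two clusters remain.
    clusters = [[i] for i in range(len(points))]
    def avg(c1, c2):
        return sum(math.dist(points[i], points[j])
                   for i in c1 for j in c2) / (len(c1) * len(c2))
    while len(clusters) > 2:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

def clicks_cluster(resources, features):
    # Feature rows are [times requested, times seen as referrer]; the clicks
    # cluster is the one with the smaller total number of requests.
    c1, c2 = average_linkage_two_clusters(features)
    total = lambda c: sum(features[i][0] for i in c)
    chosen = c1 if total(c1) < total(c2) else c2
    return {resources[i] for i in chosen}
```

Pages (clicks) are requested rarely but referenced by many component hits; components show the opposite pattern, which is what separates the two clusters.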

Keywords: clustering approach, data cleaning, data preprocessing, weblog data, web usage data

Procedia PDF Downloads 155
969 Reading Strategies of Generation X and Y: A Survey on Learners' Skills and Preferences

Authors: Kateriina Rannula, Elle Sõrmus, Siret Piirsalu

Abstract:

The mixed-generation classroom is a phenomenon that current higher education establishments face daily while trying to meet the needs of a modern labor market with its emphasis on lifelong learning and retraining. Representatives of mainly the X and Y generations acquiring higher education in one classroom are a challenge to lecturers, considering all the characteristics that differentiate one generation from another. The importance of outlining different strategies and considering the needs of the students lies in the necessity for everyone to acquire the maximum of the provided knowledge, as well as to understand each other, to study together in one classroom, and to cooperate successfully in future workplaces. In addition to different generations, there are also learners with different native languages, which has an impact on reading and understanding texts in third languages, including possible translation. The current research aims to investigate, describe, and compare reading strategies among representatives of generations X and Y. The hypothesis was that representatives of generations X and Y use different reading strategies, and that these also differ between first- and third-year students of the aforementioned generations. The current study is empirical and qualitative. To achieve the aim of the research, relevant literature was analyzed and a semi-structured questionnaire was conducted among the first- and third-year students of Tallinn Health Care College. The questionnaire consisted of 25 statements on text reading strategies, 3 multiple-choice questions on preferences concerning the design and medium of the text, and three open questions on the translation process when working with a text in the student’s third language. The results of the questionnaire were categorized, analyzed, and compared. Both generation X and generation Y respondents described their reading strategies as 'scanning' and 'surfing'.
Compared to generation X, first-year generation Y learners valued interactivity and nonlinear texts. Students frequently used the strategies of skimming, scanning, translating, and highlighting, together with relevant thinking and assistance seeking. Meanwhile, third-year generation Y students no longer frequently used translating, resourcing, and highlighting, while generation X learners still incorporated these strategies. Knowing about the different needs of the generations currently in classrooms and on the labor market equips us with tools to provide sustainable education and grants society a workforce that is more flexible and able to move between professions. Future research should investigate the amount of learning and strategy adoption between generations. As for reading, the main suggestions arising from the research are as follows: make a variety of materials available to students; allow them to select what they want to read; and try to make those materials visually attractive, relevant, and appropriately challenging for learners, considering the differences between generations.

Keywords: generation X, generation Y, learning strategies, reading strategies

Procedia PDF Downloads 161
968 Continuous FAQ Updating for Service Incident Ticket Resolution

Authors: Kohtaroh Miyamoto

Abstract:

As enterprise computing becomes more and more complex, the costs and technical challenges of IT system maintenance and support are increasing rapidly. One popular approach to managing IT system maintenance is to prepare and use an FAQ (Frequently Asked Questions) system to manage and reuse systems knowledge. Such an FAQ system can help reduce the resolution time for each service incident ticket. However, a major problem is that over time the knowledge in such FAQs tends to become outdated. Much of the knowledge captured in the FAQ requires periodic updates in response to new insights or new trends in the problems addressed in order to maintain its usefulness for problem resolution. These updates require a systematic approach to determine exactly which portion of the FAQ to revise and what its content should be. Therefore, we are working on a novel method to hierarchically structure the FAQ and automate the updates of its structure and content. We use the structured information and the unstructured text information, together with the timelines of the information, in the service incident tickets. We cluster the tickets by structured category information, by keywords, and by keyword modifiers for the unstructured text information. We also calculate an urgency score based on trends, resolution times, and priorities. We carefully studied the tickets of one of our projects over a 2.5-year time period. After the first 6 months, we started to create FAQs and confirmed that they improved the resolution times. We continued observing over the next 2 years to assess the ongoing effectiveness of our method for the automatic FAQ updates. We improved the ratio of tickets covered by the FAQ from 32.3% to 68.9% during this time. Also, the average reduction in ticket resolution time was between 31.6% and 43.9%. Subjective analysis showed that more than 75% of respondents reported the FAQ system was useful in reducing ticket resolution times.
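The urgency score above could be sketched as a weighted combination of a trend signal, average resolution time, and average priority per ticket category. The weights, field names, and normalisation constants below are illustrative guesses, not the values used in the paper:

```python
from collections import defaultdict

def urgency_scores(tickets, w_trend=0.5, w_time=0.3, w_prio=0.2):
    # Score each ticket category in [0, 1]; higher means more urgent.
    groups = defaultdict(list)
    for t in tickets:
        groups[t["category"]].append(t)
    scores = {}
    for cat, ts in groups.items():
        trend = sum(t["recent"] for t in ts) / len(ts)        # share of recent tickets
        avg_hours = sum(t["resolution_hours"] for t in ts) / len(ts)
        avg_prio = sum(t["priority"] for t in ts) / len(ts)   # 1 (low) .. 5 (high)
        scores[cat] = (w_trend * trend
                       + w_time * min(avg_hours / 24.0, 1.0)  # cap at one day
                       + w_prio * avg_prio / 5.0)
    return scores
```

Categories with a high score would be the first candidates for an FAQ entry or an FAQ update.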

Keywords: FAQ system, resolution time, service incident tickets, IT system maintenance

Procedia PDF Downloads 312
967 An Inviscid Compressible Flow Solver Based on Unstructured OpenFOAM Mesh Format

Authors: Utkan Caliskan

Abstract:

Two types of numerical codes based on the finite volume method are developed in order to solve the compressible Euler equations and simulate the flow through a forward-facing step channel. Both algorithms use the AUSM+-up (Advection Upstream Splitting Method) scheme for flux splitting and a two-stage Runge-Kutta scheme for time stepping. In this study, the flux calculations distinguish the algorithm based on the OpenFOAM mesh format, called the 'face-based' algorithm, from the basic algorithm, called the 'element-based' algorithm. The face-based algorithm avoids redundant flux computations and is also more flexible with hybrid grids. Moreover, some of OpenFOAM’s preprocessing utilities can be used on the mesh. Parallelization of the face-based algorithm, for which atomic operations are needed due to the shared memory model, is also presented. For several mesh sizes, a 2.13x speed-up is obtained with the face-based approach over the element-based approach.
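The difference between the two flux strategies can be illustrated on a 1D linear advection problem with a first-order upwind flux: the element-based loop evaluates each interior face flux twice (once per adjacent cell), while the face-based loop evaluates it once and scatters it to the owner and neighbour cells. This is a simplified sketch, not the authors’ Euler/AUSM+-up solver:

```python
def upwind_flux(u_left, u_right, a=1.0):
    # First-order upwind flux for linear advection with speed a > 0.
    return a * u_left

def element_based_step(u, dt, dx):
    # Each interior cell recomputes the flux at both of its faces,
    # so every interior face flux is evaluated twice.
    un = u[:]
    for i in range(1, len(u) - 1):
        f_left = upwind_flux(u[i - 1], u[i])
        f_right = upwind_flux(u[i], u[i + 1])
        un[i] = u[i] - dt / dx * (f_right - f_left)
    return un

def face_based_step(u, dt, dx):
    # Each interior face flux is computed once, then scattered to the
    # owner (left) and neighbour (right) cells, as with an OpenFOAM-style
    # face list; the scatter is what needs atomics under shared memory.
    res = [0.0] * len(u)
    for f in range(1, len(u)):      # face f lies between cells f-1 and f
        flux = upwind_flux(u[f - 1], u[f])
        res[f - 1] -= flux          # outflow from the owner cell
        res[f] += flux              # inflow into the neighbour cell
    return [u[i] + dt / dx * res[i] for i in range(len(u))]
```

Both loops produce identical interior-cell updates; the face-based version simply halves the number of flux evaluations.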

Keywords: cell centered finite volume method, compressible Euler equations, OpenFOAM mesh format, OpenMP

Procedia PDF Downloads 292
966 Understanding Factors that Affect the Prior Knowledge of Deaf and Hard of Hearing Students and their Relation to Reading Comprehension

Authors: Khalid Alasim

Abstract:

The reading comprehension levels of students who are deaf or hard of hearing (DHH) are low compared to those of their hearing peers. One possible reason for these low reading levels relates to the students’ prior knowledge. This study investigated the potential factors that might have affected DHH students’ prior knowledge, including their degree of hearing loss, the presence or absence of family members with a hearing loss, and educational stage (elementary–middle school). The study also examined the contribution of prior knowledge to predicting DHH students’ reading comprehension levels and investigated the differences in the students’ scores based on the type of questions, including text-explicit (TE), text-implicit (TI), and script-implicit (SI) questions. Thirty-one elementary and middle-school students completed a demographic form and assessment, and descriptive statistics and multiple and simple linear regressions were used to answer the research questions. The findings indicated that the independent variables (degree of hearing loss, presence or absence of family members with hearing loss, and educational stage) explained little of the variance in DHH students’ prior knowledge. Further, the results showed that the DHH students’ prior knowledge affected their reading comprehension. Finally, the results demonstrated that the participants answered more of the TI questions correctly than the TE and SI questions. The study concluded that prior knowledge is important in these students’ reading comprehension, and that it is important for teachers and parents of DHH children to use effective ways to increase their students’ and children’s prior knowledge.
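The simple linear regression used in studies like this one reduces to a closed-form slope and intercept plus an R² goodness-of-fit measure. A minimal sketch with made-up scores (the variable names and data are illustrative, not the study’s):

```python
def ols_simple(x, y):
    # Ordinary least squares for one predictor, e.g. a prior-knowledge
    # score predicting a reading-comprehension score.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

def r_squared(x, y):
    # Share of variance in y explained by the fitted line.
    slope, intercept = ols_simple(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

"Explained little of the variance" in the abstract corresponds to a small R² for the demographic predictors, while prior knowledge itself would show a substantially larger one.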

Keywords: reading comprehension, prior knowledge, metacognition, elementary, self-contained classrooms

Procedia PDF Downloads 78
965 Wideband Performance Analysis of C-FDTD Based Algorithms in the Discretization Impoverishment of a Curved Surface

Authors: Lucas L. L. Fortes, Sandro T. M. Gonçalves

Abstract:

In this work, the wideband performance under mesh discretization impoverishment is analyzed for the Conformal Finite Difference Time-Domain (C-FDTD) approaches developed by Raj Mittra, Supriyo Dey, and Wenhua Yu for the Finite Difference Time-Domain (FDTD) method. These approaches are a simple and efficient way to optimize the scattering simulation of curved surfaces for dielectric and Perfect Electric Conducting (PEC) structures in the FDTD method, since curved surfaces otherwise require dense meshes to reduce the error introduced by surface staircasing. Defined in this work as D-FDTD-Diel and D-FDTD-PEC, these approaches are well known in the literature, but the improvement their application brings has not been quantified broadly for wide frequency bands and poorly discretized meshes. Both approaches improve the accuracy of the simulation without requiring dense meshes, also making it possible to explore poorly discretized meshes, which reduce simulation time and computational expense while retaining a desired accuracy. However, their applications present limitations regarding mesh impoverishment and the desired frequency range. Therefore, the goal of this work is to explore both the wideband and the mesh impoverishment performance of the approaches, to give a wider insight into these aspects of FDTD applications. The D-FDTD-Diel approach consists of modifying the electric field update in the cells intersected by the dielectric surface, taking into account the amount of dielectric material within the mesh cell edges. By taking the intersections into account, D-FDTD-Diel provides an accuracy improvement at the cost of computational preprocessing, which is a fair trade-off, since the update modification is quite simple.
Likewise, the D-FDTD-PEC approach consists of modifying the magnetic field update, taking into account the PEC curved surface intersections within the mesh cells and, considering a PEC structure in vacuum, the air portion that fills the intersected cells when updating the magnetic field values. As with D-FDTD-Diel, D-FDTD-PEC provides better accuracy at the cost of computational preprocessing, although with the drawback of having to meet stability criterion requirements. The algorithms are formulated and applied to a PEC and a dielectric spherical scattering surface with meshes presenting different levels of discretization, with polytetrafluoroethylene (PTFE) as the dielectric, a very common material in coaxial cables and connectors for radiofrequency (RF) and wideband applications. The accuracy of the algorithms is quantified, showing the approaches’ wideband performance drop along with the mesh impoverishment. The benefits in computational efficiency, simulation time, and accuracy are also shown and discussed according to the desired frequency range, showing that poorly discretized mesh FDTD simulations can be exploited more efficiently while retaining the desired accuracy. The results obtained provide a broader insight into the limitations of applying the C-FDTD approaches in poorly discretized and wide-frequency-band simulations for dielectric and PEC curved surfaces, limitations which are not clearly defined or detailed in the literature and are, therefore, a novelty of this study. These approaches are also expected to be applied to the modeling of curved RF components for wideband and high-speed communication devices in future works.
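The flavour of a conformal dielectric update can be sketched in 1D: edges fully inside or outside the dielectric use the bulk permittivity, while intersected edges use a fill-fraction-weighted value. The linear mixing rule and function names below are illustrative assumptions, not Mittra, Dey, and Yu’s exact formulation:

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def effective_eps(fill, eps_r):
    # Edge-weighted permittivity for a cell edge that is a fraction
    # `fill` (0..1) inside the dielectric; simple linear mixing assumed.
    return EPS0 * (fill * eps_r + (1.0 - fill))

def update_e(e, curl_h, fill, eps_r, dt):
    # Conformal E-field update: staircased cells use fill = 0 or 1,
    # intersected cells use the actual dielectric fraction on the edge.
    return [e[i] + dt / effective_eps(fill[i], eps_r) * curl_h[i]
            for i in range(len(e))]
```

The point of the conformal correction is that intersected cells no longer have to pick either all-dielectric or all-air, which is what makes coarse meshes usable.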

Keywords: accuracy, computational efficiency, finite difference time-domain, mesh impoverishment

Procedia PDF Downloads 105
964 Between a Rock and a Hard Place: The Possible Roles of Eternity Clauses in the Member States of the European Union

Authors: Zsuzsa Szakaly

Abstract:

Several constitutions in the European Union have explicit or implicit eternity clauses. Their classic roles have been analyzed so far, but new possibilities are emerging in relation to the identity of the constitutions of the Member States. The aim of the study is to look in detail at the practice of the Constitutional Courts of the Member States regarding eternity clauses, where limiting constitutional amendment has practical bearing, and to examine the influence of such practice on Europeanization. Some states apply explicit eternity clauses embedded in the text of the constitution, e.g., Italy, Germany, and Romania. In other states, the Constitutional Court has 'unearthed' implicit eternity clauses from the text of the basic law, e.g., Slovakia and Croatia. Using comparative analysis to examine the explicit or implicit clauses of the constitutions concerned, and taking into consideration recent trends in the judicial opinions of the Member States and fresh scientific studies, the main questions are: How should the double-edged sword of eternity clauses be wielded? To support European integration or to support the sovereignty of the Member State? To help Europeanization or to act against it? Eternity clauses can easily find themselves between a rock and a hard place, between the law of the European Union and the law of a Member State, with several possible interpretations. As more and more Constitutional Courts have started to declare elements of their Member States’ constitutional identities, these have begun to interfere with the eternity clauses. Will this trend eventually work against Europeanization? As a result of the research, it can be stated that a lowest common denominator exists in the practice of European Constitutional Courts regarding eternity clauses.
The chance of a European model and the possibility of this model influencing the status quo between the European Union and the Member States will be examined by looking at the answers these courts have found so far.

Keywords: constitutional court, constitutional identity, eternity clause, European Integration

Procedia PDF Downloads 119
963 A Generative Pretrained Transformer-Based Question-Answer Chatbot and Phantom-Less Quantitative Computed Tomography Bone Mineral Density Measurement System for Osteoporosis

Authors: Mian Huang, Chi Ma, Junyu Lin, William Lu

Abstract:

Introduction: Bone health has attracted more attention recently, and an intelligent question-and-answer (QA) chatbot for osteoporosis is helpful for science popularization. With Generative Pretrained Transformer (GPT) technology developing, we build an osteoporosis corpus dataset and then fine-tune LLaMA, a well-known open-source GPT foundation large language model (LLM), on our self-constructed osteoporosis corpus. As evaluated by clinical orthopaedic experts, our fine-tuned model outperforms vanilla LLaMA on the osteoporosis QA task in Chinese. Three-dimensional quantitative computed tomography (QCT)-measured bone mineral density (BMD) has been considered more accurate than DXA for BMD measurement in recent years. We develop an automatic phantom-less QCT (PL-QCT) that is more efficient for BMD measurement, since no external phantom is needed for calibration. Combined with the LLM on osteoporosis, our PL-QCT provides efficient and accurate BMD measurement for our chatbot users. Material and Methods: We build an osteoporosis corpus containing about 30,000 Chinese publications whose titles are related to osteoporosis. The whole process is done automatically, including crawling the literature in PDF format, localizing text/figure/table regions with a layout segmentation algorithm, and recognizing text with an OCR algorithm. We train our model by continuous pre-training with Low-rank Adaptation (LoRA, rank=10) technology to adapt the LLaMA-7B model to the osteoporosis domain; the basic principle is to mask the next word in the text and make the model predict that word, with the loss function defined as the cross-entropy between the predicted and ground-truth words. The experiment was run on a single NVIDIA A800 GPU for 15 days. Our automatic PL-QCT BMD measurement adopts an AI-assisted region-of-interest (ROI) generation algorithm for localizing a vertebra-parallel cylinder in cancellous bone. With no phantom for BMD calibration, we calculate ROI BMD from the CT-BMD of personal muscle and fat.
Results & Discussion: Clinical orthopaedic experts were invited to design 5 osteoporosis questions in Chinese to evaluate the performance of vanilla LLaMA and our fine-tuned model. Our model outperforms LLaMA on over 80% of these questions, understanding ‘Expert Consensus on Osteoporosis’, ‘QCT for osteoporosis diagnosis’, and ‘Effect of age on osteoporosis’. Detailed results are shown in the appendix. Future work may include training a larger LLM on the whole of orthopaedics with more high-quality domain data, or a multi-modal GPT combining and understanding X-ray images and medical text for orthopaedic computer-aided diagnosis. However, GPT models sometimes give unexpected outputs, such as repetitive text or seemingly normal but wrong answers (called ‘hallucinations’). Even when GPT gives correct answers, these cannot be treated as valid clinical diagnoses in place of those of clinical doctors. The PL-QCT BMD system provided by Bone’s QCT (Bone’s Technology (Shenzhen) Limited) achieves 0.1448 mg/cm2 (spine) and 0.0002 mg/cm2 (hip) mean absolute error (MAE), and linear correlation coefficients of R2=0.9970 (spine) and R2=0.9991 (hip) (compared to QCT-Pro (Mindways)), on 155 patients in a three-center clinical trial in Guangzhou, China. Conclusion: This study builds a Chinese osteoporosis corpus and develops a fine-tuned and domain-adapted LLM as well as a PL-QCT BMD measurement system. Our fine-tuned GPT model shows better capability than the LLaMA model on most testing questions on osteoporosis. Combined with our PL-QCT BMD system, we look forward to providing science popularization and early screening for potential osteoporotic patients.
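The two training ingredients named in the abstract, the next-word cross-entropy objective and the LoRA low-rank weight update, can be sketched in plain NumPy. This is a didactic illustration of the mathematics, not the authors’ LLaMA training code:

```python
import numpy as np

def next_token_loss(logits, target_ids):
    # Mean cross-entropy between predicted next-token distributions
    # (one row of logits per position) and the ground-truth next tokens,
    # i.e. the continuous pre-training objective described above.
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target_ids)), target_ids].mean())

def lora_delta(A, B, alpha, r):
    # LoRA weight update: the frozen weight W is perturbed by a low-rank
    # product (alpha / r) * B @ A, with A of shape (r, d_in) and
    # B of shape (d_out, r); only A and B are trained.
    return (alpha / r) * B @ A
```

With rank r much smaller than the model dimensions, only a tiny fraction of LLaMA-7B’s parameters needs gradients, which is what makes single-GPU domain adaptation feasible.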

Keywords: GPT, phantom-less QCT, large language model, osteoporosis

Procedia PDF Downloads 43
962 Developing an Exhaustive and Objective Definition of Social Enterprise through Computer Aided Text Analysis

Authors: Deepika Verma, Runa Sarkar

Abstract:

One of the prominent debates in the social entrepreneurship literature has been whether entrepreneurial work for social well-being by for-profit organizations can be classified as social entrepreneurship or not. Of late, the scholarship has reached a consensus: there is little sense in confining social entrepreneurship to non-profit organizations alone. Encouraged by this research, a growing number of businesses engaged in filling social infrastructure gaps in developing countries are calling themselves social enterprises. These organizations are diverse in their ownership, size, objectives, operations, and business models. The lack of a comprehensive definition of social enterprise leads to three issues. Firstly, researchers may face difficulty in creating a database of social enterprises, because the choice of an entity as a social enterprise becomes subjective or based on parameters pre-defined by the researcher, which is not replicable. Secondly, practitioners who use ‘social enterprise’ in their vision/mission statement(s) may find it difficult to adjust their business models accordingly, especially when they face the dilemma of choosing social well-being over business viability. Thirdly, social enterprise and social entrepreneurship attract a lot of donor funding and venture capital. In the absence of a comprehensive definitional guide, donors or investors may find assigning grants and investments difficult. It therefore becomes necessary to develop an exhaustive and objective definition of social enterprise and to examine whether the understandings of academicians and practitioners of social enterprise match. This paper develops a dictionary of words often associated with social enterprise and/or social entrepreneurship.
It further compares two lexicographic definitions of social enterprise imputed from the abstracts of academic journal papers and trade publications extracted from the EBSCO database using the ‘tm’ package in R software.
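The comparison of two lexicographic definitions boils down to ranking frequent terms in each corpus and measuring their overlap. A minimal Python analogue of the R `tm` pipeline, with toy abstracts and a tiny stopword list standing in for the EBSCO data:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "for"}

def top_terms(abstracts, k=5):
    # Rank the most frequent non-stopword terms across a corpus of
    # abstracts; stands in for a term-document frequency matrix.
    counts = Counter()
    for text in abstracts:
        counts.update(w for w in re.findall(r"[a-z]+", text.lower())
                      if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def definition_overlap(terms_a, terms_b):
    # Jaccard overlap between the two lexicographic term sets, e.g. the
    # academic-journal versus trade-publication definitions.
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b)
```

A high overlap would suggest academicians and practitioners share one understanding of social enterprise; a low one would expose the definitional gap the paper describes.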

Keywords: EBSCO database, lexicographic definition, social enterprise, text mining

Procedia PDF Downloads 363
961 A Self Beheld the Eyes of the Other: Reflections on Montesquieu's Persian Letters

Authors: Seyed Majid Alavi Shooshtari

Abstract:

As a multi-layered prose piece of artistry and craftsmanship, Charles de Secondat, baron de Montesquieu’s Persian Letters (1721) is a satirical work which records the experiences of two Persian noblemen, Usbek and Rica, traveling through France in the early eighteenth century. Montesquieu creates Persian Letters as a critique of French society, a critical explanation of what was considered to be 'the Orient' in the period, and an invaluable historical document which illustrates the ways Europe and the East understood each other in the first half of the eighteenth century. However, Persian Letters is considered today, in part, an Orientalist text because it presents the culture of the East through stereotypical images. Although, when Montesquieu published Persian Letters, the term Orientalist was a harmless word for people who studied or took an interest in the Orient, the way in which this Western intellectual exerts his critique of French social and political life through the eyes of Persian protagonists, placing the example of the Orient (the Other) at the service of an ongoing eighteenth-century discourse, does raise some Eastern eyebrows. The Persian side of the novel is considered by some critics a fanciful decor, and the letters sent home are seen as literary props, yet these Eastern men intelligently question the rationality of religious, state, military, and cultural practices and uncover much of the absurdity, irrationality, or frivolity of European life. Drawing on the insight that Montesquieu’s text problematizes the assumption that Orientalism monolithically constructs the Orient as the Other, the present paper aims to examine how the innocent gaze of two Eastern travelers mirrors the ways Europe’s identity defines its Self.

Keywords: montesquieu, persian letters, ‘the orient’, identity politics, self, the other

Procedia PDF Downloads 87