Search results for: text retrieval
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1550

Search results for: text retrieval

1340 Selecting Answers for Questions with Multiple Answer Choices in Arabic Question Answering Based on Textual Entailment Recognition

Authors: Anes Enakoa, Yawei Liang

Abstract:

Question Answering (QA) system is one of the most important and demanding tasks in the field of Natural Language Processing (NLP). In QA systems, the answer generation task generates a list of candidate answers to the user's question, in which only one answer is correct. Answer selection is one of the main components of the QA, which is concerned with selecting the best answer choice from the candidate answers suggested by the system. However, the selection process can be very challenging especially in Arabic due to its particularities. To address this challenge, an approach is proposed to answer questions with multiple answer choices for Arabic QA systems based on Textual Entailment (TE) recognition. The developed approach employs a Support Vector Machine that considers lexical, semantic and syntactic features in order to recognize the entailment between the generated hypotheses (H) and the text (T). A set of experiments has been conducted for performance evaluation and the overall performance of the proposed method reached an accuracy of 67.5% with C@1 score of 80.46%. The obtained results are promising and demonstrate that the proposed method is effective for TE recognition task.

Keywords: information retrieval, machine learning, natural language processing, question answering, textual entailment

Procedia PDF Downloads 128
1339 Amharic Text News Classification Using Supervised Learning

Authors: Misrak Assefa

Abstract:

The Amharic language is the second most widely spoken Semitic language in the world. There are several new overloaded on the web. Searching some useful documents from the web on a specific topic, which is written in the Amharic language, is a challenging task. Hence, document categorization is required for managing and filtering important information. In the classification of Amharic text news, there is still a gap in the domain of information that needs to be launch. This study attempts to design an automatic Amharic news classification using a supervised learning mechanism on four un-touch classes. To achieve this research, 4,182 news articles were used. Naive Bayes (NB) and Decision tree (j48) algorithms were used to classify the given Amharic dataset. In this paper, k-fold cross-validation is used to estimate the accuracy of the classifier. As a result, it shows those algorithms can be applicable in Amharic news categorization. The best average accuracy result is achieved by j48 decision tree and naïve Bayes is 95.2345 %, and 94.6245 % respectively using three categories. This research indicated that a typical decision tree algorithm is more applicable to Amharic news categorization.

Keywords: text categorization, supervised machine learning, naive Bayes, decision tree

Procedia PDF Downloads 168
1338 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to information overflow problem. In order to solve these problems for effective information retrieval, document clustering in text mining becomes a popular research topic. Clustering is the unsupervised classification of data items into groups without the need of training data. Many conventional document clustering methods perform inefficiently for large document collections because they were originally designed for relational database. Therefore they are impractical in real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, which is a hierarchical clustering method developed for document clustering, where the intuition of FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds hierarchical topic tree. In this paper, we combine FIHC algorithm with ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use the closed frequent itemsets instead of only use frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than those of well-known document clustering algorithms.

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 372
1337 Google Translate: AI Application

Authors: Shaima Almalhan, Lubna Shukri, Miriam Talal, Safaa Teskieh

Abstract:

Since artificial intelligence is a rapidly evolving topic that has had a significant impact on technical growth and innovation, this paper examines people's awareness, use, and engagement with the Google Translate application. To see how familiar aware users are with the app and its features, quantitative and qualitative research was conducted. The findings revealed that consumers have a high level of confidence in the application and how far people they benefit from this sort of innovation and how convenient it makes communication.

Keywords: artificial intelligence, google translate, speech recognition, language translation, camera translation, speech to text, text to speech

Procedia PDF Downloads 134
1336 A Rational Intelligent Agent to Promote Metacognition a Situation of Text Comprehension

Authors: Anass Hsissi, Hakim Allali, Abdelmajid Hajami

Abstract:

This article presents the results of a doctoral research which aims to integrate metacognitive dimension in the design of human learning computing environments (ILE). We conducted a detailed study on the relationship between metacognitive processes and learning, specifically their positive impact on the performance of learners in the area of reading comprehension. Our contribution is to implement methods, using an intelligent agent based on BDI paradigm to ensure intelligent and reliable support for low readers, in order to encourage regulation and a conscious and rational use of their metacognitive abilities.

Keywords: metacognition, text comprehension EIAH, autoregulation, BDI agent

Procedia PDF Downloads 302
1335 Developmental Trends on Initial Letter Fluency in Typically Developing Children

Authors: Sunila John, B. Rajashekhar

Abstract:

Initial letter fluency tasks are one of the simple behavioral measures to evaluate the complex nature of word retrieval ability. This task requires the participant to retrieve as many words as possible beginning with a particular letter in a fixed time frame. Though the task of verbal fluency is popular among adult clinical conditions, its role in children has been less emphasized. There exists a lack of in-depth understanding of processes underlying verbal fluency performance in typically developing children. The present study, therefore, aims to delineate the developmental trend on initial letter fluency task observed in typically developing Malayalam speaking children. The participants were aged between 5 to 10 years and categorized into three groups: Group I (class I and II, mean (SD) age years: 6.44(.78)), Group II (class III and IV, mean (SD) age years: 8.59 (.83)) and group III (class V and VI, mean (SD) age years: 10.28 (.80). On two tasks of initial letter fluency, the verbal fluency outcome measures were analyzed. The study findings revealed a distinct pattern of initial letter fluency development which may enhance its usefulness in clinical and research settings.

Keywords: children, development, initial letter fluency, word retrieval

Procedia PDF Downloads 438
1334 Research on the Landscape of Xi'an Ancient City Based on the Poetry Text of Tang Dynasty

Authors: Zou Yihui

Abstract:

The integration of the traditional landscape of the ancient city and the poet's emotions and symbolization into ancient poetry is the unique cultural gene and spiritual core of the historical city, and re-understanding the historical landscape pattern from the poetry is conducive to continuing the historical city context and improving the current situation of the gradual decline of the poetry of the modern historical urban landscape. Starting from Tang poetry uses semantic analysis methods、combined with text mining technology, entry mining, word frequency analysis, and cluster analysis of the landscape information of Tang Chang'an City were carried out, and the method framework for analyzing the urban landscape form based on poetry text was constructed. Nearly 160 poems describing the landscape of Tang Chang'an City were screened, and the poetic landscape characteristics of Tang Chang'an City were sorted out locally in order to combine with modern urban spatial development to continue the urban spatial context.

Keywords: Tang Chang'an City, poetic texts, semantic analysis, historical landscape

Procedia PDF Downloads 23
1333 Increasing the Ability of State Senior High School 12 Pekanbaru Students in Writing an Analytical Exposition Text through Comic Strips

Authors: Budiman Budiman

Abstract:

This research aimed at describing and testing whether the students’ ability in writing analytical exposition text is increased by using comic strips at SMAN 12 Pekanbaru. The respondents of this study were the second-grade students, especially XI Science 3 academic year 2011-2012. The total number of students in this class was forty-two (42) students. The quantitative and qualitative data was collected by using writing test and observation sheets. The research finding reveals that there is a significant increase of students’ writing ability in writing analytical exposition text through comic strips. It can be proved by the average score of pre-test was 43.7 and the average score of post-test was 65.37. Besides, the students’ interest and motivation in learning are also improved. These can be seen from the increasing of students’ awareness and activeness in learning process based on observation sheets. The findings draw attention to the use of comic strips in teaching and learning is beneficial for better learning outcome.

Keywords: analytical exposition, comic strips, secondary school students, writing ability

Procedia PDF Downloads 136
1332 Improving Technical Translation Ability of the Iranian Students of Translation Through Multimedia: An Empirical Study

Authors: Dina Zakeri, Ali Aminzad

Abstract:

Multimedia-assisted teaching results in eliminating traditional training barriers, facilitating the cognition process and upgrading learning outcomes. This study attempted to examine the effects of implementing multimedia on teaching technical translation model and on the technical text translation ability of Iranian students of translation. To fulfill the purpose of the study, a total of forty-six learners were selected out of fifty-seven participants in a higher education center in Tehran based on their scores in Preliminary English Test (PET) and were divided randomly into the experimental and control groups. Prior to the treatment, a technical text translation questionnaire was devised and then approved and validated by three assistant professors of technical fields and three assistant professors of Teaching English as a Foreign Language (TEFL) at the university. This questionnaire was administered as a pretest to both groups. Control and experimental groups were trained for five successive weeks using identical course books but with a different lesson plan that allowed employing multimedia for the experimental group only. The devised and approved questionnaire was administered as a posttest to both groups at the end of the instruction. A multivariate ANOVA was run to compare the two groups’ means on the PET, pretest and posttest. The results showed the rejection of all null hypotheses of the study and revealed that multimedia significantly improved technical text translation ability of the learners.

Keywords: multimedia, multimedia-mediated teaching, technical translation model, technical text, translation ability

Procedia PDF Downloads 110
1331 Temporality, Place and Autobiography in J.M. Coetzee’s 'Summertime'

Authors: Barbara Janari

Abstract:

In this paper it is argued that the effect of the disjunctive temporality in Summertime (the third of J.M. Coetzee’s fictionalised memoirs) is two-fold: firstly, it reflects the memoir’s ambivalent, contradictory representations of place in order to emphasize the fractured sense of self growing up in South Africa during apartheid entailed for Coetzee. Secondly, it reconceives the autobiographical discourse as one that foregrounds the inherent fictionality of all texts. The memoir’s narrative is filtered through intricate textual strategies that disrupt the chronological movement of the narrative, evoking the labyrinthine ways in which the past and present intersect and interpenetrate each other. It is framed by entries from Coetzee’s Notebooks: it opens with entries that cover the years 1972–1975, and ends with a number of undated fragments from his Notebooks. Most of the entries include a short ‘memo’ at the end, added between 1999 and 2000. While the memos follow the Notebook entries in the text, they are separated by decades. Between the Notebook entries is a series of interviews conducted by Vincent, the text’s putative biographer, between 2007 and 2008, based on recollections from five people who had known Coetzee in the 1970s – a key period in John’s life as it marks both his return to South Africa after a failed emigration attempt to America, and the beginning of his writing career, with the publication of Dusklands in 1974. The relationship between the memoir’s various parts is a key feature of Coetzee’s representation of place in Summertime, which is constructed as a composite one in which the principle of reflexive referencing has to be adopted. In other words, readers have to suspend individual references temporarily until the relationships between the parts have been connected to each other. In order to apprehend meaning in the text, the disparate narrative elements have to first be tied together. In this text, then, the experience of time as ordered and chronological is ruptured. Instead, the memoir’s themes and patterns become apparent most clearly through reflexive referencing, by which relationships between disparate sections of the text are linked. The image of the fictional John that emerges from the text is a composite of this John and the author, J.M. Coetzee, and is one which embodies Coetzee’s often fraught relationship with his home country, South Africa.

Keywords: autobiography, place, reflexive referencing, temporality

Procedia PDF Downloads 50
1330 Investigating the Encouraging Factors for Scholarly Works Contribution towards Institutional Repository: A Case Study at a Malaysian University

Authors: Mohd Rashid bin Ab Hamid, Noor Azura binti Omar, Zainol Bin Mustafa

Abstract:

Purpose: The aim of this paper is to study the encouraging factors for scholarly works contribution towards among academicians at Malaysian university. Methods: This paper uses questionnaire for data collection on the respondents’ perceptional level on the institutional repository efforts in one of the university under study. Several encouraging factors have been identified and to be measured using descriptive statistics. The factors are related to content contribution, i.e. personal factor, professional factor, organizational factor and technological factor. Findings: The study found that all these four encouraging factors did have a relation to the contribution of scholarly works in the university by the academician. Research Limitations: This study used a case study and generalization to all Malaysian universities should be well taken care of. Practical implications: The library at the university should look into these four encouraging factors in order to enhance the contribution from academician towards the repository. Originality/value: This research paper provides basic information for the knowledge management officers in the university by endeavouring more efforts in order to attract more contributions.

Keywords: institutional repository, information retrieval, information storage and retrieval

Procedia PDF Downloads 543
1329 Effect of Mobile Phone Text Message Reminders on Adherence to Routine Prenatal Iron/Folic Acid Supplement among Pregnant Women: A Pilot Study

Authors: Nneka U. Igboeli, Maxwell O. Adibe

Abstract:

Iron and folate supplementation in pregnancy are important interventions that prevent maternal anaemia and fetal anomaly. Thus, daily oral doses of iron and folic acid are recommended throughout pregnancy as part of antenatal care. However, low adherence has been a major drawback leading to low effectiveness of these programs. The effect of mobile text message reminders to pregnant women to take their routine medications on adherence was evaluated in this study. The first 100 women who consented to the study were recruited and randomized to either receive a text message reminder on adherence to routine medications or not. Adherence was assessed using the 8-item Modified Morisky Adherence Scale (8-MMAS). The folders of successfully recruited women were tagged with the a study number assigned to each of them. The womens’ phone numbers were collected and these were used to send text messages reminders on adhering to routine drugs only to women in the intervention group. The text messages were sent three times per week for a period of four weeks with an adherence reassessment at the one month follow-up antenatal visit for recruited women. At one month follow-up, the lost to follow-up were 6 (16%) women for the intervention group and 17 (34%) for the control group. The across group mean difference in adherence score was 0.07 (-0.96 – 1.10) at baseline and 0.3 (-0.31 – 0.92) after intervention, both insignificant at p > 0.05. The within group change were increases of 0.58 (0.00 – 1.16) (p = 0.05) from baseline for the intervention group and a 0.35 (-0.51 – 1.20) (p = 0.395) for the control group. Non-significant increase in adherence scores were recorded for both groups. However, the increase in adherence scores of women in the intervention group was greater and may be potentially transformed into more positive results if the study period is increased with possibly reduced study drop-outs shows great promise for more positive results.

Keywords: adherence, mobile phone, pregnant women, reminders

Procedia PDF Downloads 157
1328 Content-Based Mammograms Retrieval Based on Breast Density Criteria Using Bidimensional Empirical Mode Decomposition

Authors: Sourour Khouaja, Hejer Jlassi, Nadia Feddaoui, Kamel Hamrouni

Abstract:

Most medical images, and especially mammographies, are now stored in large databases. Retrieving a desired image is considered of great importance in order to find previous similar cases diagnosis. Our method is implemented to assist radiologists in retrieving mammographic images containing breast with similar density aspect as seen on the mammogram. This is becoming a challenge seeing the importance of density criteria in cancer provision and its effect on segmentation issues. We used the BEMD (Bidimensional Empirical Mode Decomposition) to characterize the content of images and Euclidean distance measure similarity between images. Through the experiments on the MIAS mammography image database, we confirm that the results are promising. The performance was evaluated using precision and recall curves comparing query and retrieved images. Computing recall-precision proved the effectiveness of applying the CBIR in the large mammographic image databases. We found a precision of 91.2% for mammography with a recall of 86.8%.

Keywords: BEMD, breast density, contend-based, image retrieval, mammography

Procedia PDF Downloads 213
1327 Multimodal Sentiment Analysis With Web Based Application

Authors: Shreyansh Singh, Afroz Ahmed

Abstract:

Sentiment Analysis intends to naturally reveal the hidden mentality that we hold towards an entity. The total of this assumption over a populace addresses sentiment surveying and has various applications. Current text-based sentiment analysis depends on the development of word embeddings and Machine Learning models that take in conclusion from enormous text corpora. Sentiment Analysis from text is presently generally utilized for consumer loyalty appraisal and brand insight investigation. With the expansion of online media, multimodal assessment investigation is set to carry new freedoms with the appearance of integral information streams for improving and going past text-based feeling examination using the new transforms methods. Since supposition can be distinguished through compelling follows it leaves, like facial and vocal presentations, multimodal opinion investigation offers good roads for examining facial and vocal articulations notwithstanding the record or printed content. These methodologies use the Recurrent Neural Networks (RNNs) with the LSTM modes to increase their performance. In this study, we characterize feeling and the issue of multimodal assessment investigation and audit ongoing advancements in multimodal notion examination in various spaces, including spoken surveys, pictures, video websites, human-machine, and human-human connections. Difficulties and chances of this arising field are additionally examined, promoting our theory that multimodal feeling investigation holds critical undiscovered potential.

Keywords: sentiment analysis, RNN, LSTM, word embeddings

Procedia PDF Downloads 97
1326 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya

Abstract:

This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.

Keywords: natural language processing, word to vector, text similarity, text mining

Procedia PDF Downloads 249
1325 Fake News Detection for Korean News Using Machine Learning Techniques

Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Fake news is defined as the news articles that are intentionally and verifiably false, and could mislead readers. Spread of fake news may provoke anxiety, chaos, fear, or irrational decisions of the public. Thus, detecting fake news and preventing its spread has become very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible to identify it by a human. Under this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. But, there have been no prior studies proposed an automated fake news detection method for Korean news to our best knowledge. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed is convert to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). After that, in step 2, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as logistic regression, backpropagation network, support vector machine, and deep neural network can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news from Seoul National University’s FactCheck. which provides with detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.

Keywords: fake news detection, Korean news, machine learning, text mining

Procedia PDF Downloads 248
1324 Translation and Ideology: New Perspectives

Authors: Hamza Salih

Abstract:

Since translation is no longer viewed as a mere replacement of linguistic codes from one language to another, it has increasingly been considered, especially with the advent of the cultural turn in the late 70's, in relation to the broader external context in which it takes place. According to scholars in the field, the translation process is determined by the political, economic and cultural values which exert external pressures on the translator. Correspondingly, the relationship between translation as an act of re-writing the original text and ideology has already been established. This paper addresses the issue of how ideology comes into play in the translational process and what strategies the translator adopts to foreground or circumvent ideological constraints. Along with this, the paper will touch upon the notions of censorship, manipulation, subversion and domestication which are deemed of relevance to this very topic. In fact, after the domination of the empirically-oriented linguistic approaches in translation studies, the relationship between translation and ideology has to be foregrounded to draw attention to the fact that the translation process is not a mere text-to-text linguistic transfer, but, on the contrary, takes place in the midst of economic, political, cultural and religious variables, which some scholars subsume under the category ideology.

Keywords: translation, language, ideology, subversion, censorship and manipulation

Procedia PDF Downloads 227
1323 How Is a Machine-Translated Literary Text Organized in Coherence? An Analysis Based upon Theme-Rheme Structure

Authors: Jiang Niu, Yue Jiang

Abstract:

With the ultimate goal to automatically generate translated texts with high quality, machine translation has made tremendous improvements. However, its translations of literary works are still plagued with problems in coherence, esp. the translation between distant language pairs. One of the causes of the problems is probably the lack of linguistic knowledge to be incorporated into the training of machine translation systems. In order to enable readers to better understand the problems of machine translation in coherence, to seek out the potential knowledge to be incorporated, and thus to improve the quality of machine translation products, this study applies Theme-Rheme structure to examine how a machine-translated literary text is organized and developed in terms of coherence. Theme-Rheme structure in Systemic Functional Linguistics is a useful tool for analysis of textual coherence. Theme is the departure point of a clause and Rheme is the rest of the clause. In a text, as Themes and Rhemes may be connected with each other in meaning, they form thematic and rhematic progressions throughout the text. Based on this structure, we can look into how a text is organized and developed in terms of coherence. Methodologically, we chose Chinese and English as the language pair to be studied. Specifically, we built a comparable corpus with two modes of English translations, viz. machine translation (MT) and human translation (HT) of one Chinese literary source text. The translated texts were annotated with Themes, Rhemes and their progressions throughout the texts. The annotated texts were analyzed from two respects, the different types of Themes functioning differently in achieving coherence, and the different types of thematic and rhematic progressions functioning differently in constructing texts. By analyzing and contrasting the two modes of translations, it is found that compared with the HT, 1) the MT features “pseudo-coherence”, with lots of ill-connected fragments of information using “and”; 2) the MT system produces a static and less interconnected text that reads like a list; these two points, in turn, lead to the less coherent organization and development of the MT than that of the HT; 3) novel to traditional and previous studies, Rhemes do contribute to textual connection and coherence though less than Themes do and thus are worthy of notice in further studies. Hence, the findings suggest that Theme-Rheme structure be applied to measuring and assessing the coherence of machine translation, to being incorporated into the training of the machine translation system, and Rheme be taken into account when studying the textual coherence of both MT and HT.

Keywords: coherence, corpus-based, literary translation, machine translation, Theme-Rheme structure

Procedia PDF Downloads 182
1322 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 137
1321 The Use of Punctuation by Primary School Students Writing Texts Collaboratively: A Franco-Brazilian Comparative Study

Authors: Cristina Felipeto, Catherine Bore, Eduardo Calil

Abstract:

This work aims to analyze and compare the punctuation marks (PM) in school texts of Brazilian and French students and the comments on these PM made spontaneously by the students during the ongoing text. Assuming textual genetics as an investigative field within a dialogical and enunciative approach, we defined a common methodological design in two 1st year classrooms (7 years old) of the primary school, one classroom in Brazil (Maceio) and the other one in France (Paris). Through a multimodal capture system of writing processes in real time and space (Ramos System), we recorded the collaborative writing proposal in dyads in each of the classrooms. This system preserves the classroom’s ecological characteristics and provides a video recording synchronized with dialogues, gestures and facial expressions of the students, the stroke of the pen’s ink on the sheet of paper and the movement of the teacher and students in the classroom. The multimodal register of the writing process allowed access to the text in progress and the comments made by the students on what was being written. In each proposed text production, teachers organized their students in dyads and requested that they should talk, combine and write a fictional narrative. We selected a Dyad of Brazilian students (BD) and another Dyad of French students (FD) and we have filmed 6 proposals for each of the dyads. The proposals were collected during the 2nd Term of 2013 (Brazil) and 2014 (France). In 6 texts written by the BD there were identified 39 PMs and 825 written words (on average, a PM every 23 words): Of these 39 PMs, 27 were highlighted orally and commented by either student. In the texts written by the FD there were identified 48 PMs and 258 written words (on average, 1 PM every 5 words): Of these 48 PM, 39 were commented by the French students. Unlike what the studies on punctuation acquisition point out, the PM that occurred the most were hyphens (BD) and commas (FD). Despite the significant difference between the types and quantities of PM in the written texts, the recognition of the need for writing PM in the text in progress and the comments have some common characteristics: i) the writing of the PM was not anticipated in relation to the text in progress, then they were added after the end of a sentence or after the finished text itself; ii) the need to add punctuation marks in the text came after one of the students had ‘remembered’ that a particular sign was needed; iii) most of the PM inscribed were not related to their linguistic functions, but the graphic-visual feature of the text; iv) the comments justify or explain the PM, indicating metalinguistic reflections made by the students. Our results indicate how the comments of the BD and FD express the dialogic and subjective nature of knowledge acquisition. Our study suggests that the initial learning of PM depends more on its graphic features and interactional conditions than on its linguistic functions.

Keywords: collaborative writing, erasure, graphic marks, learning, metalinguistic awareness, textual genesis

Procedia PDF Downloads 142
1320 Business Domain Modelling Using an Integrated Framework

Authors: Mohammed Hasan Salahat, Stave Wade

Abstract:

This paper presents an application of a “Systematic Soft Domain Driven Design Framework” as a soft systems approach to domain-driven design of information systems development. The framework combining techniques from Soft Systems Methodology (SSM), the Unified Modeling Language (UML), and an implementation pattern knows as ‘Naked Objects’. This framework have been used in action research projects that have involved the investigation and modeling of business processes using object-oriented domain models and the implementation of software systems based on those domain models. Within this framework, Soft Systems Methodology (SSM) is used as a guiding methodology to explore the problem situation and to develop the domain model using UML for the given business domain. The framework is proposed and evaluated in our previous works, and a real case study ‘Information Retrieval System for Academic Research’ is used, in this paper, to show further practice and evaluation of the framework in different business domain. We argue that there are advantages from combining and using techniques from different methodologies in this way for business domain modeling. The framework is overviewed and justified as multi-methodology using Mingers Multi-Methodology ideas.

Keywords: SSM, UML, domain-driven design, soft domain-driven design, naked objects, soft language, information retrieval, multimethodology

Procedia PDF Downloads 537
1319 Extraction of Compound Words in Malay Sentences Using Linguistic and Statistical Approaches

Authors: Zamri Abu Bakar Zamri, Normaly Kamal Ismail Normaly, Mohd Izani Mohamed Rawi Izani

Abstract:

Malay noun compound are phrases that consist of two or more nouns. The key characteristic behind noun compounds lies on its frequent occurrences within the text. Therefore, extracting these noun compounds is essential for several domains of research such as Information Retrieval, Sentiment Analysis and Question Answering. Many research efforts have been proposed in terms of extracting Malay noun compounds using linguistic and statistical approaches. Most of the existing methods have concentrated on the extraction of bi-gram noun+noun compound. However, extracting noun+verb, noun+adjective and noun+prepositional is challenging due to the difficulty of selecting an appropriate method with effective results. Thus, there is still room for improvement in terms of enhancing the effectiveness of compound word extraction. Therefore, this study proposed a combination of linguistic approach and statistical measures in order to enhance the extraction of compound words. Several preprocessing steps are involved including normalization, tokenization, and stemming. The linguistic approach that has been used in this study is Part-of-Speech (POS) tagging. In addition, a new linguistic pattern for named entities has been utilized using a list of Malays named entities in order to enhance the linguistic approach in terms of noun compound recognition. The proposed statistical measures consists of NC-value, NTC-value and NLC value.

Keywords: Compound Word, Noun Compound, Linguistic Approach, Statistical Approach

Procedia PDF Downloads 325
1318 Architectural Experience of the Everyday in Bangkok CBD

Authors: Thirayu Jumsai Na Ayudhya

Abstract:

The attempt to understand about what architecture means to people as they go about their everyday life revealed that knowledge such as environmental psychology, environmental perception, environmental aesthetics, inadequately address the contextualized and holistic theoretical framework. In my previous research, it was found that people’s making senses of their everyday architecture can be addressed in terms of four super‐ordinate themes; (1) building in urban (text), (2) building in (text), (3) building in human (text), (4) and building in time (text). In this research, Bangkok CBD was selected as the focal urban context that the integrated style of architecture is noticeable. It is expected that in a unique urban context like Bangkok CBD unprecedented super-ordinate themes will be unveiled through the reflection of people’s everyday experiences. In this research, people’s architectural experience conducted in Bangkok CBD, Thailand, will be presented succinctly. The research addresses the question of how do people make sense of their everyday architecture/buildings especially in a unique urban context, Bangkok CBD, and identifies ways in which people make sense of their everyday architecture. Two key methodologies are adopted. First, Participant-Produced-Photograph (PPP) allows people to express their experiences of the everyday urban context freely without any interference or forced-data generating by researchers. Second, Interpretative Phenomenological Analysis (IPA) are also applied as main methodologies. With IPA methodology, a small pool of participants is considered giving the detailed level of analysis and its potential to produce a meaningful outcome.

Keywords: architectural experience, building appreciation, design psychology, environmental psychology, sense-making, the everyday experience, transactional theory

Procedia PDF Downloads 309
1317 Intertextuality as a Dialogue Between Postmodern Writer J. Fowles and Mid-English Writer J. Donne

Authors: Isahakyan Heghine

Abstract:

Intertextuality, being in the centre of attention of both linguists and literary critics, is vividly expressed in the outstanding British novelist and philosopher J. Fowles' works. 'The Magus’ is a deep psychological and philosophical novel with vivid intertextual links with the Greek mythology and authors from different epochs. The aim of the paper is to show how intertextuality might serve as a dialogue between two authors (J. Fowles and J. Donne) disguised in the dialogue of two protagonists of the novel : Conchis and Nicholas. Contrastive viewpoints concerning man's isolation, loneliness are stated in the dialogue. Due to the conceptual analysis of the text it becomes possible both to decode the conceptual information of the text and find out its intertextual links.

Keywords: dialogue, conceptual analysis, isolation, intertextuality

Procedia PDF Downloads 308
1316 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that is able to reliably extract all required fields for up to 70% of all documents in the data set, more than any other previously reported method. The approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a given new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates with as little as one training sample and only requires the ground truth field values instead of detailed annotations such as bounding boxes that are hard to obtain. The system is deployed and used inside a commercial accounting software.

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 105
1315 A Multivariate Exploratory Data Analysis of a Crisis Text Messaging Service in Order to Analyse the Impact of the COVID-19 Pandemic on Mental Health in Ireland

Authors: Hamda Ajmal, Karen Young, Ruth Melia, John Bogue, Mary O'Sullivan, Jim Duggan, Hannah Wood

Abstract:

The Covid-19 pandemic led to a range of public health mitigation strategies in order to suppress the SARS-CoV-2 virus. The drastic changes in everyday life due to lockdowns had the potential for a significant negative impact on public mental health, and a key public health goal is to now assess the evidence from available Irish datasets to provide useful insights on this issue. Text-50808 is an online text-based mental health support service, established in Ireland in 2020, and can provide a measure of revealed distress and mental health concerns across the population. The aim of this study is to explore statistical associations between public mental health in Ireland and the Covid-19 pandemic. Uniquely, this study combines two measures of emotional wellbeing in Ireland: (1) weekly text volume at Text-50808, and (2) emotional wellbeing indicators reported by respondents of the Amárach public opinion survey, carried out on behalf of the Department of Health, Ireland. For this analysis, a multivariate graphical exploratory data analysis (EDA) was performed on the Text-50808 dataset dated from 15th June 2020 to 30th June 2021. This was followed by time-series analysis of key mental health indicators including: (1) the percentage of daily/weekly texts at Text-50808 that mention Covid-19 related issues; (2) the weekly percentage of people experiencing anxiety, boredom, enjoyment, happiness, worry, fear and stress in Amárach survey; and Covid-19 related factors: (3) daily new Covid-19 case numbers; (4) daily stringency index capturing the effect of government non-pharmaceutical interventions (NPIs) in Ireland. The cross-correlation function was applied to measure the relationship between the different time series. EDA of the Text-50808 dataset reveals significant peaks in the volume of texts on days prior to level 3 lockdown and level 5 lockdown in October 2020, and full level 5 lockdown in December 2020. A significantly high positive correlation was observed between the percentage of texts at Text-50808 that reported Covid-19 related issues and the percentage of respondents experiencing anxiety, worry and boredom (at a lag of 1 week) in Amárach survey data. There is a significant negative correlation between percentage of texts with Covid-19 related issues and percentage of respondents experiencing happiness in Amárach survey. Daily percentage of texts at Text-50808 that reported Covid-19 related issues to have a weak positive correlation with daily new Covid-19 cases in Ireland at a lag of 10 days and with daily stringency index of NPIs in Ireland at a lag of 2 days. The sudden peaks in text volume at Text-50808 immediately prior to new restrictions in Ireland indicate an association between a rise in mental health concerns following the announcement of new restrictions. There is also a high correlation between emotional wellbeing variables in the Amárach dataset and the number of weekly texts at Text-50808, and this confirms that Text-50808 reflects overall public sentiment. This analysis confirms the benefits of the texting service as a community surveillance tool for mental health in the population. This initial EDA will be extended to use multivariate modeling to predict the effect of additional Covid-19 related factors on public mental health in Ireland.

Keywords: COVID-19 pandemic, data analysis, digital health, mental health, public health, digital health

Procedia PDF Downloads 119
1314 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev

Abstract:

Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some examples of failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. R studios were used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, then using SPSS to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the 3 categories). Text analysis was then performed on each document using 2 separate datasets (BING and AFINN) in R within the category of drugs to determine the frequency of positive or negative phrases in each document. A relative positivity and negativity value were then calculated by dividing the frequency of phrases with the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values from using BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. Model constructed from BING predicts the outcome of drug performance in clinical trials with an overall percentage of 65.3%. AFINN model had a lower accuracy at predicting outcomes compared to the BING model at 62.5% but was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made to later iterations of the model to sufficiently increase the accuracy.

Keywords: data analysis, drug development, sentiment analysis, text-mining

Procedia PDF Downloads 130
1313 Social Media Mining with R. Twitter Analyses

Authors: Diana Codat

Abstract:

Tweets' analysis is part of text mining. Each document is a written text. It's possible to apply the usual text search techniques, in particular by switching to the bag-of-words representation. But the tweets induce peculiarities. Some may enrich the analysis. Thus, their length is calibrated (at least as far as public messages are concerned), special characters make it possible to identify authors (@) and themes (#), the tweet and retweet mechanisms make it possible to follow the diffusion of the information. Conversely, other characteristics may disrupt the analyzes. Because space is limited, authors often use abbreviations, emoticons to express feelings, and they do not pay much attention to spelling. All this creates noise that can complicate the task. The tweets carry a lot of potentially interesting information. Their exploitation is one of the main axes of the analysis of the social networks. We show how to access Twitter-related messages. We will initiate a study of the properties of the tweets, and we will follow up on the exploitation of the content of the messages. We will work under R with the package 'twitteR'. The study of tweets is a strong focus of analysis of social networks because Twitter has become an important vector of communication. This example shows that it is easy to initiate an analysis from data extracted directly online. The data preparation phase is of great importance.

Keywords: data mining, language R, social networks, Twitter

Procedia PDF Downloads 158
1312 A New 3D Shape Descriptor Based on Multi-Resolution and Multi-Block CS-LBP

Authors: Nihad Karim Chowdhury, Mohammad Sanaullah Chowdhury, Muhammed Jamshed Alam Patwary, Rubel Biswas

Abstract:

In content-based 3D shape retrieval system, achieving high search performance has become an important research problem. A challenging aspect of this problem is to find an effective shape descriptor which can discriminate similar shapes adequately. To address this problem, we propose a new shape descriptor for 3D shape models by combining multi-resolution with multi-block center-symmetric local binary pattern operator. Given an arbitrary 3D shape, we first apply pose normalization, and generate a set of multi-viewed 2D rendered images. Second, we apply Gaussian multi-resolution filter to generate several levels of images from each of 2D rendered image. Then, overlapped sub-images are computed for each image level of a multi-resolution image. Our unique multi-block CS-LBP comes next. It allows the center to be composed of m-by-n rectangular pixels, instead of a single pixel. This process is repeated for all the 2D rendered images, derived from both ‘depth-buffer’ and ‘silhouette’ rendering. Finally, we concatenate all the features vectors into one dimensional histogram as our proposed 3D shape descriptor. Through several experiments, we demonstrate that our proposed 3D shape descriptor outperform the previous methods by using a benchmark dataset.

Keywords: 3D shape retrieval, 3D shape descriptor, CS-LBP, overlapped sub-images

Procedia PDF Downloads 425
1311 Study on the Focus of Attention of Special Education Students in Primary School

Authors: Tung-Kuang Wu, Hsing-Pei Hsieh, Ying-Ru Meng

Abstract:

Special Education in Taiwan has been facing difficulties including shortage of teachers and lack in resources. Some students need to receive special education are thus not identified or admitted. Fortunately, information technologies can be applied to relieve some of the difficulties. For example, on-line multimedia courseware can be used to assist the learning of special education students and take pretty much workload from special education teachers. However, there may exist cognitive variations between students in special or regular educations, which suggests the design of online courseware requires different considerations. This study aims to investigate the difference in focus of attention (FOA) between special and regular education students of primary school in viewing the computer screen. The study is essential as it helps courseware developers in determining where to put learning elements that matter the most on the right position of screen. It may also assist special education specialists to better understand the subtle differences among various subtypes of learning disabilities. This study involves 76 special education students (among them, 39 are students with mental retardation, MR, and 37 are students with learning disabilities, LDs) and 42 regular education students. The participants were asked to view a computer screen showing a picture partitioned into 3 × 3 areas with each area filled with text or icon. The subjects were then instructed to mark on the prior given paper sheets, which are also partitioned into 3 × 3 grids, the areas corresponding to the pictures on the computer screen that they first set their eyes on. The data are then collected and analyzed. Major findings are listed: 1. In both text and icon scenario, significant differences exist in the first preferred FOA between special and regular education students. The first FOA for the former is mainly on area 1 (upper left area, 53.8% / 51.3% for MR / LDs students in text scenario; and 53.8% / 56.8% for MR / LDs students in icons scenario), while the latter on area 5 (middle area, 50.0% and 57.1% in text and icons scenarios). 2. The second most preferred area in text scenario for students with MR and LDs are area 2 (upper-middle, 20.5%) and 5 (middle area, 24.3%). In icons scenario, the results are similar, but lesser in percentage. 3. Students with LDs that show similar preference (either in text or icons scenarios) in FOA to regular education students tend to be of some specific sub-type of learning disabilities. For instance, students with LDs that chose area 5 (middle area, either in text or icon scenario) as their FOA are mostly ones that have reading or writing disability. Also, three (out of 13) subjects in this category, after going through the rediagnosis process, were excluded from being learning disabilities. In summary, the findings suggest when designing multimedia courseware for students with MR and LDs, the essential learning elements should be placed on area 1, 2 and 5. In addition, FOV preference may also potentially be used as an indicator for diagnosing students with LDs.

Keywords: focus of attention, learning disabilities, mental retardation, on-line multimedia courseware, special education

Procedia PDF Downloads 151