Search results for: text structuring
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1433

Search results for: text structuring

1163 Adaptation of Hough Transform Algorithm for Text Document Skew Angle Detection

Authors: Kayode A. Olaniyi, Olabanji F. Omotoye, Adeola A. Ogunleye

Abstract:

The skew detection and correction form an important part of digital document analysis. This is because uncompensated skew can deteriorate document features and can complicate further document image processing steps. Efficient text document analysis and digitization can rarely be achieved when a document is skewed even at a small angle. Once the documents have been digitized through the scanning system and binarization also achieved, document skew correction is required before further image analysis. Research efforts have been put in this area with algorithms developed to eliminate document skew. Skew angle correction algorithms can be compared based on performance criteria. Most important performance criteria are accuracy of skew angle detection, range of skew angle for detection, speed of processing the image, computational complexity and consequently memory space used. The standard Hough Transform has successfully been implemented for text documentation skew angle estimation application. However, the standard Hough Transform algorithm level of accuracy depends largely on how much fine the step size for the angle used. This consequently consumes more time and memory space for increase accuracy and, especially where number of pixels is considerable large. Whenever the Hough transform is used, there is always a tradeoff between accuracy and speed. So a more efficient solution is needed that optimizes space as well as time. In this paper, an improved Hough transform (HT) technique that optimizes space as well as time to robustly detect document skew is presented. The modified algorithm of Hough Transform presents solution to the contradiction between the memory space, running time and accuracy. Our algorithm starts with the first step of angle estimation accurate up to zero decimal place using the standard Hough Transform algorithm achieving minimal running time and space but lacks relative accuracy. Then to increase accuracy, suppose estimated angle found using the basic Hough algorithm is x degree, we then run again basic algorithm from range between ±x degrees with accuracy of one decimal place. Same process is iterated till level of desired accuracy is achieved. The procedure of our skew estimation and correction algorithm of text images is implemented using MATLAB. The memory space estimation and process time are also tabulated with skew angle assumption of within 00 and 450. The simulation results which is demonstrated in Matlab show the high performance of our algorithms with less computational time and memory space used in detecting document skew for a variety of documents with different levels of complexity.

Keywords: hough-transform, skew-detection, skew-angle, skew-correction, text-document

Procedia PDF Downloads 128
1162 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

In the aim of natural language processing applications improvement, such as information retrieval, machine translation, lexical disambiguation, we focus on statistical approach to semantic indexing for multilingual text documents based on conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weighting. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource Euro WordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through conceptual network formalism. This network is generated using the EWN resource and pass by a classification step based on association rules model (in attempt to discover the non-taxonomic relations or contextual relations between the concepts of a document). These relations are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We prove that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 119
1161 Direct Blind Separation Methods for Convolutive Images Mixtures

Authors: Ahmed Hammed, Wady Naanaa

Abstract:

In this paper, we propose a general approach to deal with the problem of a convolutive mixture of images. We use a direct blind source separation method by adding only one non-statistical justified constraint describing the relationships between different mixing matrix at the aim to make its resolution easy. This method can be applied, provided that this constraint is known, to degraded document affected by the overlapping of text-patterns and images. This is due to chemical and physical reactions of the materials (paper, inks,...) occurring during the documents aging, and other unpredictable causes such as humidity, microorganism infestation, human handling, etc. We will demonstrate that this problem corresponds to a convolutive mixture of images. Subsequently, we will show how the validation of our method through numerical examples. We can so obtain clear images from unreadable ones which can be caused by pages superposition, a phenomenon similar to that we find every often in archival documents.

Keywords: blind source separation, convoluted mixture, degraded documents, text-patterns overlapping

Procedia PDF Downloads 299
1160 Scattered Places in Stories Singularity and Pattern in Geographic Information

Authors: I. Pina, M. Painho

Abstract:

Increased knowledge about the nature of place and the conditions under which space becomes place is a key factor for better urban planning and place-making. Although there is a broad consensus on the relevance of this knowledge, difficulties remain in relating the theoretical framework about place and urban management. Issues related to representation of places are among the greatest obstacles to overcome this gap. With this critical discussion, based on literature review, we intended to explore, in a common framework for geographical analysis, the potential of stories to spell out place meanings, bringing together qualitative text analysis and text mining in order to capture and represent the singularity contained in each person's life history, and the patterns of social processes that shape places. The development of this reasoning is based on the extensive geographical thought about place, and in the theoretical advances in the field of Geographic Information Science (GISc).

Keywords: discourse analysis, geographic information science place, place-making, stories

Procedia PDF Downloads 166
1159 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

The knowledge of the relationship between characters can help readers to understand the overall story or plot of the literary fiction. In this paper, we present a method for extracting the specific relationship between characters from a Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters without considering Korean linguistic features. Furthermore, it is difficult to extract the relationship with direction from text, such as one-sided love, because they consider only the weight of relationship, without considering the direction of the relationship. Therefore, in order to identify specific relationships between characters, we propose a statistical method considering linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationship between the characters. Furthermore, we expect that proposed method could be applied to the relationship analysis between characters of other content like movie or TV drama.

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 345
1158 The Oral Production of University EFL Students: An Analysis of Tasks, Format, and Quality in Foreign Language Development

Authors: Vera Lucia Teixeira da Silva, Sandra Regina Buttros Gattolin de Paula

Abstract:

The present study focuses on academic literacy and addresses the impact of semantic-discursive resources on the constitution of genres that are produced in such context. The research considers the development of writing in the academic context in Portuguese. Researches that address academic literacy and the characteristics of the texts produced in this context are rare, mainly with focus on the development of writing, considering three variables: the constitution of the writer, the perception of the reader/interlocutor and the organization of the informational text flow. The research aims to map the semantic-discursive resources of the written register in texts of several genres and produced by students in the first semester of the undergraduate course in Letters. The hypothesis raised is that writing in the academic environment is not a recurrent literacy practice for these learners and can be explained by the ontogenetic and phylogenetic nature of language development. Qualitative in nature, the present research has as empirical data texts produced in a half-yearly course of Reading and Textual Production; these data result from the proposition of four different writing proposals, in a total of 600 texts. The corpus is analyzed based on semantic-discursive resources, seeking to contemplate relevant aspects of language (grammar, discourse and social context) that reveal the choices made in the reader/writer interrelationship and the organizational flow of the Text. Among the semantic-discursive resources, the analysis includes three resources, including (a) appraisal and negotiation to understand the attitudes negotiated (roles of the participants of the discourse and their relationship with the other); (b) ideation to explain the construction of the experience (activities performed and participants); and (c) periodicity to outline the flow of information in the organization of the text according to the genre it instantiates. The results indicate the organizational difficulties of the flow of the text information. Cartography contributes to the understanding of the way writers use language in an effort to present themselves, evaluate someone else’s work, and communicate with readers.

Keywords: academic writing, Portuguese mother tongue, semantic-discursive resources, academic context

Procedia PDF Downloads 92
1157 Topic-to-Essay Generation with Event Element Constraints

Authors: Yufen Qin

Abstract:

Topic-to-Essay generation is a challenging task in Natural language processing, which aims to generate novel, diverse, and topic-related text based on user input. Previous research has overlooked the generation of articles under the constraints of event elements, resulting in issues such as incomplete event elements and logical inconsistencies in the generated results. To fill this gap, this paper proposes an event-constrained approach for a topic-to-essay generation that enforces the completeness of event elements during the generation process. Additionally, a language model is employed to verify the logical consistency of the generated results. Experimental results demonstrate that the proposed model achieves a better BLEU-2 score and performs better than the baseline in terms of subjective evaluation on a real dataset, indicating its capability to generate higher-quality topic-related text.

Keywords: event element, language model, natural language processing, topic-to-essay generation.

Procedia PDF Downloads 197
1156 Examining the Dubbing Strategies Used in the Egyptian Dubbed Version of Mulan (1998)

Authors: Shaza Melies, Saadeya Salem, Seham Kareh

Abstract:

Cartoon films are multisemiotic as various modes integrate in the production of meaning. This study aims to examine the cultural and linguistic specific references in the Egyptian dubbed cartoon film Mulan. The study examines the translation strategies implemented in the Egyptian dubbed version of Mulan to meet the cultural preferences of the audience. The study reached the following findings: Using the traditional translation strategies does not deliver the intended meaning of the source text and causes loss in the intended humor. As a result, the findings showed that in the dubbed version, translators tend to omit, change, or add information to the target text to be accepted by the audience. The contrastive analysis of the Mulan (English and dubbed versions) proves the connotations that the dubbing has taken to be accepted by the target audience. Cartoon films are multisemiotic as various modes integrate in the production of meaning. This study aims to examine the cultural and linguistic specific references in the Egyptian dubbed cartoon film Mulan. The study examines the translation strategies implemented in the Egyptian dubbed version of Mulan to meet the cultural preferences of the audience. The study reached the following findings: Using the traditional translation strategies does not deliver the intended meaning of the source text and causes loss in the intended humor. As a result, the findings showed that in the dubbed version, translators tend to omit, change, or add information to the target text to be accepted by the audience. The contrastive analysis of the Mulan (English and dubbed versions) proves the connotations that the dubbing has taken to be accepted by the target audience.

Keywords: domestication, dubbing, Mulan, translation theories

Procedia PDF Downloads 98
1155 Translation Quality Assessment: Proposing a Linguistic-Based Model for Translation Criticism with Considering Ideology and Power Relations

Authors: Mehrnoosh Pirhayati

Abstract:

In this study, the researcher tried to propose a model of Translation Criticism (TC) regarding the phenomenon of Translation Quality Assessment (TQA). With changing the general view on re/writing as an illegal act, the researcher defined a scale for the act of translation and determined the redline of translation with other products. This research attempts to show TC as a related phenomenon to TQA. This study shows that TQA with using the rules and factors of TC as depicted in both product-oriented analysis and process-oriented analysis, determines the orientation or the level of the quality of translation. This study also depicts that TC, regarding TQA’s perspective, reveals the aim of the translation of original text and the root of ideological manipulation and re/writing. On the other hand, this study stresses the existence of a direct relationship between the linguistic materials and semiotic codes of a text or book. This study can be fruitful for translators, scholars, translation criticizers, and translation quality assessors, and also it is applicable in the area of pedagogy.

Keywords: a model of translation criticism, a model of translation quality assessment, critical discourse analysis (CDA), re/writing, translation criticism (TC), translation quality assessment (TQA)

Procedia PDF Downloads 271
1154 Intentionality and Context in the Paradox of Reward and Punishment in the Meccan Surahs

Authors: Asmaa Fathy Mohamed Desoky

Abstract:

The subject of this research is the inference of intentionality and context from the verses of the Meccan surahs, which include the paradox of reward and punishment, applied to the duality of disbelief and faith; The Holy Quran is the most important sacred linguistic reference in the Arabic language because it is rich in all the rules of the language in addition to the linguistic miracle. the Quranic text is a first-class intentional text, sent down to convey something to the recipient (Muhammad first and then communicates it to Muslims) and influence and convince him, which opens the door to many Ijtihad; a desire to reach the will of Allah and his intention from his words Almighty. Intentionality as a term is one of the most important deliberative terms, but it will be modified to suit the Quranic discourse, especially since intentionality is related to intention-as it turned out earlier - that is, it turns the reader or recipient into a predictor of the unseen, and this does not correspond to the Quranic discourse. Hence, in this research, a set of dualities will be identified that will be studied in order to clarify the meaning of them according to the opinions of previous interpreters in accordance with the sanctity of the Quranic discourse, which is intentionally related to the dualities of reward and punishment, such as: the duality of disbelief and faith, noting that it is a duality that combines opposites and Paradox on one level, because it may be an external paradox between action and reaction, and may be an internal paradox in matters related to faith, and may be a situational paradox in a specific event or a certain fact. It should be noted that the intention of the Qur'anic text is fully realized in form and content, in whole and in part, and this research includes a presentation of some applied models of the issues of intention and context that appear in the verses of the paradox of reward and punishment in the Meccan surahs in Quraan.

Keywords: intentionality, context, the paradox, reward, punishment, Meccan surahs

Procedia PDF Downloads 37
1153 Switching to the Latin Alphabet in Kazakhstan: A Brief Overview of Character Recognition Methods

Authors: Ainagul Yermekova, Liudmila Goncharenko, Ali Baghirzade, Sergey Sybachin

Abstract:

In this article, we address the problem of Kazakhstan's transition to the Latin alphabet. The transition process started in 2017 and is scheduled to be completed in 2025. In connection with these events, the problem of recognizing the characters of the new alphabet is raised. Well-known character recognition programs such as ABBYY FineReader, FormReader, MyScript Stylus did not recognize specific Kazakh letters that were used in Cyrillic. The author tries to give an assessment of the well-known method of character recognition that could be in demand as part of the country's transition to the Latin alphabet. Three methods of character recognition: template, structured, and feature-based, are considered through the algorithms of operation. At the end of the article, a general conclusion is made about the possibility of applying a certain method to a particular recognition process: for example, in the process of population census, recognition of typographic text in Latin, or recognition of photos of car numbers, store signs, etc.

Keywords: text detection, template method, recognition algorithm, structured method, feature method

Procedia PDF Downloads 156
1152 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 38
1151 A Method for Clinical Concept Extraction from Medical Text

Authors: Moshe Wasserblat, Jonathan Mamou, Oren Pereg

Abstract:

Natural Language Processing (NLP) has made a major leap in the last few years, in practical integration into medical solutions; for example, extracting clinical concepts from medical texts such as medical condition, medication, treatment, and symptoms. However, training and deploying those models in real environments still demands a large amount of annotated data and NLP/Machine Learning (ML) expertise, which makes this process costly and time-consuming. We present a practical and efficient method for clinical concept extraction that does not require costly labeled data nor ML expertise. The method includes three steps: Step 1- the user injects a large in-domain text corpus (e.g., PubMed). Then, the system builds a contextual model containing vector representations of concepts in the corpus, in an unsupervised manner (e.g., Phrase2Vec). Step 2- the user provides a seed set of terms representing a specific medical concept (e.g., for the concept of the symptoms, the user may provide: ‘dry mouth,’ ‘itchy skin,’ and ‘blurred vision’). Then, the system matches the seed set against the contextual model and extracts the most semantically similar terms (e.g., additional symptoms). The result is a complete set of terms related to the medical concept. Step 3 –in production, there is a need to extract medical concepts from the unseen medical text. The system extracts key-phrases from the new text, then matches them against the complete set of terms from step 2, and the most semantically similar will be annotated with the same medical concept category. As an example, the seed symptom concepts would result in the following annotation: “The patient complaints on fatigue [symptom], dry skin [symptom], and Weight loss [symptom], which can be an early sign for Diabetes.” Our evaluations show promising results for extracting concepts from medical corpora. The method allows medical analysts to easily and efficiently build taxonomies (in step 2) representing their domain-specific concepts, and automatically annotate a large number of texts (in step 3) for classification/summarization of medical reports.

Keywords: clinical concepts, concept expansion, medical records annotation, medical records summarization

Procedia PDF Downloads 107
1150 Use and Relationship of Shell Nouns as Cohesive Devices in the Quality of Second Language Writing

Authors: Kristine D. de Leon, Junifer A. Abatayo, Jose Cristina M. Pariña

Abstract:

The current study is a comparative analysis of the use of shell nouns as a cohesive device (CD) in an English for Second Language (ESL) setting in order to identify their use and relationship in the quality of second language (L2) writing. As these nouns were established to anticipate the meaning within, across or outside the text, their use has fascinated writing researchers. The corpus of the study included published articles from reputable journals and graduate students’ papers in order to analyze the frequency of shell nouns using “highly prevalent” nouns in the academic community, to identify the different lexicogrammatical patterns where these nouns occur and to the functions connected with these patterns. The result of the study implies that published authors used more shell nouns in their paper than graduate students. However, the functions of the different lexicogrammatical patterns for the frequently occurring shell nouns are somewhat similar. These results could help students in enhancing the cohesion of their text and in comprehending it.

Keywords: anaphoric, cataphoric, lexico-grammatical, shell nouns

Procedia PDF Downloads 161
1149 A U-Net Based Architecture for Fast and Accurate Diagram Extraction

Authors: Revoti Prasad Bora, Saurabh Yadav, Nikita Katyal

Abstract:

In the context of educational data mining, the use case of extracting information from images containing both text and diagrams is of high importance. Hence, document analysis requires the extraction of diagrams from such images and processes the text and diagrams separately. To the author’s best knowledge, none among plenty of approaches for extracting tables, figures, etc., suffice the need for real-time processing with high accuracy as needed in multiple applications. In the education domain, diagrams can be of varied characteristics viz. line-based i.e. geometric diagrams, chemical bonds, mathematical formulas, etc. There are two broad categories of approaches that try to solve similar problems viz. traditional computer vision based approaches and deep learning approaches. The traditional computer vision based approaches mainly leverage connected components and distance transform based processing and hence perform well in very limited scenarios. The existing deep learning approaches either leverage YOLO or faster-RCNN architectures. These approaches suffer from a performance-accuracy tradeoff. This paper proposes a U-Net based architecture that formulates the diagram extraction as a segmentation problem. The proposed method provides similar accuracy with a much faster extraction time as compared to the mentioned state-of-the-art approaches. Further, the segmentation mask in this approach allows the extraction of diagrams of irregular shapes.

Keywords: computer vision, deep-learning, educational data mining, faster-RCNN, figure extraction, image segmentation, real-time document analysis, text extraction, U-Net, YOLO

Procedia PDF Downloads 102
1148 Structuring Highly Iterative Product Development Projects by Using Agile-Indicators

Authors: Guenther Schuh, Michael Riesener, Frederic Diels

Abstract:

Nowadays, manufacturing companies are faced with the challenge of meeting heterogeneous customer requirements in short product life cycles with a variety of product functions. So far, some of the functional requirements remain unknown until late stages of the product development. A way to handle these uncertainties is the highly iterative product development (HIP) approach. By structuring the development project as a highly iterative process, this method provides customer oriented and marketable products. There are first approaches for combined, hybrid models comprising deterministic-normative methods like the Stage-Gate process and empirical-adaptive development methods like SCRUM on a project management level. However, almost unconsidered is the question, which development scopes can preferably be realized with either empirical-adaptive or deterministic-normative approaches. In this context, a development scope constitutes a self-contained section of the overall development objective. Therefore, this paper focuses on a methodology that deals with the uncertainty of requirements within the early development stages and the corresponding selection of the most appropriate development approach. For this purpose, internal influencing factors like a company’s technology ability, the prototype manufacturability and the potential solution space as well as external factors like the market accuracy, relevance and volatility will be analyzed and combined into an Agile-Indicator. The Agile-Indicator is derived in three steps. First of all, it is necessary to rate each internal and external factor in terms of the importance for the overall development task. Secondly, each requirement has to be evaluated for every single internal and external factor appropriate to their suitability for empirical-adaptive development. Finally, the total sums of internal and external side are composed in the Agile-Indicator. Thus, the Agile-Indicator constitutes a company-specific and application-related criterion, on which the allocation of empirical-adaptive and deterministic-normative development scopes can be made. In a last step, this indicator will be used for a specific clustering of development scopes by application of the fuzzy c-means (FCM) clustering algorithm. The FCM-method determines sub-clusters within functional clusters based on the empirical-adaptive environmental impact of the Agile-Indicator. By means of the methodology presented in this paper, it is possible to classify requirements, which are uncertainly carried out by the market, into empirical-adaptive or deterministic-normative development scopes.

Keywords: agile, highly iterative development, agile-indicator, product development

Procedia PDF Downloads 216
1147 An End-to-end Piping and Instrumentation Diagram Information Recognition System

Authors: Taekyong Lee, Joon-Young Kim, Jae-Min Cha

Abstract:

Piping and instrumentation diagram (P&ID) is an essential design drawing describing the interconnection of process equipment and the instrumentation installed to control the process. P&IDs are modified and managed throughout a whole life cycle of a process plant. For the ease of data transfer, P&IDs are generally handed over from a design company to an engineering company as portable document format (PDF) which is hard to be modified. Therefore, engineering companies have to deploy a great deal of time and human resources only for manually converting P&ID images into a computer aided design (CAD) file format. To reduce the inefficiency of the P&ID conversion, various symbols and texts in P&ID images should be automatically recognized. However, recognizing information in P&ID images is not an easy task. A P&ID image usually contains hundreds of symbol and text objects. Most objects are pretty small compared to the size of a whole image and are densely packed together. Traditional recognition methods based on geometrical features are not capable enough to recognize every elements of a P&ID image. To overcome these difficulties, state-of-the-art deep learning models, RetinaNet and connectionist text proposal network (CTPN) were used to build a system for recognizing symbols and texts in a P&ID image. Using the RetinaNet and the CTPN model carefully modified and tuned for P&ID image dataset, the developed system recognizes texts, equipment symbols, piping symbols and instrumentation symbols from an input P&ID image and save the recognition results as the pre-defined extensible markup language format. In the test using a commercial P&ID image, the P&ID information recognition system correctly recognized 97% of the symbols and 81.4% of the texts.

Keywords: object recognition system, P&ID, symbol recognition, text recognition

Procedia PDF Downloads 122
1146 Using the Smith-Waterman Algorithm to Extract Features in the Classification of Obesity Status

Authors: Rosa Figueroa, Christopher Flores

Abstract:

Text categorization is the problem of assigning a new document to a set of predetermined categories, on the basis of a training set of free-text data that contains documents whose category membership is known. To train a classification model, it is necessary to extract characteristics in the form of tokens that facilitate the learning and classification process. In text categorization, the feature extraction process involves the use of word sequences also known as N-grams. In general, it is expected that documents belonging to the same category share similar features. The Smith-Waterman (SW) algorithm is a dynamic programming algorithm that performs a local sequence alignment in order to determine similar regions between two strings or protein sequences. This work explores the use of SW algorithm as an alternative to feature extraction in text categorization. The dataset used for this purpose, contains 2,610 annotated documents with the classes Obese/Non-Obese. This dataset was represented in a matrix form using the Bag of Word approach. The score selected to represent the occurrence of the tokens in each document was the term frequency-inverse document frequency (TF-IDF). In order to extract features for classification, four experiments were conducted: the first experiment used SW to extract features, the second one used unigrams (single word), the third one used bigrams (two word sequence) and the last experiment used a combination of unigrams and bigrams to extract features for classification. To test the effectiveness of the extracted feature set for the four experiments, a Support Vector Machine (SVM) classifier was tuned using 20% of the dataset. The remaining 80% of the dataset together with 5-Fold Cross Validation were used to evaluate and compare the performance of the four experiments of feature extraction. Results from the tuning process suggest that SW performs better than the N-gram based feature extraction. These results were confirmed by using the remaining 80% of the dataset, where SW performed the best (accuracy = 97.10%, weighted average F-measure = 97.07%). The second best was obtained by the combination of unigrams-bigrams (accuracy = 96.04, weighted average F-measure = 95.97) closely followed by the bigrams (accuracy = 94.56%, weighted average F-measure = 94.46%) and finally unigrams (accuracy = 92.96%, weighted average F-measure = 92.90%).

Keywords: comorbidities, machine learning, obesity, Smith-Waterman algorithm

Procedia PDF Downloads 270
1145 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

Keywords: emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining

Procedia PDF Downloads 193
1144 Psychological Nano-Therapy: A New Method in Family Therapy

Authors: Siamak Samani, Nadereh Sohrabi

Abstract:

Psychological nano-therapy is a new method based on systems theory. According to the theory, systems with severe dysfunctions are resistant to changes. Psychological nano-therapy helps the therapists to break this ice. Two key concepts in psychological nano-therapy are nano-functions and nano-behaviors. The most important step in psychological nano-therapy in family therapy is selecting the most effective nano-function and nano-behavior. The aim of this study was to check the effectiveness of psychological nano-therapy for family therapy. One group pre-test-post-test design (quasi-experimental Design) was applied for research. The sample consisted of ten families with severe marital conflict. The important character of these families was resistance for participating in family therapy. In this study, sending respectful (nano-function) text massages (nano-behavior) with cell phone were applied as a treatment. Cohesion/respect sub scale from self-report family processes scale and family readiness for therapy scale were used to assess all family members in pre-test and post-test. In this study, one of family members was asked to send a respectful text massage to other family members every day for a week. The content of the text massages were selected and checked by therapist. To compare the scores of families in pre-test and post-test paired sample t-test was used. The results of the test showed significant differences in both cohesion/respect score and family readiness for therapy between per-test and post-test. The results revealed that these families have found a better atmosphere for participation in a complete family therapy program. Indeed, this study showed that psychological nano-therapy is an effective method to make family readiness for therapy.

Keywords: family therapy, family conflicts, nano-therapy, family readiness

Procedia PDF Downloads 628
1143 Jalal-Ale-Ahmad and ‘Critical Consciousness’: A Comparative Study

Authors: Zohreh Ramin

Abstract:

One of the most important contributions that Edward Said has had in the realm of critical theory is his insistence on the worldliness of the text and the critic. By this, Said meant that the critic and the text must be considered in their ‘material’ contexts. Foregrounding the substantial role of a critic as embodying what he refers to as ‘critical consciousness’, a true critic, Said maintains, is one who can stand between the ‘dominant culture’ and ‘the totalizing forms of critical systems.’ Considered as one of Iran’s major contemporary intellectuals, Jalal Ale Ahmad is responsible for introducing the idea of ‘Westoxication’ in Iran, constructing a social paradigm of the necessity to return to tradition in contemporary Iran. The present paper intends to study Al-Ahmad’s definition of the orient versus the occident, his criticism of the ‘machination’ of contemporary Iranian society, and his solution to the problem of ‘Westoxication’. The objective of this study is to see whether Ale Ahmad can be considered as embodying the spirit of ‘critical consciousness’ as described by Said as the necessary tool in the hands of an intellectual who is simultaneously attached filitavely to his culture but can detach himself affilitavely through employing critical consciousness.

Keywords: Westoxication, filiative, affiliative, machination

Procedia PDF Downloads 153
1142 Subtitled Based-Approach for Learning Foreign Arabic Language

Authors: Elleuch Imen

Abstract:

In this paper, it propose a new approach for learning Arabic as a foreign language via audio-visual translation, particularly subtitling. The approach consists of developing video sequences appropriate to different levels of learning (from A1 to C2) containing conversations, quizzes, games and others. Each video aims to achieve a specific objective, such as the correct pronunciation of Arabic words, the correct syntactic structuring of Arabic sentences, the recognition of the morphological characteristics of terms and the semantic understanding of statements. The subtitled videos obtained can be incorporated into different Arabic second language learning tools such as Moocs, websites, platforms, etc.

Keywords: arabic foreign language, learning, audio-visuel translation, subtitled videos

Procedia PDF Downloads 31
1141 Ancient Latin Language and Haiku Poetry: A Case Study between Teaching and Translation Studies

Authors: Arianna Sacerdoti

Abstract:

The translation of Haiku Poetry into Latin is fundamentally experimental in nature. One of the first seminal books containing such translations, alongside translations into different modern languages, 'A Piedi Scalzi', was written by Tartamella in 2016. The results of a text-oriented study of this book will be commented upon and analyzed. The author Arianna Sacerdoti made similar translations with high school student. Such an experiment garners interest across a diverse range of disciplines such as teaching, translation studies, and classics reception studies. The methodology employed is text-oriented as the Haiku poem translations will be commented on by considering their relationship with the original. The results of this investigation, conducted within the field of experimental teaching, are expected to confirm the usefulness of this approach to the teaching of Latin and its potential to actively involve students in identifying the diachronic differences between the world of classical antiquity and the contemporary one.

Keywords: ancient latin, Haiku, translation studies, reception of classics

Procedia PDF Downloads 103
1140 Move Analysis of Death Row Statements: An Explanatory Study Applied to Death Row Statements in Texas Department of Criminal Justice Website

Authors: Giya Erina

Abstract:

Linguists have analyzed the rhetorical structure of various forensic genres, but only a few have investigated the complete structure of death row statements. Unlike other forensic text types, such as suicide or ransom notes, the focus of death row statement analysis is not the authenticity or falsity of the text, but its intended meaning and its communicative purpose. As it constitutes their last statement before their execution, there are probably many things that inmates would like to express. This study mainly examines the rhetorical moves of 200 death row statements from the Texas Department of Criminal Justice website using rhetorical move analysis. The rhetorical moves identified in the statements will be classified based on their communicative purpose, and they will be grouped into moves and steps. A move structure will finally be suggested from the most common or characteristic moves and steps, as well as some sub-moves. However, because of some statements’ atypicality, some moves may appear in different parts of the texts or not at all.

Keywords: Death row statements, forensic linguistics, genre analysis, move analysis

Procedia PDF Downloads 271
1139 Financial Reports and Common Ownership: An Analysis of the Mechanisms Common Owners Use to Induce Anti-Competitive Behavior

Authors: Kevin Smith

Abstract:

Publicly traded company in the US are legally obligated to host earnings calls that discuss their most recent financial reports. During these calls, investors are able to ask these companies questions about these financial reports and on the future direction of the company. This paper examines whether common institutional owners use these calls as a way to indirectly signal to companies in their portfolio to not take actions that could hurt the common owner's interests. This paper uses transcripts taken from the earnings calls of the six largest health insurance companies in the US from 2014 to 2019. This data is analyzed using text analysis and sentiment analysis to look for patterns in the statements made by common owners. The analysis found that common owners where more likely to recommend against direct price competition and instead redirect the insurance companies towards more passive actions, like investing in new technologies. This result indicates a mechanism that common owners use to reduce competition in the health insurance market.

Keywords: common ownership, text analysis, sentiment analysis, machine learning

Procedia PDF Downloads 45
1138 A Review of Research on Pre-training Technology for Natural Language Processing

Authors: Moquan Gong

Abstract:

In recent years, with the rapid development of deep learning, pre-training technology for natural language processing has made great progress. The early field of natural language processing has long used word vector methods such as Word2Vec to encode text. These word vector methods can also be regarded as static pre-training techniques. However, this context-free text representation brings very limited improvement to subsequent natural language processing tasks and cannot solve the problem of word polysemy. ELMo proposes a context-sensitive text representation method that can effectively handle polysemy problems. Since then, pre-training language models such as GPT and BERT have been proposed one after another. Among them, the BERT model has significantly improved its performance on many typical downstream tasks, greatly promoting the technological development in the field of natural language processing, and has since entered the field of natural language processing. The era of dynamic pre-training technology. Since then, a large number of pre-trained language models based on BERT and XLNet have continued to emerge, and pre-training technology has become an indispensable mainstream technology in the field of natural language processing. This article first gives an overview of pre-training technology and its development history, and introduces in detail the classic pre-training technology in the field of natural language processing, including early static pre-training technology and classic dynamic pre-training technology; and then briefly sorts out a series of enlightening technologies. Pre-training technology, including improved models based on BERT and XLNet; on this basis, analyze the problems faced by current pre-training technology research; finally, look forward to the future development trend of pre-training technology.

Keywords: natural language processing, pre-training, language model, word vectors

Procedia PDF Downloads 24
1137 StockTwits Sentiment Analysis on Stock Price Prediction

Authors: Min Chen, Rubi Gupta

Abstract:

Understanding and predicting stock market movements is a challenging problem. It is believed stock markets are partially driven by public sentiments, which leads to numerous research efforts to predict stock market trend using public sentiments expressed on social media such as Twitter but with limited success. Recently a microblogging website StockTwits is becoming increasingly popular for users to share their discussions and sentiments about stocks and financial market. In this project, we analyze the text content of StockTwits tweets and extract financial sentiment using text featurization and machine learning algorithms. StockTwits tweets are first pre-processed using techniques including stopword removal, special character removal, and case normalization to remove noise. Features are extracted from these preprocessed tweets through text featurization process using bags of words, N-gram models, TF-IDF (term frequency-inverse document frequency), and latent semantic analysis. Machine learning models are then trained to classify the tweets' sentiment as positive (bullish) or negative (bearish). The correlation between the aggregated daily sentiment and daily stock price movement is then investigated using Pearson’s correlation coefficient. Finally, the sentiment information is applied together with time series stock data to predict stock price movement. The experiments on five companies (Apple, Amazon, General Electric, Microsoft, and Target) in a duration of nine months demonstrate the effectiveness of our study in improving the prediction accuracy.

Keywords: machine learning, sentiment analysis, stock price prediction, tweet processing

Procedia PDF Downloads 126
1136 Shaking the Iceberg: Metaphoric Shifting and Loss in the German Translations of 'The Sun Also Rises'

Authors: Christopher Dick

Abstract:

While the translation of 'literal language' poses numerous challenges for the translator, the translation of 'figurative language' creates even more complicated issues. It has been only in the last several decades that scholars have attempted to propose theories of figurative language translation, including metaphor translation. Even less work has applied these theories to metaphoric translation in literary texts. And almost no work has linked an analysis of metaphors in translation with the recent scholarship on conceptual metaphors. A study of literature in translation must not only examine the inevitable shifts that occur as specific metaphors move from source language to target language but also analyze the ways in which these shifts impact conceptual metaphors and, ultimately, the text as a whole. Doing so contributes to on-going efforts to bridge the sometimes wide gulf between considerations of content and form in literary studies. This paper attempts to add to the body of scholarly literature on metaphor translation and the function of metaphor in a literary text. Specifically, the study examines the metaphoric expressions in Hemingway’s The Sun Also Rises. First, the issue of Hemingway and metaphor is addressed. Next, the study examines the specific metaphors in the original novel in English and the German translations, first in Annemarie Horschitz’s 1928 German version and then in the recent Werner Schmitz 2013 translation. Hemingway’s metaphors, far from being random occurrences of figurative language, are linguistic manifestations of deeper conceptual metaphors that are central to an interpretation of the text. By examining the modifications that are made to these original metaphoric expressions as they are translated into German, one can begin to appreciate the shifts involved with metaphor translation. The translation of Hemingway’s metaphors into German represents significant metaphoric loss and shifting that subsequently shakes the important conceptual metaphors in the novel.

Keywords: Hemingway, Conceptual Metaphor, Translation, Stylistics

Procedia PDF Downloads 329
1135 A Method to Evaluate and Compare Web Information Extractors

Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman

Abstract:

Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data that is gathered from the Web. Sometimes, these data are relatively easy to gather, chiefly when it comes from server logs. Unfortunately, there are cases in which the data to be mined is the data that is displayed on a web document. In such cases, it is necessary to apply a pre-processing step to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components that are typically configured by means of rules that are tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires to evaluate and compare the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4 200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which leads to difficulties to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents. b) It provides a method that relies on statistically sound tests to support the conclusions drawn; the previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) We provide a novel method to compute the performance measures regarding unsupervised proposals; otherwise they would require the intervention of a user to compute them by using the annotations on the evaluation sets and the information extracted. Our contributions will definitely help researchers in this area make sure that they have advanced the state of the art not only conceptually, but from an empirical point of view; it will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss on our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.

Keywords: web information extractors, information extraction evaluation method, Google scholar, web

Procedia PDF Downloads 226
1134 A Holistic Workflow Modeling Method for Business Process Redesign

Authors: Heejung Lee

Abstract:

In a highly competitive environment, it becomes more important to shorten the whole business process while delivering or even enhancing the business value to the customers and suppliers. Although the workflow management systems receive much attention for its capacity to practically support the business process enactment, the effective workflow modeling method remain still challenging and the high degree of process complexity makes it more difficult to gain the short lead time. This paper presents a workflow structuring method in a holistic way that can reduce the process complexity using activity-needs and formal concept analysis, which eventually enhances the key performance such as quality, delivery, and cost in business process.

Keywords: workflow management, re-engineering, formal concept analysis, business process

Procedia PDF Downloads 383