Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 933

Search results for: punjabi text

933 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesce morphological evaluation and sentiment analysis for the task of classification of farmer suicide cases reported in Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens followed by the training of proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45% respectively. The overall accuracy of sentiment classification obtained using proposed framework on 275 Punjabi text documents is found to be 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 85
932 Detonalization of Punjabi: Towards a Loss of Linguistic Indigeneity

Authors: Sukhvinder Singh

Abstract:

Punjabi language is related to the languages of New Indo-Aryan group that, in turn, is related to the branch of Indo-European language family. Punjabi language covers the areas of Western part (that is in Pakistan) and Eastern part (the Punjab state, Haryana, Delhi Himachal and J&K) and abroad (particularly Canada, USA, U.K. and Arab Emirates), where it is spoken widely. Besides India and Pakistan, Punjabi is the third language spoken in Canada after English, French having more than one hundred millions speakers worldwide. It is the fourth language spoken in Canada after English, French, and Chinese. It is also being taught as second language in most of the community school of British Columbia. The total number of Punjabi speakers is more than one hundred millions including India, Pakistan and abroad. Punjabi has a long tradition of linguistic tradition. A large number of scholars have studied Punjabi at different linguistic levels. Various studies are devoted to its special phonological characteristics, especially the tone, which has now started disappearing in favour of aspiration, a rare example of a language change in progress in its reversal direction. This process of language change in progress in reversal is dealt with in this paper a change towards a loss of linguistic indigeneity. The tone being a distinctive linguistic feature of Punjabi language is getting lost due to the increasing influence of Hindi and English particularly in the speech Urban Punjabi and Punjabi settled abroad. In this paper, an attempt has been made to discuss the sociolinguistics and sociology of Punjabi language and Punjab to trace the initiation and progression of this change towards a loss of Linguistic Indigeneity.

Keywords: language change in reversal, reaspiration, detonalization, new Indo-Aryan group

Procedia PDF Downloads 81
931 Women Characters in Pakistani Films: A Critical Evaluation

Authors: Ali Arshad

Abstract:

The study examines the depiction of women characters in Urdu and Punjabi films. It is a critical evaluation of forty-eight Pakistani films. It explores the characters of women portrays in Urdu and Punjabi film of Pakistan. Using content analysis as methodology with feminist research that helps to investigate the phenomena and supports the study. Finding of the study shows that women characters in Urdu and Punjabi films are not the reflection of true Pakistani women rather this picture represents a negative image of Pakistani women in viewers mind. These characters don’t address the women’s issues nor do they present the solutions to these problems faced by Pakistani women. The characters of Pakistani women are not free from male prejudice, and these films do not portray the social and political role perform by actual Pakistani women. The analysis shows that the characters of women in Urdu and Punjabi films are based on the assumptions.

Keywords: women, Pakistani, film, characters

Procedia PDF Downloads 175
930 PH.WQT as a Web Quality Model for Websites of Government Domain

Authors: Rupinder Pal Kaur, Vishal Goyal

Abstract:

In this research, a systematic and quantitative engineering-based approach is followed by applying well-known international standards and guidelines to develop a web quality model (PH.WQT- Punjabi and Hindi Website Quality Tester) to measure external quality for websites of government domain that are developed in Punjabi and Hindi. Correspondingly, the model can be used for websites developed in other languages also. The research is valuable to researchers and practitioners interested in designing, implementing and managing websites of government domain Also, by implementing PH.WQT analysis and comparisons among web sites of government domain can be performed in a consistent way.

Keywords: external quality, PH.WQT, indian languages, punjabi and hindi, quality model, websites of government

Procedia PDF Downloads 178
929 Gender Difference in the Use of Request Strategies by Urdu/Punjabi Native Speakers

Authors: Muzaffar Hussain

Abstract:

Requests strategies are considered as a part of the speech acts, which are frequently used in everyday communication. Each language provides speech acts to the speakers; therefore, the selection of appropriate form seems more culture-specific rather than language. The present paper investigates the gender-based difference in the use of request strategies by native speakers of Urdu/Punjabi male and female who are learning English as a second language. The data for the present study were collected from 68 graduate students, who are learning English as an L2 in Pakistan. They were given an online close-ended questionnaire, based on Discourse Completion Test (DCT). After analyzing the data, it was found that the L1 male Urdu/Punjabi speakers were inclined to use more direct request strategies while the female Urdu/Punjabi speakers used indirect request strategies. This paper also found that in some situations female participants used more direct strategies than male participants. The present study concludes that the use of request strategies is influenced by culture, social status, and power distribution in a society.

Keywords: gender variation, request strategies, face-threatening, second language pragmatics, language competence

Procedia PDF Downloads 64
928 Teaching Creative Thinking and Writing to Simultaneous Bilinguals: A Longitudinal Study of 6-7 Years Old English and Punjabi Language Learners

Authors: Hafiz Muhammad Fazalehaq

Abstract:

This paper documents the results of a longitudinal study done on two bilingual children who speak English and Punjabi simultaneously. Their father is a native English speaker whereas their mother speaks Punjabi. Their mother can speak both the languages (English and Punjabi) whereas their father only speaks English. At the age of six, these children have difficulty in creative thinking and of course creative writing. So, the first task for the researcher is to impress and entice the children to think creatively. Various and different methodologies and techniques were used to entice them to start thinking creatively. Creative thinking leads to creative writing. These children were exposed to numerous sources including videos, photographs, texts and audios at first place in order to have a taste of creative genres (stories in this case). The children were encouraged to create their own stories sometimes with photographs and sometimes by using their favorite toys. At a second stage, they were asked to write about an event or incident. After that, they were motivated to create new stories and write them. Length of their creative writing varies from a few sentences to a two standard page. After this six months’ study, the researcher was able to develop a ten steps methodology for creating and improving/enhancing creative thinking and creative writing skills of the subjects understudy. This ten-step methodology entices and motivates the learner to think creatively for producing a creative piece.

Keywords: bilinguals, creative thinking, creative writing, simultaneous bilingual

Procedia PDF Downloads 246
927 Extraction of Text Subtitles in Multimedia Systems

Authors: Amarjit Singh

Abstract:

In this paper, a method for extraction of text subtitles in large video is proposed. The video data needs to be annotated for many multimedia applications. Text is incorporated in digital video for the motive of providing useful information about that video. So need arises to detect text present in video to understanding and video indexing. This is achieved in two steps. First step is text localization and the second step is text verification. The method of text detection can be extended to text recognition which finds applications in automatic video indexing; video annotation and content based video retrieval. The method has been tested on various types of videos.

Keywords: video, subtitles, extraction, annotation, frames

Procedia PDF Downloads 475
926 Urdu Text Extraction Method from Images

Authors: Samabia Tehsin, Sumaira Kausar

Abstract:

Due to the vast increase in the multimedia data in recent years, efficient and robust retrieval techniques are needed to retrieve and index images/ videos. Text embedded in the images can serve as the strong retrieval tool for images. This is the reason that text extraction is an area of research with increasing attention. English text extraction is the focus of many researchers but very less work has been done on other languages like Urdu. This paper is focusing on Urdu text extraction from video frames. This paper presents a text detection feature set, which has the ability to deal up with most of the problems connected with the text extraction process. To test the validity of the method, it is tested on Urdu news dataset, which gives promising results.

Keywords: caption text, content-based image retrieval, document analysis, text extraction

Procedia PDF Downloads 377
925 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. It describes automatic text recognition in images. OCR is necessary because optical input devices can only transmit raster graphics as a result. Text recognition describes the task of recognizing letters shown as such, to identify and assign them an assigned numerical value in accordance with the usual text encoding (ASCII, Unicode). The peculiarity of this study conducted by the authors using the example of the ABBYY FineReader, was confirmed and shown in practice, the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies

Procedia PDF Downloads 39
924 Analyzing Use of Figurativeness, Visual Elements, Allegory, Scenic Imagery as Support System in Punjabi Contemporary Theatre for Escaping Censorship

Authors: Shazia Anwer

Abstract:

This paper has discussed the unusual form of resistance in theatre against censorship board in Pakistan. The atypical approach of dramaturgy created massive space for performers and audiences to integrate and communicate. The social and religious absolutes creates suffocation in Pakistani society, strict control over all Fine and Performing Art has made art political, contemporary dramatics has started an amalgamated theatre to avoid censorship. Contemporary Punjabi theatre techniques are directly dependent on human cognition. The idea of indirect thought processing is not unique but dependent on spectators. The paper has provided an account of these techniques and their specific use for conveying specific messages across the audiences. For the Dramaturge of today, theatre space is an expression representing a linguistic formulation that includes qualities of experimental and non-traditional use of classical theatrical space in the context of fulfilling the concept of open theatre. Paper has explained the transformation of the theatrical experience into an event where the actor and the audience are co-existing and co-experiencing the dramatical experience. The denial of the existence of the 4th -Wall made two-way communication possible. This paper has elaborated that the previously marginalized genres such as naach, jugat, miras, are extensively included to counter the censorship board. Figurativeness, visual elements, allegory, scenic imagery are basic support system for contemporary Punjabi theatre. The body of the actor is used as a source for non-verbal communication, and for an escape from traditional theatrical space which by every means has every element that could be controlled and reprimanded by the controlling authority.

Keywords: communication, Punjabi theatre, figurativeness, censorship

Procedia PDF Downloads 18
923 Programmed Speech to Text Summarization Using Graph-Based Algorithm

Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba

Abstract:

Programmed Speech to Text and Text Summarization Using Graph-based Algorithms can be utilized in gatherings to get the short depiction of the gathering for future reference. This gives signature check utilizing Siamese neural organization to confirm the personality of the client and convert the client gave sound record which is in English into English text utilizing the discourse acknowledgment bundle given in python. At times just the outline of the gathering is required, the answer for this text rundown. Thus, the record is then summed up utilizing the regular language preparing approaches, for example, solo extractive text outline calculations

Keywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization

Procedia PDF Downloads 17
922 Reducing Accidents Using Text Stops

Authors: Benish Chaudhry

Abstract:

Most of the accidents these days are occurring because of the ‘text-and-drive’ concept. If we look at the structure of cities in UAE, there are great distances, because of which it is impossible to drive without using or merely checking the cellphone. Moreover, if we look at the road structure, it is almost impossible to stop at a point and text. With the introduction of TEXT STOPs, drivers will be able to stop different stops for a maximum of 1 and a half-minute in order to reply or write a message. They can be introduced at a distance of 10 minutes of driving on the average speed of the road, so the drivers can look forward to a stop and can reply to a text when needed. A user survey indicates that drivers are willing to NOT text-and-drive if they have such a facility available.

Keywords: transport, accidents, urban planning, road planning

Procedia PDF Downloads 240
921 An Ecological Reading of Indian Regional Literature: A Comparative Ecocritical Analysis of Punjabi Poet Shiv Kumar Batalvi and Surjit Patar's Poetry

Authors: Zameerpal Kaur

Abstract:

Ecocriticism comes into existence in 1990s, it tries to explore the relationship of literature with the natural world and further it examines the role that natural surroundings and environment play in the minds of the creative writers during their imagination and creative process. The present study is an attempt to focus on the comparative ecocritical analysis of Shiv Kumar Batalvi and Surjit Patar’s selected poetry in the theoretical framework of ecocriticism in order to shed light on the poet’s vigilant views about the relationship of human life and nature. Shiv Kumar Batalvi is a renowned modern Punjabi poet. He is essentially a poet of nature and love. His opinions towards nature support his position to be considered as a major representative of recent environmental issues and ecocritical concerns in Punjabi literature. He is one of the most outstanding modern Punjabi poets, is endowed with the most artistic temperament in whose poetry nature always has a dominating existence. He seems to consciously portray the scenes of natural surroundings into his poetry; in fact the titles of his poems in themselves signify his love for the nature. Surjit Patar, an imminent modern Punjabi poet tries to present a different picture of nature into his poems; he also uses to write poems about contemporary problems. Surjit Patar’s radical quarrel with the modern cultural context makes him reject all the absolutes and finalities in the form of transcendental reason and religion, history and evolution, he freely writes about the deterioration of nature at selfish materialistic society. He is modern poet who weaves the natural imagery with the syntax of his poems. Patar’s work reflects a universal voice that is dribbled with nuanced humanism and a sense of modernity that seemed neither dated, nor trapped in regional boundaries. Through his poetry he has given a voice to the fragile, disrupting borders, disturbing the status quo. An attempt to analyse the poetic works of above said poets from ecocritical perspective as well as especially focussing on various aspects of ecocriticism like ecocentric ethics, ecoaesthetics, anthropomorphism etc. has been made throughout the comparative study of the selected works.

Keywords: anthropocentrism, degradation, environment and literature, nature

Procedia PDF Downloads 349
920 Structure Analysis of Text-Image Connection in Jalayrid Period Illustrated Manuscripts

Authors: Mahsa Khani Oushani

Abstract:

Text and image are two important elements in the field of Iranian art, the text component and the image component have always been manifested together. The image narrates the text and the text is the factor in the formation of the image and they are closely related to each other. The connection between text and image is an interactive and two-way connection in the tradition of Iranian manuscript arrangement. The interaction between the narrative description and the image scene is the result of a direct and close connection between the text and the image, which in addition to the decorative aspect, also has a descriptive aspect. In this article the connection between the text element and the image element and its adaptation to the theory of Roland Barthes, the structuralism theorist, in this regard will be discussed. This study tends to investigate the question of how the connection between text and image in illustrated manuscripts of the Jalayrid period is defined according to Barthes’ theory. And what kind of proportion has the artist created in the composition between text and image. Based on the results of reviewing the data of this study, it can be inferred that in the Jalayrid period, the image has a reference connection and although it is of major importance on the page, it also maintains a close connection with the text and is placed in a special proportion. It is not necessarily balanced and symmetrical and sometimes uses imbalance for composition. This research has been done by descriptive-analytical method, which has been done by library collection method.

Keywords: structure, text, image, Jalayrid, painter

Procedia PDF Downloads 63
919 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey

Abstract:

Text mining techniques are generally applied for classifying the text, finding fuzzy relations and structures in data sets. This research provides plenty text mining capabilities. One common application is text classification and event extraction, which encompass deducing specific knowledge concerning incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism, which is based on a text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and branch and bound method to simplify the texts.

Keywords: extraction, max-prod, fuzzy relations, text mining, memberships, classification, memberships, classification

Procedia PDF Downloads 447
918 Anatomical Survey for Text Pattern Detection

Authors: S. Tehsin, S. Kausar

Abstract:

The ultimate aim of machine intelligence is to explore and materialize the human capabilities, one of which is the ability to detect various text objects within one or more images displayed on any canvas including prints, videos or electronic displays. Multimedia data has increased rapidly in past years. Textual information present in multimedia contains important information about the image/video content. However, it needs to technologically testify the commonly used human intelligence of detecting and differentiating the text within an image, for computers. Hence in this paper feature set based on anatomical study of human text detection system is proposed. Subsequent examination bears testimony to the fact that the features extracted proved instrumental to text detection.

Keywords: biologically inspired vision, content based retrieval, document analysis, text extraction

Procedia PDF Downloads 325
917 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we have presented a brief current state of the art for Arabic text representation and classification methods. We decomposed Arabic Task Classification into four categories. First we describe some algorithms applied to classification on Arabic text. Secondly, we cite all major works when comparing classification algorithms applied on Arabic text, after this, we mention some authors who proposing new classification methods and finally we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 282
916 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review on the published researches of Arabic Text Classification using classical data representation, Bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 300
915 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 321
914 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

In order to settle the problem of acquiring case in biological related to design problems, a biometrics instance acquisition method based on text mining is presented. Through the construction of corpus text vector space and knowledge mining, the feature selection, similarity measure and case retrieval method of text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is retrieved by using the vector space model combined with the functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of this method by taking the example of text.

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 70
913 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well in other cases: in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 42
912 Adolescent Obesity Leading to Adulthood Cardiovascular Diseases among Punjabi Population

Authors: Manpreet Kaur, Badaruddoza, Sandeep Kaur Brar

Abstract:

The increasing prevalence of adolescent obesity is one of the major causes to be hypertensive in adulthood. Various statistical methods have been applied to examine the performance of anthropometric indices for the identification of adverse cardiovascular risk profile. The present work was undertaken to determine the significant traditional risk factors through principal component factor analysis (PCFA) among population based Punjabi adolescents aged 10-18 years. Data was collected among adolescent children from different schools situated in urban areas of Punjab, India. Principal component factor analysis (PCFA) was applied to extract orthogonal components from anthropometric and physiometric variables. Association between components were explained by factor loadings. The PCFA extracted four factors, which explained 84.21%, 84.06% and 83.15% of the total variance of the 14 original quantitative traits among boys, girls and combined subjects respectively. Factor 1 has high loading of the traits that reflect adiposity such as waist circumference, BMI and skinfolds among both sexes. However, waist circumference and body mass index are the indicator of abdominal obesity which increases the risk of cardiovascular diseases. The loadings of these two traits have found maximum in girls adolescents (WC=0.924; BMI=0.905). Therefore, factor 1 is the strong indicator of atherosclerosis in adolescents. Factor 2 is predominantly loaded with blood pressures and related traits (SBP, DBP, MBP and pulse rate) which reflect the risk of essential hypertension in adolescent girls and combined subjects, whereas, factor 2 loaded with obesity related traits in boys (weight and hip circumferences). Comparably, factor 3 is loaded with blood pressures in boys and with height and WHR in girls, while factor 4 contains high loading of pulse pressure among boys, girls and combined group of adolescents.

Keywords: adolescent obesity, cvd, hypertension, punjabi population

Procedia PDF Downloads 265
911 Structural Analysis of Kamaluddin Behzad's Works Based on Roland Barthes' Theory of Communication, 'Text and Image'

Authors: Mahsa Khani Oushani, Mohammad Kazem Hasanvand

Abstract:

Text and image have always been two important components in Iranian layout. The interactive connection between text and image has shaped the art of book design with multiple patterns. In this research, first the structure and visual elements in the research data were analyzed and then the position of the text element and the image element in relation to each other based on Roland Barthes theory on the three theories of text and image, were studied and analyzed and the results were compared, and interpreted. The purpose of this study is to investigate the pattern of text and image in the works of Kamaluddin Behzad based on three Roland Barthes communication theories, 1. Descriptive communication, 2. Reference communication, 3. Matched communication. The questions of this research are what is the relationship between text and image in Behzad's works? And how is it defined according to Roland Barthes theory? The method of this research has been done with a structuralist approach with a descriptive-analytical method in a library collection method. The information has been collected in the form of documents (library) and is a tool for collecting online databases. Findings show that the dominant element in Behzad's drawings is with the image and has created a reference relationship in the layout of the drawings, but in some cases it achieves a different relationship that despite the preference of the image on the page, the text is dispersed proportionally on the page and plays a more active role, played within the image. The text and the image support each other equally on the page; Roland Barthes equates this connection.

Keywords: text, image, Kamaluddin Behzad, Roland Barthes, communication theory

Procedia PDF Downloads 41
910 Written Argumentative Texts in Elementary School: The Development of Text Structure and Its Relation to Reading Comprehension

Authors: Sara Zadunaisky Ehrlich, Batia Seroussi, Anat Stavans

Abstract:

Text structure is a parameter of text quality. This study investigated the structure of written argumentative texts produced by elementary school age children. We set two objectives: to identify and trace the structural components of the argumentative texts and to investigate whether reading comprehension skills were correlated with text structure. 293 school children from 2nd to 5th grades were asked to write two argumentative texts about informal or everyday life controversial topics and completed two reading tasks that targeted different levels of text comprehension. The findings indicated, on the one hand, significant developmental differences between mature and more novice writers in terms of text length and mean proportion of clauses produced for a better elaboration of the different text components. On the other hand, with certain fluctuations, no meaningful differences were found in terms of presence of text structure: at all grade levels, elementary school children produced the basic and minimal structure that included the writer's argument and reasons or arguments' supports. Counter-arguments were scarce even in the upper grades. While the children captured that essentially an argument must be justified, the more the number of supports produced, the fewer the clauses the children produced. Last, weak to mild relations were found between reading comprehension and argumentative text structure. Nevertheless, children who scored higher on sophisticated questions that require inferential or world knowledge displayed more elaborated structures in terms of text length and size of supports to the writer's argument. These findings indicate how school-age children perceive the basic template of an argument with future implications regarding how to elaborate written arguments.

Keywords: argumentative text, text structure, elementary school children, written argumentations

Procedia PDF Downloads 34
909 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidifies arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: bilingual, deviations, morphology, texts

Procedia PDF Downloads 176
908 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays has multiple forms; from text to images, and from audio to videos, yet text is still the most used one at a public level. At an academical and research level, and unlike other forms, text can be considered as the easiest form to process. Therefore, a brunch of Data Mining researches has been always under its shadow, called "Text Mining". Its concept is just like data mining’s, finding valuable patterns in data, from large collections and tremendous volumes of data, in this case: Text. Named entity recognition (NER) is one of Text Mining’s disciplines, it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach "Octopub" does not aim to find new ways to improve named entity recognition process, rather than that it’s about finding a new, and yet smart way, to use NER in a way that we can extract sentiments of millions of people using Social Networks as a limitless information source, and Marketing for product promotion as the main domain of application.

Keywords: textmining, named entity recognition(NER), sentiment analysis, social media networks (SN, SMN), business intelligence(BI), marketing

Procedia PDF Downloads 390
907 Video Text Information Detection and Localization in Lecture Videos Using Moments

Authors: Belkacem Soundes, Guezouli Larbi

Abstract:

This paper presents a robust and accurate method for text detection and localization over lecture videos. Frame regions are classified into text or background based on visual feature analysis. However, lecture video shows significant degradation mainly related to acquisition conditions, camera motion and environmental changes resulting in low quality videos. Hence, affecting feature extraction and description efficiency. Moreover, traditional text detection methods cannot be directly applied to lecture videos. Therefore, robust feature extraction methods dedicated to this specific video genre are required for robust and accurate text detection and extraction. Method consists of a three-step process: Slide region detection and segmentation; Feature extraction and non-text filtering. For robust and effective features extraction moment functions are used. Two distinct types of moments are used: orthogonal and non-orthogonal. For orthogonal Zernike Moments, both Pseudo Zernike moments are used, whereas for non-orthogonal ones Hu moments are used. Expressivity and description efficiency are given and discussed. Proposed approach shows that in general, orthogonal moments show high accuracy in comparison to the non-orthogonal one. Pseudo Zernike moments are more effective than Zernike with better computation time.

Keywords: text detection, text localization, lecture videos, pseudo zernike moments

Procedia PDF Downloads 32
906 Intonation Salience as an Underframe to Text Intonation Models

Authors: Tatiana Stanchuliak

Abstract:

It is common knowledge that intonation is not laid over a ready text. On the contrary, intonation forms and accompanies the text on the level of its birth in the speaker’s mind. As a result, intonation plays one of the fundamental roles in the process of transferring a thought into external speech. Intonation structure can highlight the semantic significance of textual elements and become a ranging mark in understanding the information structure of the text. Intonation functions by means of prosodic characteristics, one of which is intonation salience, whose function in texts results in making some textual elements more prominent than others. This function of intonation, therefore, performs as organizing. It helps to form the frame of key elements of the text. The study under consideration made an attempt to look into the inner nature of salience and create a sort of a text intonation model. This general goal brought to some more specific intermediate results. First, there were established degrees of salience on the level of the smallest semantic element - intonation group, as well as prosodic means of creating salience, were examined. Second, the most frequent combinations of prosodic means made it possible to distinguish patterns of salience, which then became constituent elements of a text intonation model. Third, the analysis of the predicate structure allowed to divide the whole text into smaller parts, or units, which performed a specific function in the developing of the general communicative intention. It appeared that such units can be found in any text and they have common characteristics of their intonation arrangement. These findings are certainly very important both for the theory of intonation and their practical application.

Keywords: accentuation , inner speech, intention, intonation, intonation functions, models, patterns, predicate, salience, semantics, sentence stress, text

Procedia PDF Downloads 160
905 Distorted Document Images Dataset for Text Detection and Recognition

Authors: Ilia Zharikov, Philipp Nikitin, Ilia Vasiliev, Vladimir Dokholyan

Abstract:

With the increasing popularity of document analysis and recognition systems, text detection (TD) and optical character recognition (OCR) in document images become challenging tasks. However, according to our best knowledge, no publicly available datasets for these particular problems exist. In this paper, we introduce a Distorted Document Images dataset (DDI-100) and provide a detailed analysis of the DDI-100 in its current state. To create the dataset we collected 7000 unique document pages, and extend it by applying different types of distortions and geometric transformations. In total, DDI-100 contains more than 100,000 document images together with binary text masks, text and character locations in terms of bounding boxes. We also present an analysis of several state-of-the-art TD and OCR approaches on the presented dataset. Lastly, we demonstrate the usefulness of DDI-100 to improve accuracy and stability of the considered TD and OCR models.

Keywords: document analysis, open dataset, optical character recognition, text detection

Procedia PDF Downloads 49
904 Towards Logical Inference for the Arabic Question-Answering

Authors: Wided Bakari, Patrice Bellot, Omar Trigui, Mahmoud Neji

Abstract:

This article constitutes an opening to think of the modeling and analysis of Arabic texts in the context of a question-answer system. It is a question of exceeding the traditional approaches focused on morphosyntactic approaches. Furthermore, we present a new approach that analyze a text in order to extract correct answers then transform it to logical predicates. In addition, we would like to represent different levels of information within a text to answer a question and choose an answer among several proposed. To do so, we transform both the question and the text into logical forms. Then, we try to recognize all entailment between them. The results of recognizing the entailment are a set of text sentences that can implicate the user’s question. Our work is now concentrated on an implementation step in order to develop a system of question-answering in Arabic using techniques to recognize textual implications. In this context, the extraction of text features (keywords, named entities, and relationships that link them) is actually considered the first step in our process of text modeling. The second one is the use of techniques of textual implication that relies on the notion of inference and logic representation to extract candidate answers. The last step is the extraction and selection of the desired answer.

Keywords: NLP, Arabic language, question-answering, recognition text entailment, logic forms

Procedia PDF Downloads 224