Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1315

Search results for: punjabi text

1315 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesces morphological evaluation and sentiment analysis for the task of classifying farmer suicide cases reported in the Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens, followed by training of the proposed model using deep learning classification on Punjabi-language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45%, respectively. The overall accuracy of sentiment classification obtained using the proposed framework on 275 Punjabi text documents is 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 316

1314 Detonalization of Punjabi: Towards a Loss of Linguistic Indigeneity

Authors: Sukhvinder Singh

Abstract:

The Punjabi language belongs to the New Indo-Aryan group, which in turn is a branch of the Indo-European language family. Punjabi is widely spoken in the western part of Punjab (in Pakistan), the eastern part (the Punjab state, Haryana, Delhi, Himachal and J&K) and abroad (particularly Canada, the USA, the U.K. and the Arab Emirates). Besides India and Pakistan, Punjabi is the third language spoken in Canada after English and French, with more than one hundred million speakers worldwide. It is the fourth language spoken in Canada after English, French, and Chinese. It is also taught as a second language in most of the community schools of British Columbia. The total number of Punjabi speakers, including India, Pakistan and abroad, is more than one hundred million. Punjabi has a long linguistic tradition. A large number of scholars have studied Punjabi at different linguistic levels. Various studies are devoted to its special phonological characteristics, especially tone, which has now started disappearing in favour of aspiration, a rare example of a language change in progress in the reverse direction. This paper deals with this process of language change in reversal as a change towards a loss of linguistic indigeneity. Tone, a distinctive linguistic feature of Punjabi, is being lost due to the increasing influence of Hindi and English, particularly in the speech of urban Punjabis and Punjabis settled abroad. In this paper, an attempt has been made to discuss the sociolinguistics and sociology of the Punjabi language and Punjab in order to trace the initiation and progression of this change towards a loss of linguistic indigeneity.

Keywords: language change in reversal, reaspiration, detonalization, new Indo-Aryan group

Procedia PDF Downloads 169

1313 Women Characters in Pakistani Films: A Critical Evaluation

Authors: Ali Arshad

Abstract:

The study examines the depiction of women characters in Urdu and Punjabi films. It is a critical evaluation of forty-eight Pakistani films. It explores the characters of women portrayed in Urdu and Punjabi films of Pakistan. The methodology is content analysis combined with feminist research, which helps to investigate the phenomena and supports the study. The findings show that women characters in Urdu and Punjabi films are not a true reflection of Pakistani women; rather, they present a negative image of Pakistani women in viewers' minds. These characters do not address women's issues, nor do they present solutions to the problems faced by Pakistani women. The characters of Pakistani women are not free from male prejudice, and these films do not portray the social and political roles performed by actual Pakistani women. The analysis shows that the characters of women in Urdu and Punjabi films are based on assumptions.

Keywords: women, Pakistani, film, characters

Procedia PDF Downloads 298

1312 PH.WQT as a Web Quality Model for Websites of Government Domain

Authors: Rupinder Pal Kaur, Vishal Goyal

Abstract:

In this research, a systematic and quantitative engineering-based approach is followed, applying well-known international standards and guidelines to develop a web quality model (PH.WQT - Punjabi and Hindi Website Quality Tester) that measures the external quality of government-domain websites developed in Punjabi and Hindi. The model can also be used for websites developed in other languages. The research is valuable to researchers and practitioners interested in designing, implementing and managing government-domain websites. Also, by implementing PH.WQT, analysis and comparisons among government-domain websites can be performed in a consistent way.

Keywords: external quality, PH.WQT, indian languages, punjabi and hindi, quality model, websites of government

Procedia PDF Downloads 299

1311 Gender Difference in the Use of Request Strategies by Urdu/Punjabi Native Speakers

Authors: Muzaffar Hussain

Abstract:

Request strategies are considered a part of speech acts, which are frequently used in everyday communication. Each language provides speech acts to its speakers; therefore, the selection of an appropriate form seems more culture-specific than language-specific. The present paper investigates gender-based differences in the use of request strategies by male and female native speakers of Urdu/Punjabi who are learning English as a second language. The data for the present study were collected from 68 graduate students who are learning English as an L2 in Pakistan. They were given an online close-ended questionnaire based on the Discourse Completion Test (DCT). After analyzing the data, it was found that the L1 male Urdu/Punjabi speakers were inclined to use more direct request strategies, while the female Urdu/Punjabi speakers used indirect request strategies. This paper also found that in some situations female participants used more direct strategies than male participants. The present study concludes that the use of request strategies is influenced by culture, social status, and power distribution in a society.

Keywords: gender variation, request strategies, face-threatening, second language pragmatics, language competence

Procedia PDF Downloads 182

1310 Teaching Creative Thinking and Writing to Simultaneous Bilinguals: A Longitudinal Study of 6-7 Years Old English and Punjabi Language Learners

Authors: Hafiz Muhammad Fazalehaq

Abstract:

This paper documents the results of a longitudinal study of two bilingual children who speak English and Punjabi simultaneously. Their father is a native English speaker, whereas their mother speaks Punjabi. Their mother can speak both languages (English and Punjabi), whereas their father speaks only English. At the age of six, these children had difficulty with creative thinking and, consequently, with creative writing. The first task for the researcher was therefore to entice the children to think creatively, and a variety of methodologies and techniques were used to do so. Creative thinking leads to creative writing. The children were first exposed to numerous sources, including videos, photographs, texts and audio, in order to get a taste of creative genres (stories in this case). They were encouraged to create their own stories, sometimes with photographs and sometimes using their favorite toys. At a second stage, they were asked to write about an event or incident. After that, they were motivated to create new stories and write them down. The length of their creative writing varies from a few sentences to two standard pages. After this six-month study, the researcher was able to develop a ten-step methodology for creating and enhancing the creative thinking and creative writing skills of the subjects under study. This ten-step methodology entices and motivates the learner to think creatively in order to produce a creative piece.

Keywords: bilinguals, creative thinking, creative writing, simultaneous bilingual

Procedia PDF Downloads 344

1309 Analyzing Use of Figurativeness, Visual Elements, Allegory, Scenic Imagery as Support System in Punjabi Contemporary Theatre for Escaping Censorship

Authors: Shazia Anwer

Abstract:

This paper discusses an unusual form of resistance in theatre against the censorship board in Pakistan. An atypical approach to dramaturgy has created substantial space for performers and audiences to integrate and communicate. Social and religious absolutes create suffocation in Pakistani society, and strict control over all fine and performing arts has made art political; in response, contemporary dramatics has developed an amalgamated theatre to avoid censorship. Contemporary Punjabi theatre techniques are directly dependent on human cognition. The idea of indirect thought processing is not unique, but it depends on the spectators. The paper provides an account of these techniques and their specific use for conveying specific messages to audiences. For the dramaturge of today, theatre space is an expression representing a linguistic formulation that includes qualities of experimental and non-traditional use of classical theatrical space in the context of fulfilling the concept of open theatre. The paper explains the transformation of the theatrical experience into an event where the actor and the audience co-exist and co-experience the drama. Denying the existence of the fourth wall makes two-way communication possible. The paper also shows that previously marginalized genres such as naach, jugat, and miras are extensively included to counter the censorship board. Figurativeness, visual elements, allegory, and scenic imagery form the basic support system of contemporary Punjabi theatre. The body of the actor is used as a source of non-verbal communication and as an escape from traditional theatrical space, every element of which could be controlled and reprimanded by the controlling authority.

Keywords: communication, Punjabi theatre, figurativeness, censorship

Procedia PDF Downloads 131

1308 Extraction of Text Subtitles in Multimedia Systems

Authors: Amarjit Singh

Abstract:

In this paper, a method for extracting text subtitles from large videos is proposed. Video data needs to be annotated for many multimedia applications. Text is incorporated in digital video to provide useful information about that video, so the need arises to detect the text present in a video for video understanding and indexing. This is achieved in two steps: the first is text localization and the second is text verification. The text detection method can be extended to text recognition, which finds applications in automatic video indexing, video annotation and content-based video retrieval. The method has been tested on various types of videos.

Keywords: video, subtitles, extraction, annotation, frames

Procedia PDF Downloads 596

1307 A Summary-Based Text Classification Model for Graph Attention Networks

Authors: Shuo Liu

Abstract:

In Chinese text classification tasks, redundant words and phrases can interfere with the extraction and analysis of text information, leading to a decrease in the accuracy of the classification model. To reduce irrelevant elements, extract and utilize text content information more efficiently, and improve the accuracy of text classification models, this paper first extracts a summary of each text in the corpus using the TextRank algorithm; the words in the summary are then used as nodes to construct a text graph, and a graph attention network (GAT) is used to classify the text. In tests on a Chinese dataset collected from the internet, classification accuracy improved over the direct method of generating graph structures from the full text.
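
The summary-extraction step described above can be sketched roughly as follows. This is only an illustrative sentence-level TextRank in plain Python, not the paper's implementation: the function names, the whitespace tokenizer, the damping factor of 0.85 and the iteration count are all assumptions, and real Chinese text would first need a word segmenter. The graph construction and GAT classifier are not shown.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Word-overlap similarity between two tokenized sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) + log(1) == 0 in the denominator
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence via power iteration on the similarity graph."""
    tokens = [s.lower().split() for s in sentences]  # assumed tokenizer
    n = len(tokens)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Keep the k highest-scoring sentences, in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In a pipeline like the one the abstract describes, the words of the sentences returned by `summarize` would become the nodes of the text graph passed to the classifier.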

Keywords: Chinese natural language processing, text classification, abstract extraction, graph attention network

Procedia PDF Downloads 92

1306 Urdu Text Extraction Method from Images

Authors: Samabia Tehsin, Sumaira Kausar

Abstract:

Due to the vast increase in multimedia data in recent years, efficient and robust techniques are needed to retrieve and index images and videos. Text embedded in images can serve as a strong retrieval tool, which is why text extraction is an area of research attracting increasing attention. English text extraction is the focus of many researchers, but very little work has been done on other languages such as Urdu. This paper focuses on Urdu text extraction from video frames. It presents a text detection feature set that can deal with most of the problems connected with the text extraction process. To test its validity, the method is evaluated on an Urdu news dataset, where it gives promising results.

Keywords: caption text, content-based image retrieval, document analysis, text extraction

Procedia PDF Downloads 507

1305 Small Text Extraction from Documents and Chart Images

Authors: Rominkumar Busa, Shahira K. C., Lijiya A.

Abstract:

Text recognition is an important area in computer vision which deals with detecting and recognising text in an image. Optical Character Recognition (OCR) is a saturated area these days, with very good text recognition accuracy. However, when the same OCR methods are applied to text with small font sizes, such as the text in chart images, the recognition rate is less than 30%. This work aims to extract small text in images using a deep learning model, CRNN with CTC loss. The text recognition accuracy is found to improve when image enhancement by super resolution is applied prior to the CRNN model. We also observe that the text recognition rate further increases by 18% with the proposed method, which involves super resolution and character segmentation followed by CRNN with CTC loss. The efficiency of the proposed method shows that further pre-processing of chart image text and other small text images will improve the accuracy further, thereby helping text extraction from chart images.
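
A model trained with CTC loss emits one label distribution per time step, which still has to be collapsed into a character string. The sketch below shows greedy (best-path) CTC decoding in plain Python as one common way to do this; it is an assumption-laden illustration, not the authors' decoder. The function name, the choice of index 0 as the blank label and the toy symbol table are all hypothetical.

```python
def ctc_greedy_decode(frame_scores, symbols, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per label.
    symbols: label index -> character; the entry at `blank` is never emitted.
    """
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    decoded = []
    previous = None
    for label in best_path:
        # A label is emitted only when it is not blank and differs from the
        # label of the previous frame; a blank between two equal labels
        # therefore allows genuine double letters.
        if label != blank and label != previous:
            decoded.append(symbols[label])
        previous = label
    return "".join(decoded)
```

For example, the frame-wise best path `a a - a b b -` (with `-` as the blank) decodes to `aab`.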

Keywords: small text extraction, OCR, scene text recognition, CRNN

Procedia PDF Downloads 120

1304 An Ecological Reading of Indian Regional Literature: A Comparative Ecocritical Analysis of Punjabi Poet Shiv Kumar Batalvi and Surjit Patar's Poetry

Authors: Zameerpal Kaur

Abstract:

Ecocriticism came into existence in the 1990s. It explores the relationship of literature with the natural world and examines the role that natural surroundings and the environment play in the minds of creative writers during their imagination and creative process. The present study is a comparative ecocritical analysis of selected poetry by Shiv Kumar Batalvi and Surjit Patar within the theoretical framework of ecocriticism, intended to shed light on the poets' vigilant views about the relationship between human life and nature. Shiv Kumar Batalvi is a renowned modern Punjabi poet. He is essentially a poet of nature and love. His views on nature support his being considered a major representative of recent environmental issues and ecocritical concerns in Punjabi literature. One of the most outstanding modern Punjabi poets, he is endowed with a highly artistic temperament, and nature always has a dominating presence in his poetry. He seems to consciously portray scenes of natural surroundings in his poetry; in fact, the titles of his poems in themselves signify his love for nature. Surjit Patar, an eminent modern Punjabi poet, presents a different picture of nature in his poems; he also writes poems about contemporary problems. Surjit Patar’s radical quarrel with the modern cultural context makes him reject all absolutes and finalities in the form of transcendental reason and religion, history and evolution, and he writes freely about the deterioration of nature at the hands of a selfish, materialistic society. He is a modern poet who weaves natural imagery into the syntax of his poems. Patar’s work reflects a universal voice imbued with nuanced humanism and a sense of modernity that seems neither dated nor trapped in regional boundaries. Through his poetry he has given a voice to the fragile, disrupting borders and disturbing the status quo. Throughout this comparative study of the selected works, an attempt has been made to analyse the poetic works of these poets from an ecocritical perspective, with a special focus on aspects of ecocriticism such as ecocentric ethics, ecoaesthetics and anthropomorphism.

Keywords: anthropocentrism, degradation, environment and literature, nature

Procedia PDF Downloads 461

1303 Text Data Preprocessing Library: Bilingual Approach

Authors: Kabil Boukhari

Abstract:

In the context of information retrieval, the selection of the most relevant words is a very important step. Text cleaning keeps only the most representative words for better use. In this paper, we propose a library for text preprocessing, within an implemented application, to facilitate this task. This study has two purposes. The first is to present related work on the various steps involved in text preprocessing, covering the segmentation, stemming and lemmatization algorithms that could be efficient in the rest of the study. The second is to implement a tool for text preprocessing in French and English. This library accepts unstructured text as input and provides the preprocessed text as output, based on a set of rules and on a base of stop words for both languages. The proposed library has been tested on different corpora and gave interesting results.
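
The kind of rule-and-stop-word pipeline described above can be sketched as follows. This is a hypothetical illustration only and does not reproduce the proposed library: the tiny stop-word sets, the suffix lists and the function names are assumptions, and the crude suffix stripper merely stands in for a real stemming or lemmatization algorithm.

```python
import re

# Illustrative, deliberately tiny stop-word lists (assumed, not the library's).
STOP_WORDS = {
    "en": {"the", "a", "an", "of", "and", "is", "are", "in", "to"},
    "fr": {"le", "la", "les", "un", "une", "des", "et", "est", "de", "en"},
}
# Naive suffix rules, checked in order; longest suffixes come first.
SUFFIXES = {
    "en": ("ing", "ed", "es", "s"),
    "fr": ("ements", "ement", "es", "s"),
}


def tokenize(text):
    """Segmentation: lowercase and keep runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def strip_suffix(word, suffixes, min_stem=3):
    """Remove the first matching suffix if at least min_stem letters remain."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text, lang):
    """Unstructured text in, list of normalized tokens out."""
    stop = STOP_WORDS[lang]
    return [
        strip_suffix(token, SUFFIXES[lang])
        for token in tokenize(text)
        if token not in stop
    ]
```

Calling `preprocess("The cats are sleeping in the garden", "en")` yields `['cat', 'sleep', 'garden']`; switching `lang` to `"fr"` selects the French rules and stop words instead.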

Keywords: text preprocessing, segmentation, knowledge extraction, normalization, text generation, information retrieval

Procedia PDF Downloads 90

1302 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR), that is, automatic text recognition in images. The features of the OCR method were examined using the ABBYY FineReader program. OCR is necessary because optical input devices can only deliver raster graphics as a result. Text recognition is the task of recognizing the letters shown in such graphics, identifying them and assigning each one a numerical value in accordance with a standard text encoding (ASCII, Unicode). Using ABBYY FineReader as an example, the study confirmed and demonstrated in practice the improvement of digital text recognition platforms developed for electronic publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies

Procedia PDF Downloads 162

1301 Programmed Speech to Text Summarization Using Graph-Based Algorithm

Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba

Abstract:

Programmed Speech to Text and Text Summarization Using Graph-based Algorithms can be used in meetings to obtain a short description of the meeting for future reference. The system provides signature verification using a Siamese neural network to confirm the identity of the user, and converts the user-supplied audio recording, which is in English, into English text using the speech recognition package available in Python. Sometimes only the summary of the meeting is required; the answer to this is text summarization. The transcript is therefore summarized using natural language processing approaches, such as unsupervised extractive text summarization algorithms.

Keywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization

Procedia PDF Downloads 210

1300 On-Road Text Detection Platform for Driver Assistance Systems

Authors: Guezouli Larbi, Belkacem Soundes

Abstract:

The automation of the text detection process can assist drivers in the driving task. It can give drivers more information about their environment by facilitating the reading of road signs such as directional signs, events, stores, etc. In this paper, a system consisting of two stages is proposed. In the first stage, we use pseudo-Zernike moments to pinpoint areas of the image that may contain text. The architecture of this part is based on three main steps: region of interest (ROI) detection, text localization, and non-text region filtering. In the second stage, we present a convolutional neural network architecture (On-Road Text Detection Network - ORTDN), which serves as the classification phase. The results show that the proposed framework achieves ≈ 35 fps and an mAP of ≈ 90%, thus a low computational time with competitive accuracy.

Keywords: text detection, CNN, PZM, deep learning

Procedia PDF Downloads 79

1299 Twenty-Five Polymorphic Microsatellite Loci Used To Genotype Some Camel Types and Subtypes From Sudan, Qatar, Chad, And Somalia

Authors: Wathig Hashim Mohamed Ibrahim

Abstract:

Twenty-five polymorphic microsatellite loci out of 50 were used to genotype some camel (Camelus dromedarius) types and subtypes from Sudan (Naylawi, Shanapla, Lahawi, Kinani, Rashaydi, Bani-Aamir, Annafi, Bishari Shallagyai and Bishari Arririt) and from Qatar (OmmaniHJ, OmmaniKH, Majaheem, Pakistani Sindi, Pakistani Punjabi and Pakistani); for comparison, one type from Somalia (Aarhou) and another from Chad (Spotted) were also investigated. The highest number of alleles was 23 at locus CVRL 01, and the lowest was 2 at YWLL 59. The observed heterozygosity (Hobs) was 0.950 and 0.049 for VOLP08 and YWLL09, respectively, while the expected heterozygosity (HExp) was 0.915 and 0.362 for loci VOLP67 and YWLL58, respectively, and the mean HExp was 0.7378. Polymorphic Information Content (PIC) ranged from 0.907 at locus VOLP67 to 0.345 at YWLL58, and the mean PIC was 0.7002. The genetic distance ranged from 0.545, between Shallagyai (Bishari subtype) and the Pakistani Sindi subtype, to 0.098, between Annafi and Rashaydi. The genetic distance between Spotted and all other types ranged between 0.223 with Arririt (Bishari subtype) and 0.463 with Punjabi (Pakistani subtype), found in Qatar, while the distance between Aarhou and all other types ranged between 0.215 for Arririt and 0.469 with Punjabi (Pakistani subtype). The dendrogram shows that there is a relationship between genetic makeup and geographical distribution, and also between genetic makeup and phenotypic characteristics. Individual assignment was calculated, with 46.62% of individuals correctly assigned and a quality index of 46.87%. Hardy-Weinberg Equilibrium (HWE) was also calculated.
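
For readers unfamiliar with the per-locus statistics reported above, the sketch below computes expected heterozygosity and PIC from the allele frequencies at a single locus, using the standard textbook formulas (He = 1 - Σp², and PIC = 1 - Σp² - Σ over allele pairs of 2·p²·q²). The abstract does not state which estimator or software the author used, so this is an assumption for illustration; the function names and the example frequencies are hypothetical, and no sample-size correction is applied.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2 * (p * p) * (q * q) for p, q in combinations(freqs, 2)
    )
    return 1.0 - homozygosity - cross_term
```

With two equally frequent alleles, He is 0.5 and PIC is 0.375; both values rise as a locus carries more alleles at more even frequencies, which is why highly polymorphic loci score close to 1.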

Keywords: camel, genotype, polymorphic microsatellite, types and subtypes

Procedia PDF Downloads 76

1298 Reducing Accidents Using Text Stops

Authors: Benish Chaudhry

Abstract:

Most accidents these days occur because of the ‘text-and-drive’ habit. Cities in the UAE are laid out over great distances, which makes it very hard to drive without using, or at least checking, a cellphone. Moreover, given the road structure, it is almost impossible to stop at a point and text. With the introduction of TEXT STOPs, drivers will be able to stop at designated stops for a maximum of one and a half minutes in order to reply to or write a message. The stops can be placed 10 minutes of driving apart at the average speed of the road, so that drivers can look forward to a stop and reply to a text when needed. A user survey indicates that drivers are willing NOT to text and drive if they have such a facility available.

Keywords: transport, accidents, urban planning, road planning

Procedia PDF Downloads 389

1297 Structure Analysis of Text-Image Connection in Jalayrid Period Illustrated Manuscripts

Authors: Mahsa Khani Oushani

Abstract:

Text and image are two important elements in Iranian art, where the text component and the image component have always appeared together. The image narrates the text, and the text is the factor that shapes the image; the two are closely related. In the tradition of Iranian manuscript arrangement, the connection between text and image is interactive and two-way. The interaction between the narrative description and the image scene results from a direct and close connection between text and image, which has a descriptive aspect in addition to a decorative one. This article discusses the connection between the text element and the image element and how it fits the theory of Roland Barthes, the structuralist theorist. The study investigates how the connection between text and image in illustrated manuscripts of the Jalayrid period is defined according to Barthes’ theory, and what kind of proportion the artist has created between text and image in the composition. Based on a review of the data, it can be inferred that in the Jalayrid period the image has a reference connection to the text: although it is of major importance on the page, it maintains a close connection with the text and is placed in a particular proportion to it. The composition is not necessarily balanced and symmetrical, and sometimes uses imbalance. This research uses a descriptive-analytical method, with data gathered through library research.

Keywords: structure, text, image, Jalayrid, painter

Procedia PDF Downloads 227

1296 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey

Abstract:

Text mining techniques are generally applied for classifying text and for finding fuzzy relations and structures in data sets. This research provides plenty of text mining capabilities. One common application is text classification and event extraction, which encompasses deducing specific knowledge concerning incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism, which is based on text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and the branch and bound method to simplify texts.
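
The abstract does not spell out how fuzzy relations are combined, but the keywords mention max-prod. As a hedged illustration of that operation only, the sketch below composes two fuzzy relations given as membership matrices, using T(x, z) = max over y of R(x, y) · S(y, z). The function name and the example matrices are hypothetical, and nothing here reproduces the paper's concept graph or branch and bound steps.

```python
def max_prod_compose(r, s):
    """Max-product composition of fuzzy relations R (X x Y) and S (Y x Z).

    Both relations are lists of rows holding membership degrees in [0, 1].
    """
    inner = len(s)
    if inner == 0 or any(len(row) != inner for row in r):
        raise ValueError("columns of r must match the rows of s")
    cols = len(s[0])
    return [
        [max(r_row[k] * s[k][j] for k in range(inner)) for j in range(cols)]
        for r_row in r
    ]
```

For instance, if `r` relates query terms to concepts and `s` relates concepts to document classes, each entry of the composed relation is the strength of the strongest single concept linking a term to a class.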

Keywords: extraction, max-prod, fuzzy relations, text mining, memberships, classification

Procedia PDF Downloads 572

1295 Mask-Prompt-Rerank: An Unsupervised Method for Text Sentiment Transfer

Authors: Yufen Qin

Abstract:

Text sentiment transfer is an important branch of text style transfer. The goal is to take a text with a specific sentiment attribute and generate text with a different sentiment attribute, while keeping the content and semantic information unrelated to sentiment unchanged. There are currently two main challenges in this field: the lack of a parallel corpus and text attribute entanglement. In response to these problems, this paper proposes a novel solution, Mask-Prompt-Rerank, which masks the sentiment words and then uses prompt-based regeneration to transfer the sentiment of the sentence. Experiments on two sentiment benchmark datasets and one formality transfer benchmark dataset show that this approach makes the performance of small pre-trained language models comparable to that of the most advanced large models, while consuming two orders of magnitude less computing and memory.

Keywords: language model, natural language processing, prompt, text sentiment transfer

Procedia PDF Downloads 75

1294 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech

Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori

Abstract:

Native speech-to-text synthesis has its own value for mankind. Speaking in different accents is common, but communication between people with different accents is quite difficult. The problem arises from misperceiving the meaning of what is said. Many automatic speech recognition systems have therefore been deployed to detect text. This paper reviews NSTTR (Native Speech Text to Text Recognition) synthesis in comparison with text-to-text recognition. The review shows that many text-to-text recognition systems are at a very early stage with respect to native speech recognition. Much of the discussion concerns the progression of chatbots and linguistic theory; another strand is the rule-based approach. In recent years, deep learning has become a dominant approach for text-to-text learning to detect the nature of language. To the best of our knowledge, a huge number of people in the subcontinent speak Bangla, but with different accents in different regions; the study therefore discusses the contrasting achievements of existing works and identifies future needs for Bangla acoustic accents.

Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing

Procedia PDF Downloads 123

1293 Anatomical Survey for Text Pattern Detection

Authors: S. Tehsin, S. Kausar

Abstract:

The ultimate aim of machine intelligence is to explore and materialize human capabilities, one of which is the ability to detect various text objects within one or more images displayed on any canvas, including prints, videos or electronic displays. Multimedia data has increased rapidly in past years. Textual information present in multimedia contains important information about the image or video content. However, the human ability to detect and differentiate text within an image still needs to be reproduced technologically for computers. Hence, this paper proposes a feature set based on an anatomical study of the human text detection system. Subsequent examination shows that the extracted features proved instrumental to text detection.

Keywords: biologically inspired vision, content based retrieval, document analysis, text extraction

Procedia PDF Downloads 441

1292 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we present a brief overview of the current state of the art in Arabic text representation and classification methods. We decompose Arabic text classification (TC) work into four categories. First, we describe some algorithms applied to the classification of Arabic text. Second, we cite the major works comparing classification algorithms applied to Arabic text. Third, we mention some authors who propose new classification methods. Finally, we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 459

1291 Adolescent Obesity Leading to Adulthood Cardiovascular Diseases among Punjabi Population

Authors: Manpreet Kaur, Badaruddoza, Sandeep Kaur Brar

Abstract:

The increasing prevalence of adolescent obesity is one of the major causes of hypertension in adulthood. Various statistical methods have been applied to examine the performance of anthropometric indices in identifying an adverse cardiovascular risk profile. The present work was undertaken to determine the significant traditional risk factors through principal component factor analysis (PCFA) among population-based Punjabi adolescents aged 10-18 years. Data were collected from adolescent children in different schools situated in urban areas of Punjab, India. PCFA was applied to extract orthogonal components from anthropometric and physiometric variables. Associations between components were explained by factor loadings. The PCFA extracted four factors, which explained 84.21%, 84.06% and 83.15% of the total variance of the 14 original quantitative traits among boys, girls and combined subjects, respectively. Factor 1 has high loadings of the traits that reflect adiposity, such as waist circumference, BMI and skinfolds, in both sexes. Waist circumference and body mass index are indicators of abdominal obesity, which increases the risk of cardiovascular diseases. The loadings of these two traits were highest in adolescent girls (WC=0.924; BMI=0.905). Therefore, factor 1 is a strong indicator of atherosclerosis in adolescents. Factor 2 is predominantly loaded with blood pressures and related traits (SBP, DBP, MBP and pulse rate), which reflect the risk of essential hypertension, in adolescent girls and combined subjects, whereas in boys factor 2 is loaded with obesity-related traits (weight and hip circumference). By comparison, factor 3 is loaded with blood pressures in boys and with height and WHR in girls, while factor 4 contains a high loading of pulse pressure among boys, girls and the combined group of adolescents.

Keywords: adolescent obesity, cvd, hypertension, punjabi population

Procedia PDF Downloads 368

1290 Graph-Based Semantical Extractive Text Analysis

Authors: Mina Samizadeh

Abstract:

In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous amount of data requires us to adopt effective computational tools to explore it. This has led to intense and growing interest in the research community in developing computational methods for processing text data. One line of study focuses on condensing text so that we can gain a higher-level understanding in a shorter time. The two important tasks here are keyword extraction and text summarization. In keyword extraction, we are interested in finding the most important words in a text, which familiarizes us with its general topic. In text summarization, we are interested in producing a short text that includes the important information in the document. The TextRank algorithm, an unsupervised learning method that extends PageRank (the base algorithm of the Google search engine for searching and ranking pages), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and return them as the result. However, it neglects the semantic similarity between the different parts. In this work, we improve the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used on its own or as part of generating the summary to overcome coverage problems.
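
As a rough illustration of the baseline this abstract builds on, the following is a minimal sketch of plain TextRank keyword extraction: a word co-occurrence graph scored with PageRank-style power iteration. It is not the author's method and omits the semantic-similarity weighting the abstract describes. The function names, the window size, the damping factor of 0.85, and the pre-tokenized input are all assumptions made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link each token to the distinct tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration on an undirected, unweighted graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its current score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * incoming
        scores = new_scores
    return scores


def top_keywords(tokens, k=3, window=2):
    """Return the k highest-scoring tokens, breaking ties alphabetically."""
    scores = textrank(build_cooccurrence_graph(tokens, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy input (hypothetical); a real pipeline would also remove stop words.
tokens = "graph ranking uses graph edges and graph nodes".split()
print(top_keywords(tokens, k=1))
```

A semantic extension along the lines the abstract describes would presumably replace the unweighted edges with similarity-weighted ones; how the author does this is not stated here.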

Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis

Procedia PDF Downloads 62

1289 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the massive amount of varied textual information written in Arabic on the web. This paper presents a review of published research on Arabic text classification using the classical data representation, bag of words (BoW), and using conceptual data representations based on semantic resources such as Arabic WordNet and Wikipedia.
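
To make the bag-of-words representation mentioned above concrete, here is a minimal sketch that maps documents to term-count vectors over a shared vocabulary. Whitespace tokenization and the function names are assumptions; Arabic text would normally need normalization and stemming first, which this sketch leaves out.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct token a column index, in sorted order."""
    vocab = sorted({token for doc in documents for token in doc.split()})
    return {token: index for index, token in enumerate(vocab)}


def bag_of_words(document, vocabulary):
    """Count how often each vocabulary token occurs; word order is discarded."""
    counts = Counter(document.split())
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector


# Placeholder documents (hypothetical), using single letters as tokens.
docs = ["a b a", "b c"]
vocabulary = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocabulary) for doc in docs]
```

Because the vector only records counts, two documents with the same words in a different order get the same representation, which is the limitation that conceptual representations based on resources such as Arabic WordNet try to address.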

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 418

1288 Perceiving Text-Worlds as a Cognitive Mechanism to Understand Surah Al-Kahf

Authors: Awatef Boubakri, Khaled Jebahi

Abstract:

Using Text World Theory (TWT), we attempted to understand how mental representations (text worlds) and perceptions can be construed by readers of Quranic texts. To this end, Surah Al-Kahf was purposefully selected given the fact that while each of its stories is narrated, different levels of discourse intervene, which might result in a confused reader who might find it hard to keep track of which discourse he or she is processing. This surah was studied using specifically-designed text-world diagrams. The findings suggest that TWT can be used to help solve problems of ambiguity at the level of discourse in Quranic texts and to help construct a thinking reader whose cognitive constructs (text worlds / mental representations) are built through reflecting on the various and often changing components of discourse world, text world, and sub-worlds.

Keywords: Al-Kahf, Surah, cognitive, processing, discourse

Procedia PDF Downloads 81

1287 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research area over the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume and semantics. Since terms are the only features that can be found in documents, the selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tf-idf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tf-rf), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS), to classify text documents. We evaluated all the feature selection methods on standard datasets such as 20 Newsgroups, the 4 University dataset and Reuters-21578.
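
Two of the scores named in this abstract can be sketched as follows. This is a minimal illustration using common textbook definitions of tf-idf and the chi-square statistic for a term and a class; the abstract does not say which variants the authors used, and the function names and argument layout are assumptions.

```python
import math


def tf_idf(term_count, doc_length, n_docs, docs_with_term):
    """Relative term frequency times log inverse document frequency.

    Many variants exist (smoothed idf, log-scaled tf); this is one common form.
    """
    tf = term_count / doc_length
    idf = math.log(n_docs / docs_with_term)
    return tf * idf


def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a degenerate table carries no evidence either way
    return n * (a * d - c * b) ** 2 / denominator
```

A term that appears in every document gets a tf-idf of zero, and a term distributed evenly across classes gets a chi-square of zero, so both scores push uninformative terms to the bottom of the ranking.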

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 453

1286 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

To address the problem of acquiring biological cases related to design problems, a biological case acquisition method based on text mining is presented. Through the construction of a vector space for the corpus text and knowledge mining, the feature selection, similarity measure and case retrieval methods for text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is searched using the vector space model combined with functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of the method using an example text.
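
The retrieval step described above can be sketched, under stated assumptions, as ranking corpus texts by cosine similarity to a keyword query in a term-count vector space. The toy corpus, the whitespace tokenization, and the function names are hypothetical and are not taken from the authors' system, whose feature selection and similarity measure are not specified here.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as lowercase term counts (a sparse vector)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, corpus, top_k=1):
    """Return the top_k corpus texts most similar to the functional-keyword query."""
    query_vector = term_vector(query)
    ranked = sorted(
        corpus,
        key=lambda doc: cosine_similarity(query_vector, term_vector(doc)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical biological cases, for illustration only.
corpus = [
    "lotus leaf repels water",
    "gecko foot adheres to walls",
    "shark skin reduces drag",
]
print(retrieve("reduces drag", corpus))
```

Exact term matching is the weak point of this sketch: a query for "reduce drag" would miss "reduces", which is one reason a preprocessing step such as stemming usually precedes the vector space model.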

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 257