Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2634

Search results for: Arabic text classification

2634 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we present a brief overview of the current state of the art in Arabic text representation and classification methods. We decompose Arabic text classification (TC) work into four categories. First, we describe some algorithms applied to classifying Arabic text. Second, we cite the major works comparing classification algorithms applied to Arabic text. Third, we mention authors who have proposed new classification methods. Finally, we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 282
2633 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail

Abstract:

An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires automatic text classification. Text classification can be defined as assigning documents to a set of classes defined in advance. Arabic text classification methods have emerged as a natural result of the massive amount of varied textual information written in Arabic on the web. This paper reviews published research on Arabic text classification that uses the classical bag-of-words (BoW) data representation, as well as research that uses conceptual data representations based on semantic resources such as Arabic WordNet and Wikipedia.
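
The following minimal Python sketch makes the contrast between the two representations concrete. It assumes naive whitespace tokenization. It also uses a tiny hypothetical word-to-concept lexicon (`LEXICON`) as a stand-in for a resource such as Arabic WordNet. It is not taken from any of the reviewed systems.

```python
from collections import Counter

# Hypothetical word -> concept map standing in for a semantic resource such as Arabic WordNet.
LEXICON = {"الكرة": "SPORT", "المباراة": "SPORT", "السوق": "ECONOMY", "الأسعار": "ECONOMY"}

def bag_of_words(docs):
    """Classical BoW: one term-count vector per document over a shared, sorted vocabulary."""
    tokenized = [doc.split() for doc in docs]  # naive whitespace tokenization (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def conceptualize(doc, lexicon=LEXICON):
    """Conceptual representation: map known words to concepts, keep unknown words as-is."""
    return Counter(lexicon.get(tok, tok) for tok in doc.split())

docs = ["الكرة المباراة الكرة", "السوق الأسعار"]
vocab, vectors = bag_of_words(docs)
concepts = conceptualize(docs[0])  # all three tokens map to the SPORT concept
```

Two documents on the same topic may share no surface words and still share concept counts. That is the motivation for the conceptual representations the review covers.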

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 300
2632 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification

Authors: A. Elsehemy, M. Abdeen, T. Nazmy

Abstract:

Since the appearance of the Semantic Web, many semantic search techniques and models have been proposed that exploit the information in ontologies to enhance traditional keyword-based search. Many advances have been made for languages such as English, German, French and Spanish; however, other languages such as Arabic are not yet fully supported. In this paper, we present a framework for ontology-based information retrieval for the Arabic language. Our system consists of four main modules: a query parser, an indexer, a search module and a ranking module. Our approach builds a semantic index by linking ontology concepts to documents, with an annotation weight for each link that is used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies (Sports, Economic and Politics) as examples for the Arabic language, and a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using precision and recall. We performed many retrieval operations on a sample of 40,316 documents totaling 320 MB of plain text. The results show that semantic search enhanced with text classification performs better than the system without classification.
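
The following sketch shows one possible way to score a concept-to-document index with per-link annotation weights. In it, a category match from the document classifier boosts the rank. The class name, the additive scoring and the multiplicative `boost` factor are assumptions for illustration. The abstract does not give the paper's actual ranking formula.

```python
from collections import defaultdict

class SemanticIndex:
    """Toy concept -> document index; each link carries an annotation weight (hypothetical design)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # concept -> {doc_id: annotation weight}
        self.doc_category = {}             # doc_id -> category assigned by a classifier

    def add_link(self, concept, doc_id, weight):
        self.postings[concept][doc_id] = weight

    def set_category(self, doc_id, category):
        self.doc_category[doc_id] = category

    def search(self, query_concepts, query_category=None, boost=1.5):
        """Sum link weights per document, then boost documents whose category matches the query's."""
        scores = defaultdict(float)
        for concept in query_concepts:
            for doc_id, weight in self.postings.get(concept, {}).items():
                scores[doc_id] += weight
        if query_category is not None:
            for doc_id in scores:
                if self.doc_category.get(doc_id) == query_category:
                    scores[doc_id] *= boost  # assumed multiplicative category boost
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

idx = SemanticIndex()
idx.add_link("Football", "d1", 0.8)
idx.add_link("Football", "d2", 0.9)
idx.set_category("d1", "Sports")
idx.set_category("d2", "Economic")
ranking = idx.search(["Football"], query_category="Sports")  # d1 overtakes d2 after the boost
```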

Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology

Procedia PDF Downloads 413
2631 An Enhanced Support Vector Machine Based Approach for Sentiment Classification of Arabic Tweets of Different Dialects

Authors: Gehad S. Kaseb, Mona F. Ahmed

Abstract:

Arabic Sentiment Analysis (SA) is one of the most common research fields and has many open areas, yet few studies apply SA to Arabic dialects. This paper proposes different pre-processing steps and a modified methodology to improve accuracy using standard Support Vector Machine (SVM) classification. The paper works on two datasets that are publicly available for academic use: the Arabic Sentiment Tweets Dataset (ASTD) and the Extended Arabic Tweets Sentiment Dataset (Extended-AATSD). The results show that the classification accuracy approaches 86%.
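
The sketch below shows typical Arabic tweet pre-processing that could run before an SVM. The specific steps are common choices assumed here, not necessarily the steps the paper proposes:

- dropping URLs and mentions
- stripping diacritics and tatweel
- unifying alef variants, alef maqsura and ta marbuta
- collapsing letter elongation

The SVM itself is omitted.

```python
import re

# Assumed, commonly used normalization steps (not necessarily the paper's exact pipeline).
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
DIACRITICS = re.compile(r"[\u064B-\u0652]")      # tanween, short vowels, shadda, sukun
ELONGATION = re.compile(r"(.)\1{2,}")            # any character repeated 3+ times
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")   # digits, Latin letters, emoji, punctuation

def normalize_tweet(text):
    """Return a whitespace-normalized, Arabic-only version of a tweet."""
    text = URL_OR_MENTION.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = text.replace("\u0640", "")                       # remove tatweel (kashida)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)   # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")                 # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")                 # ta marbuta -> ha
    text = ELONGATION.sub(r"\1", text)                      # collapse elongated letters
    text = NON_ARABIC.sub(" ", text)
    return " ".join(text.split())
```

The normalized text would then be vectorized (for example, with n-gram counts) and passed to any linear SVM.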

Keywords: Arabic, classification, sentiment analysis, tweets

Procedia PDF Downloads 30
2630 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey

Abstract:

Text mining techniques are generally applied to classify text and to find fuzzy relations and structures in data sets. This research provides many text mining capabilities. Common applications include text classification and event extraction; the latter involves deducing specific knowledge about incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism based on text classification and optimal fuzzy relationship extraction. Furthermore, the paper explains the application of fuzzy relationship extraction and the branch and bound method to simplify texts.
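
The max-prod operator named in the keywords composes two fuzzy relations R (from X to Y) and S (from Y to Z) as (R ∘ S)(x, z) = max over y of R(x, y) · S(y, z). Below is a minimal sketch with relations stored as membership matrices. The example memberships (documents-to-terms and terms-to-concepts) are made up. The sketch does not show how the paper feeds such relations into its concept graph or branch and bound step.

```python
def max_prod_compose(R, S):
    """Max-product composition of fuzzy relations R (n x m) and S (m x p)."""
    m = len(S)
    if any(len(row) != m for row in R):
        raise ValueError("inner dimensions must agree")
    p = len(S[0])
    return [[max(row[k] * S[k][j] for k in range(m)) for j in range(p)] for row in R]

# Hypothetical memberships: 2 documents x 2 terms, and 2 terms x 2 concepts.
R = [[0.7, 0.5], [0.8, 0.4]]
S = [[0.9, 0.6], [0.1, 0.7]]
T = max_prod_compose(R, S)  # approximately [[0.63, 0.42], [0.72, 0.48]]
```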

Keywords: extraction, max-prod, fuzzy relations, text mining, memberships, classification

Procedia PDF Downloads 447
2629 Towards Logical Inference for the Arabic Question-Answering

Authors: Wided Bakari, Patrice Bellot, Omar Trigui, Mahmoud Neji

Abstract:

This article opens a line of thought on the modeling and analysis of Arabic texts in the context of a question-answering system. The aim is to go beyond traditional approaches focused on morphosyntactic analysis. We present a new approach that analyzes a text in order to extract correct answers and then transforms it into logical predicates. In addition, we would like to represent different levels of information within a text in order to answer a question and choose one answer among several candidates. To do so, we transform both the question and the text into logical forms and then try to recognize all entailments between them. The result of recognizing the entailments is a set of text sentences that may entail the user’s question. Our work currently focuses on implementation: developing an Arabic question-answering system that uses techniques for recognizing textual entailment. In this context, the extraction of text features (keywords, named entities, and the relationships that link them) is the first step in our text modeling process. The second step uses textual entailment techniques, which rely on inference and logical representation, to extract candidate answers. The last step is the extraction and selection of the desired answer.
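
Here is a toy illustration of the answer-extraction idea. The question and a sentence are assumed to be already converted into logical predicates: tuples here, with `?x` marking the unknown. A candidate answer is any binding under which a sentence fact matches every question predicate. Exact unification is a deliberate simplification of textual entailment, and the predicate names are hypothetical.

```python
def unify(pattern, fact, bindings):
    """Match one predicate pattern (may contain '?vars') against a ground fact."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:  # variable already bound to something else
                return None
        elif p != f:
            return None
    return bindings

def answer(question_preds, sentence_facts):
    """Return every variable binding under which all question predicates match sentence facts."""
    def search(preds, bindings):
        if not preds:
            yield bindings
            return
        first, rest = preds[0], preds[1:]
        for fact in sentence_facts:
            b = unify(first, fact, bindings)
            if b is not None:
                yield from search(rest, b)
    return list(search(list(question_preds), {}))

# "ما هي عاصمة مصر؟" (What is the capital of Egypt?) as a hypothetical logical form
question = [("capital_of", "?x", "Egypt"), ("city", "?x")]
# Hypothetical facts extracted from "القاهرة مدينة وهي عاصمة مصر"
facts = [("city", "Cairo"), ("capital_of", "Cairo", "Egypt"), ("river", "Nile")]
result = answer(question, facts)  # [{'?x': 'Cairo'}]
```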

Keywords: NLP, Arabic language, question-answering, recognizing textual entailment, logic forms

Procedia PDF Downloads 224
2628 Motion Effects of Arabic Typography on Screen-Based Media

Authors: Ibrahim Hassan

Abstract:

Motion typography is one of the most important types of display-based visual communication. Through digital display media, we can control text properties (size, direction, thickness, color, etc.). Its use in visual communication has given motion typography many forms. We need to adjust the terminology and clarify the distinctions between these forms, because motion typography, as a general term, is not enough on its own to separate the different communicative functions of moving text. In this paper, we discuss the different effects of motion typography on Arabic writing and how to achieve harmony between movement and letterform. Through our experiments, we also present a new type of text movement.

Keywords: Arabic typography, motion typography, kinetic typography, fluid typography, temporal typography

Procedia PDF Downloads 4
2627 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research area over the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume and semantics. Since terms are the only features that can be found in documents, selecting good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods for classifying text documents, viz.:

- Term Frequency-Inverse Document Frequency (tfidf)
- Mutual Information (MI)
- Information Gain (IG)
- Chi-Square (χ2)
- Term Frequency-Relevance Frequency (tfrf)
- Term Strength (TS)
- Ambiguity Measure (AM)
- Symbolic Feature Selection (SFS)

We evaluated all the feature selection methods on standard datasets: 20 Newsgroups, the 4 University dataset and Reuters-21578.
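
As one example of the methods compared, the χ2 score of a term t for a category c can be computed from a 2×2 contingency table of document counts:

- A: documents in c that contain t
- B: documents not in c that contain t
- C: documents in c that lack t
- D: documents not in c that lack t

With N = A + B + C + D, χ2(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below ranks terms by this score for one target category. The toy documents are invented and are assumed to be pre-tokenized sets of terms.

```python
def chi_square(A, B, C, D):
    """Chi-square statistic of a term/category 2x2 contingency table."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def select_top_k(docs, labels, target, k):
    """Rank all terms by chi-square with respect to `target`; ties broken alphabetically."""
    vocab = {t for d in docs for t in d}
    scores = {}
    for term in vocab:
        A = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
        B = sum(1 for d, y in zip(docs, labels) if term in d and y != target)
        C = sum(1 for d, y in zip(docs, labels) if term not in d and y == target)
        D = len(docs) - A - B - C
        scores[term] = chi_square(A, B, C, D)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

docs = [{"goal", "team"}, {"goal", "match"}, {"market", "team"}, {"market", "price"}]
labels = ["sport", "sport", "econ", "econ"]
top = select_top_k(docs, labels, target="sport", k=2)  # ['goal', 'market']
```

"market" scores as high as "goal" because χ2 rewards negative as well as positive association with the category.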

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 321
2626 Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models (HMMs)

Authors: Rabi Mouhcine, Amrouch Mustapha, Mahani Zouhir, Mammass Driss

Abstract:

In this paper, we present a system for offline recognition of cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The system is analytical, works without explicit segmentation, and uses embedded training to build and refine the character models. The features are extracted after baseline estimation. They are statistical and geometric, so that they capture both the peculiarities of the text and the pixel distribution characteristics of the word image. These features are modelled using HMMs trained by embedded training. Experiments on images from the benchmark IFN/ENIT database show that the proposed system improves recognition.
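
Embedded training builds a word-level HMM by chaining character HMMs. It then re-estimates all character models together on whole word images, typically with Baum-Welch, so no explicit segmentation is needed. The sketch below shows only the chaining step and the forward-probability scoring of an observation sequence. It makes the following assumptions:

- Character models are left-to-right.
- Observations are discrete symbols, a simplification of the statistical and geometric feature vectors the paper uses.
- Each model follows a hypothetical dict layout.

```python
def concat_models(char_models):
    """Chain left-to-right character HMMs into a single word HMM.

    Each character model is a dict with 'self_loops' (one stay probability per state)
    and 'emissions' (one {symbol: probability} dict per state); this layout is hypothetical.
    """
    emissions = [e for model in char_models for e in model["emissions"]]
    stays = [p for model in char_models for p in model["self_loops"]]
    n = len(emissions)
    trans = [[0.0] * n for _ in range(n)]
    for s, stay in enumerate(stays):
        if s == n - 1:
            trans[s][s] = 1.0  # final state of the word is absorbing
        else:
            trans[s][s] = stay
            trans[s][s + 1] = 1.0 - stay  # advance, possibly into the next character model
    return trans, emissions

def forward_likelihood(trans, emissions, observations):
    """P(observations, starting in the first state and ending in the last) by the forward algorithm."""
    n = len(trans)
    alpha = [emissions[0].get(observations[0], 0.0) if s == 0 else 0.0 for s in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emissions[s].get(obs, 0.0)
                 for s in range(n)]
    return alpha[-1]

# Two toy one-state character models emitting the symbols "a" and "b".
char_a = {"self_loops": [0.5], "emissions": [{"a": 1.0}]}
char_b = {"self_loops": [0.5], "emissions": [{"b": 1.0}]}
trans, emissions = concat_models([char_a, char_b])
p = forward_likelihood(trans, emissions, ["a", "a", "b"])  # 0.25
```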

Keywords: recognition, handwriting, Arabic text, HMMs, embedded training

Procedia PDF Downloads 243
2625 Arabic Light Stemmer for Better Search Accuracy

Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy

Abstract:

Arabic is one of the most ancient and important languages in the world. It has more than 250 million native speakers, and more than twenty countries have Arabic as one of their official languages. In the past decade, we have witnessed a rapid evolution in smart devices, social networks and the technology sector. This has created a need for tools and libraries that properly handle the Arabic language in different domains. Stemming is one of the most fundamental linguistic operations and is used in many applications, especially in information extraction and text mining. The motivation behind this work is to enhance the Arabic light stemmer to serve the data mining industry and to make it available to the open-source community. The presented implementation enhances the Arabic light stemmer by extending an existing algorithm with a new set of rules and patterns and an adjusted procedure. The study shows a significant improvement in search accuracy, averaging 10% over previous works.
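
Light stemming strips a small set of frequent prefixes and suffixes without trying to recover the root. The sketch below uses affix lists in the spirit of well-known Arabic light stemmers. The lists, the length guards and the one-prefix/one-suffix policy are illustrative assumptions, not the extended rules and patterns the paper describes.

```python
# Illustrative affix lists (assumption); real light stemmers differ in detail.
PREFIXES = ["وال", "فال", "بال", "كال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

def light_stem(word, min_stem=2):
    """Strip at most one prefix and one suffix, longest match first, keeping a minimum stem length."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        # a lone "و" is only stripped from longer words, since it is often a root letter
        guard = min_stem + 1 if prefix == "و" else min_stem
        if word.startswith(prefix) and len(word) - len(prefix) >= guard:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break
    return word
```

For example, `light_stem("المعلمون")` strips "ال" and then "ون", leaving "معلم".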

Keywords: Arabic data mining, Arabic Information extraction, Arabic Light stemmer, Arabic stemmer

Procedia PDF Downloads 182
2624 A New Scheme for Chain Code Normalization in Arabic and Farsi Scripts

Authors: Reza Shakoori

Abstract:

This paper presents a structural correction of Arabic and Persian strokes that manipulates their chain codes, in order to improve the recognition rate and performance of Persian and Arabic handwritten word recognition systems. It collects pure and effective features to represent a character with one consolidated feature vector. It also reduces variations, which decreases the number of training samples needed and increases the chance of successful classification. Our results also show how the proposed approaches can simplify classification, and consequently recognition, by reducing variations and possible noise in the chain code while preserving the orientation of characters and their backbone structures.
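
The idea of cleaning a chain code without rotating it can be sketched on Freeman 8-direction codes (0 = east, counting counter-clockwise in 45° steps). The sketch applies three steps:

1. Cancel immediate back-and-forth spurs.
2. Smooth isolated one-step jitter.
3. Run-length encode the result.

These are hypothetical normalizations chosen for illustration. They preserve absolute orientation, but they are not the paper's scheme.

```python
def remove_spurs(code):
    """Cancel immediate back-and-forth moves (direction d followed by its opposite, d + 4 mod 8)."""
    stack = []
    for d in code:
        if stack and (stack[-1] + 4) % 8 == d:
            stack.pop()
        else:
            stack.append(d)
    return stack

def smooth_jitter(code):
    """Replace a single direction that is one step off from two equal neighbours (left to right)."""
    out = list(code)
    for i in range(1, len(out) - 1):
        prev, cur, nxt = out[i - 1], out[i], out[i + 1]
        if prev == nxt and cur != prev and min((cur - prev) % 8, (prev - cur) % 8) == 1:
            out[i] = prev
    return out

def run_length(code):
    """Compress the code into (direction, length) pairs."""
    runs = []
    for d in code:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [tuple(r) for r in runs]

def normalize_chain_code(code):
    return run_length(smooth_jitter(remove_spurs(code)))

runs = normalize_chain_code([0, 0, 1, 0, 0, 2, 6, 2])  # [(0, 5), (2, 1)]
```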

Keywords: Arabic, chain code normalization, OCR systems, image processing

Procedia PDF Downloads 297
2623 The Effects of Watching Text-Relevant Video Segments with/without Subtitles on Vocabulary Development of Arabic as a Foreign Language Learners

Authors: Amirreza Karami, Hawraa Nafea Hameed Alzouwain, Freddie A. Bowles

Abstract:

This study investigates the effects of watching text-relevant video segments with/without subtitles on the vocabulary development of Arabic as a Foreign Language (AFL) learners. The participants were assigned to two groups: a control group and an experimental group. The control group received no video-based instruction, while the experimental group watched a text-relevant video segment in three stages: pre-, while-, and post-instruction. Preliminary results of the pre-test and post-test show that watching text-relevant video segments following a pre-while-post procedure can help the vocabulary development of AFL learners more than non-video-based instruction.

Keywords: text-relevant video segments, vocabulary development, Arabic as a Foreign Language, AFL, pre-while-post instruction

Procedia PDF Downloads 35
2622 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in Natural Language Processing (NLP). The evaluation of a language is an important task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study combines morphological evaluation and sentiment analysis to classify farmer suicide cases reported in the Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens. The proposed model is then trained using deep learning classification on Punjabi text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for the four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45%, respectively. The overall accuracy of sentiment classification obtained with the proposed framework on 275 Punjabi text documents is 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, Punjabi text, sentiment analysis

Procedia PDF Downloads 85
2621 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text classification is the task of assigning a given text to the appropriate category from a given set of categories. To achieve this, it is essential to use a proper set of pre-processing, feature selection and classification techniques. In this paper, we use different ensemble techniques while varying feature selection parameters, to observe the change in overall accuracy and in per-class measures such as the precision of each text category. After pre-processing and feature selection, individual classifiers were tested first, and then the classifiers were combined into ensembles to increase accuracy. We also studied the impact of decreasing the number of classification categories on overall accuracy. Text classification is widely used in sentiment analysis on social media sites such as Twitter to gauge people’s opinions about a cause, and to analyze customer reviews of products or services. Opinion mining is a vital task in data mining, and text categorization is the backbone of opinion mining.
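
The sketch below shows the combination step of a voting ensemble, together with the per-class precision the abstract mentions. Plain plurality voting with ties broken in model order is an assumption. Bagging and AdaBoost differ mainly in how the base classifiers are trained and weighted, which is not shown.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Plurality vote per example; on a tie, the tied label predicted by the earliest model wins."""
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined

def per_class_precision(y_true, y_pred):
    """Precision for every label that was predicted at least once."""
    stats = {}
    for label in set(y_pred):
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        stats[label] = sum(t == label for t in predicted) / len(predicted)
    return stats

# Hypothetical predictions from three base classifiers on three documents.
models = [["sport", "econ", "pol"], ["sport", "pol", "pol"], ["econ", "econ", "sport"]]
ensemble = majority_vote(models)  # ['sport', 'econ', 'pol']
```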

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 39
2620 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine

Authors: C. Brierley, H. El-Farahaty, A. Farhan

Abstract:

The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool. Following the business and medical industries, legal scholarship in the United States is now attuned to the potential of Text Analytics over vast quantities of text-based legal material. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse legal interpretations, legal discourse, the comprehensibility of legal texts, conflict resolution, and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We compiled a parallel Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of the Arabic-English data. This required manual intervention to ensure one-to-many correspondence, since Arabic sentences differ from English ones in length and punctuation. We searched for morphological variants of ‘dignity’ (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’, as in ‘human dignity’.
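
Sketch Engine defines logDice for a node word x and a collocate y as 14 + log2(2·f_xy / (f_x + f_y)). Here f_xy is the co-occurrence frequency and f_x and f_y are the individual frequencies. The maximum value is 14, reached when the two words always occur together. Below is a minimal sketch with invented frequencies; it does not reproduce the paper's counts for كرامة.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); returned as -inf when the pair never co-occurs."""
    if f_xy == 0:
        return float("-inf")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

# Invented counts: the pair co-occurs 5 times and each word occurs 10 times.
score = log_dice(5, 10, 10)  # 13.0
```

Because the score depends only on relative frequencies, it stays comparable across corpora of different sizes, which suits sub-corpora such as individual constitutions.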

Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora

Procedia PDF Downloads 51
2619 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Over the last two decades, Arabic natural language processing (ANLP) has become increasingly important. One of the key issues in ANLP is ambiguity: in Arabic, different pronunciations of the same written word may have different meanings. Ambiguity also affects the effectiveness and efficiency of Machine Translation (MT) and has limited the usefulness and accuracy of translation from Arabic to English. The lack of Arabic resources makes the ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of a word. This paper examines the diacritics of the Arabic language and uses them to disambiguate words. The proposed word sense disambiguation approach uses a diacritizer application to diacritize Arabic text. It then finds the most accurate sense of an ambiguous word using a Naïve Bayes classifier. Our experimental study shows that using Arabic diacritics with a Naïve Bayes classifier improves the accuracy of choosing the appropriate sense by 23% and also decreases ambiguity in machine translation.
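
A minimal multinomial Naïve Bayes sense classifier with add-one smoothing is sketched below. It assumes the text has already been diacritized. As a result, the ambiguous form علم surfaces as distinct tokens that act as strong features: عَلَم ('flag') and عِلْم ('science'). The training contexts and sense labels are invented, and the diacritizer itself is not shown.

```python
import math
from collections import Counter, defaultdict

FLAG = "\u0639\u064e\u0644\u064e\u0645"     # diacritized form read as 'flag'
SCIENCE = "\u0639\u0650\u0644\u0652\u0645"  # diacritized form read as 'science'

class NaiveBayesWSD:
    """Multinomial Naive Bayes over context tokens, with add-one (Laplace) smoothing."""

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)
        self.word_counts = defaultdict(Counter)
        for ctx, sense in zip(contexts, senses):
            self.word_counts[sense].update(ctx)
        self.vocab = {w for ctx in contexts for w in ctx}
        return self

    def predict(self, context):
        total = sum(self.sense_counts.values())
        V = len(self.vocab)
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[sense].values()) + V
            for w in context:
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best

# Invented, already-diacritized training contexts.
contexts = [["رفع", "الوطني", FLAG], ["السارية", "رفع", FLAG],
            ["طلب", "الجامعة", SCIENCE], ["الجامعة", "البحث", SCIENCE]]
senses = ["flag", "flag", "science", "science"]
model = NaiveBayesWSD().fit(contexts, senses)
```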

Keywords: Arabic natural language processing, machine learning, machine translation, Naive Bayes classifier, word sense disambiguation

Procedia PDF Downloads 250
2618 The Arabic Literary Text, between Proficiency and Pedagogy

Authors: Abdul Rahman M. Chamseddine, Mahmoud El-ashiri

Abstract:

In the field of language teaching, communication skills are essential for the learner to achieve. However, these skills might not, in general, support the comprehension of some texts of a literary or artistic nature, such as poetry. Understanding sentences and expressions is not enough to understand a poem. Other skills are needed to understand the special structure of a text whose literary meaning remains inaccessible even when its linguistic meaning is well understood. Beyond this, many other components are needed that extend from one text to other, similar texts. These can be understood through solid traditions that do not stand in the way of change and progress. This is not exclusive to texts classified as literary. It also applies to some short everyday phrases and indicatively charged expressions that can be classified as literary or carry a literary flavor, found in newspaper headlines, TV news reports, and even football commentary. Understanding this special use of language, described as literary, is highly important for understanding a discourse that can generally be classified as very far from literature. This work will try to explore the role of the literary text in the language class and the way it is covered throughout all levels of acquiring proficiency. It will also attempt to survey the position of the literary text in some of the most important books for teaching Arabic around the world. Just as grammar is needed to understand the language, another (literary) grammar is needed to understand literature.

Keywords: language teaching, Arabic, literature, pedagogy, language proficiency

Procedia PDF Downloads 171
2617 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression (regex) consists of a sequence of characters that describes a text pattern. In clinical research, programmers usually create regular expressions manually together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm, named REX, was designed and implemented as a simplified method for automatically creating regexes to classify Spanish text. To classify ambiguous cases, such as when multiple labels are assigned to a test example, REX includes an information gain method. Two datasets were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statistically better in accuracy and F-measure than Support Vector Machine and Naïve Bayes on both datasets.
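
The following sketch shows one way information gain could settle a conflict when several regexes fire for different labels. Each rule's information gain is measured on labelled training texts, and the label of the most informative matching rule is kept. The rules, the Spanish snippets and the tie-breaking policy are invented for illustration. REX's regex generation, which the keywords associate with Smith-Waterman alignment, is not shown.

```python
import math
import re
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def information_gain(pattern, texts, labels):
    """Entropy reduction from splitting the training texts on whether `pattern` matches."""
    hit = [y for t, y in zip(texts, labels) if re.search(pattern, t)]
    miss = [y for t, y in zip(texts, labels) if not re.search(pattern, t)]
    n = len(labels)
    return entropy(labels) - (len(hit) / n * entropy(hit) + len(miss) / n * entropy(miss))

def classify(text, rules, texts, labels):
    """rules: (pattern, label) pairs; if several fire, keep the label of the highest-IG rule."""
    fired = [(information_gain(p, texts, labels), lab) for p, lab in rules if re.search(p, text)]
    return max(fired)[1] if fired else None

# Invented Spanish clinical snippets, labels and rules.
texts = ["paciente fumador activo", "niega tabaquismo", "fumador ocasional",
         "sin antecedentes", "niega ser fumador"]
labels = ["smoker", "non_smoker", "smoker", "non_smoker", "non_smoker"]
rules = [(r"fumador", "smoker"), (r"niega|sin", "non_smoker")]
label = classify("niega ser fumador habitual", rules, texts, labels)  # 'non_smoker'
```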

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 181
2616 Preserving Digital Arabic Text Integrity Using Blockchain Technology

Authors: Zineb Touati Hamad, Mohamed Ridda Laouar, Issam Bendib

Abstract:

With the massive development of technology today, the Arabic language has gained a prominent position among the languages most used for writing articles, expressing opinions, and citing on many websites. It holds this position despite its sensitivity in terms of structure, language skills, diacritics, writing methods, etc. In the context of the spread of the Arabic language, the Holy Quran is the most prevalent Arabic text today in many applications and websites, whether for citation or for reading and learning rituals. Quranic verses/surahs are published quickly and at no cost, which raises serious concerns about protecting the content from tampering and alteration. To protect texts from distortion, one must refer to the original database and run a comparison to measure the percentage of distortion. This method has two disadvantages: it takes time, and there is no guarantee of the integrity of the database itself, since it belongs to a single central party. Blockchain technology today represents the best way to maintain immutable content. A blockchain is a distributed database that stores information in blocks linked to each other through cryptographic hashes, so that any modification of a block can easily be detected. To exploit these advantages, in this paper we seek to justify the use of this technique for preserving the integrity of change-sensitive Arabic texts. We do so by building a decentralized framework to authenticate and verify the integrity of digital Quranic verses/surahs spread across websites.
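
The tamper-evidence property the framework relies on can be shown with a minimal single-node hash chain. Each block stores the SHA-256 hash of its content plus the previous block's hash, so editing any verse breaks verification from that block onward. This deliberately omits what makes a real blockchain decentralized, namely peer replication and consensus. The placeholder strings stand in for actual verse text.

```python
import hashlib
import json

GENESIS = "0" * 64  # conventional all-zero "previous hash" for the first block

def block_hash(index, content, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block fields."""
    payload = json.dumps({"index": index, "content": content, "prev": prev_hash},
                         ensure_ascii=False, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(texts):
    chain, prev = [], GENESIS
    for i, text in enumerate(texts):
        h = block_hash(i, text, prev)
        chain.append({"index": i, "content": text, "prev": prev, "hash": h})
        prev = h
    return chain

def verify_chain(chain):
    """Return the index of the first block that fails verification, or None if the chain is intact."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(block["index"], block["content"], prev) != block["hash"]:
            return block["index"]
        prev = block["hash"]
    return None

# Placeholder strings stand in for verse text (hypothetical identifiers).
chain = build_chain(["verse 1:1", "verse 1:2", "verse 1:3"])
```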

Keywords: Arabic text, authentication, blockchain, integrity, Quran, verification

Procedia PDF Downloads 12
2615 Investigating the Influences of Long-Term, as Compared to Short-Term, Phonological Memory on the Word Recognition Abilities of Arabic Readers vs. Arabic Native Speakers: A Word-Recognition Study

Authors: Insiya Bhalloo

Abstract:

It is quite common in the Muslim faith for non-Arabic speakers to be able to convert written Arabic, especially Quranic Arabic, into a phonological code without significant semantic or syntactic knowledge. This ability comes from learning to read the Quran (a religious text written in Classical Arabic) from a very young age, for example through enrolment in Quranic Arabic classes. Compared to native speakers of Arabic, these Arabic readers do not have comprehensive morpho-syntactic knowledge of the Arabic language, nor can they understand or engage in Arabic conversation. The study seeks to investigate whether mere phonological experience (as indicated by the Arabic readers’ experience with Arabic phonology and its sound system) is sufficient to cause phonological interference during recognition of previously heard words, despite the participants’ non-native status. Two groups will be recruited: native speakers of Arabic, and non-native speakers of Arabic, i.e., individuals who learned to read the Quran from a young age. Each experimental session will include two phases: an exposure phase and a test phase. During the exposure phase, participants will be presented with Arabic words (n = 40) on a computer screen. Half of these words will be common words found in the Quran. The other half will be words common in Modern Standard Arabic (MSA) but either absent from the Quran or much less frequent in it. During the test phase, participants will be presented with both familiar Arabic words (n = 20; i.e., words presented during the exposure phase) and novel Arabic words (n = 20; i.e., words not presented during the exposure phase). Half of these words will be common Quranic Arabic words, and the other half will be common MSA words that are not Quranic.
Moreover, half of the Quranic Arabic and MSA words presented will be nouns and half will be verbs, thereby eliminating word-processing effects of lexical category. Participants will then indicate whether they had seen each word during the exposure phase. This study seeks to investigate whether long-term phonological memory, such as that formed through childhood exposure to Quranic Arabic orthography, has a differential effect on the word-recognition capacities of native Arabic speakers and Arabic readers. We seek to compare the effects of long-term phonological memory with those of short-term phonological exposure (as indicated by the presentation of familiar words from the exposure phase). The researcher’s hypothesis concerns early experience with converting written Quranic Arabic text into a phonological code. Despite the participants’ lack of lexical knowledge, this experience is expected to help them recall the familiar Quranic words that appeared during the exposure phase more accurately than those that did not appear in that phase. Moreover, it is anticipated that the non-native Arabic readers will also report more false alarms to unfamiliar Quranic words due to early childhood phonological exposure to Quranic Arabic script, thereby producing false phonological facilitatory effects.

Keywords: Modern Standard Arabic, phonological facilitation, phonological memory, Quranic Arabic, word recognition

Procedia PDF Downloads 218
2614 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in text processing. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of the measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the proposed measure performs better than the other measures.
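
Below is one hypothetical way to encode the three cases in a single score:

1. Features present in both documents are rewarded according to how close their weights are.
2. Features present in only one document incur a penalty λ.
3. Features absent from both documents are ignored.

The result is then rescaled to [0, 1]. The reward function, the penalty and the rescaling are assumptions for illustration, not the measure proposed in the paper.

```python
def three_case_similarity(d1, d2, lam=1.0):
    """Similarity of two documents given as {feature: weight} dicts (weight 0 means absent)."""
    total, counted = 0.0, 0
    for f in set(d1) | set(d2):
        a, b = d1.get(f, 0.0), d2.get(f, 0.0)
        if a > 0 and b > 0:        # case 1: in both -> reward, 1.0 when the weights match
            total += 1.0 - abs(a - b) / max(a, b)
            counted += 1
        elif a > 0 or b > 0:       # case 2: in only one -> fixed penalty lam
            total -= lam
            counted += 1
        # case 3: in neither -> ignored, so shared absences do not inflate similarity
    if counted == 0:
        return 0.0
    return (total / counted + lam) / (1.0 + lam)  # rescale from [-lam, 1] to [0, 1]

# Hypothetical term weights for two banking documents.
sim = three_case_similarity({"loan": 2, "rate": 1}, {"loan": 1})  # 0.375
```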

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 379
2613 Detecting Paraphrases in Arabic Text

Authors: Amal Alshahrani, Allan Ramsay

Abstract:

Paraphrasing, i.e., expressing the same concept using different words or phrases, is one of the important tasks in natural language processing. Paraphrases can be used in many natural language applications, such as Information Retrieval, Machine Translation, Question Answering, Text Summarization, and Information Extraction. To obtain pairs of sentences that are paraphrases, we create a system that automatically extracts paraphrases from a corpus built from different sources of news articles. Such articles are likely to contain paraphrases when they report the same event on the same day. Simple standard approaches (e.g., the TF-IDF vector space model and cosine similarity) and alignment techniques (e.g., Dynamic Time Warping (DTW)) for extracting paraphrases have been applied to English. However, their performance may suffer when they are applied to another language such as Arabic, because of phenomena that are not present in English, such as free word order, zero copula, and pro-drop. Thus, if we can analyze how the existing algorithms for English fail for Arabic, we can find a solution for Arabic. The results are promising.
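
Below is a minimal DTW alignment over token sequences. It uses a 0/1 token distance, which is an assumption; real systems would use a softer word similarity. The usage example shows why free word order is a problem. A VSO sentence and its SVO reordering contain exactly the same words, yet DTW assigns them a non-zero cost because it only aligns tokens monotonically.

```python
def token_distance(a, b):
    """0/1 distance between tokens (assumed; a graded word similarity would be more realistic)."""
    return 0.0 if a == b else 1.0

def dtw(seq1, seq2, dist=token_distance):
    """Classic dynamic time warping cost between two token sequences."""
    n, m = len(seq1), len(seq2)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq1[i - 1], seq2[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

vso = ["كتب", "الطالب", "الدرس"]  # verb-first: "wrote the-student the-lesson"
svo = ["الطالب", "كتب", "الدرس"]  # same words, subject-first
cost = dtw(vso, svo)  # 2.0
```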

Keywords: natural language processing, TF-IDF, cosine similarity, dynamic time warping (DTW)

Procedia PDF Downloads 255
2612 Spoken Rhetoric in Arabic Heritage

Authors: Ihab Al-Mokrani

Abstract:

The Arabic heritage has two types of spoken rhetoric. The first type comprises what al-Jaahiz calls “the rhetoric of the sign,” meaning body language. It also includes the rhetoric of silence, which is no less important than the rhetoric of the sign, as well as the speaker’s appearance and movements. The second type is the spoken performance of utterances that carry written rhetorical arts such as metaphor, simile, and metonymy. Rationale of the study: First, despite the factual existence of rhetorical phenomena in the Arabic heritage, no contemporary study has addressed spoken rhetoric in the Arabic heritage. Second, Arabic civilization is originally a spoken one. Comparing Arabic culture and civilization with the Greek, Roman or Pharaonic cultures and civilizations reveals a difference. The latter started and flourished in writing, while the former started among illiterate people who had no interest in writing until recently. This difference gave Arabic culture and civilization a rhetoric different from that of other cultures and civilizations. Third, the spoken nature of Arabic civilization influenced Arabic rhetoric: specific rhetorical arts were introduced to match that spoken nature. One of these arts is the art of concision, which compensates for the absence of writing as a means of preserving the text. This explains why many definitions of Arabic rhetoric defined rhetoric as the art of concision. It also explains why the literary genres known in Arabic culture were confined to short forms such as poetry, anecdotes, and stories. By contrast, the literary genres of Greek culture were expansive forms such as epic and drama.
This is not of any contrast to the fact that some Arabic poetry would exceed 100 lines of poetry as Arabic poetry was based on the line organic unity, which means that every line could stand alone with a full meaning that is not dependent on the rest of the poem; and that last aspect has never happened in any culture other than the Arabic culture.

Keywords: Arabic rhetoric, spoken rhetoric, Arabic heritage, culture

Procedia PDF Downloads 656
2611 Reading in Multiple Arabics: Effects of Diglossia and Orthography

Authors: Aula Khatteb Abu-Liel

Abstract:

The study investigated the effects of diglossia and orthography on reading in Arabic. It manipulated reading in Spoken Arabic (SA), written in Arabizi (Arabic written in Latin letters on computers and phones), and in the two forms of the conventional written language, Modern Standard Arabic (MSA): vowelled (shallow) and unvowelled (deep). Seventy-seven skilled 8th-grade readers performed oral reading of single words and of narrative and expository texts, as well as silent reading comprehension of both text genres. Oral reading and comprehension revealed different patterns. Single words and texts were read fastest and most accurately in unvowelled MSA, slowest and least accurately in vowelled MSA, and in between in Arabizi. Comprehension was highest for vowelled MSA. Performance on narrative texts was better than on expository texts in Arabizi, with the opposite pattern in MSA. The results suggest that the frequency of text types and the way in which phonology is encoded affect skilled reading.

Keywords: Arabic, Arabizi, computer mediated communication, diglossia, modern standard Arabic

Procedia PDF Downloads 25
2610 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still at an early stage, as fake news is a relatively new phenomenon that has only recently attracted public interest. Machine learning helps solve complex problems and build AI systems, especially in cases involving tacit or unknown knowledge. To identify fake news, we applied three machine learning classifiers: Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not entirely adequate for fake news detection, because general classification methods are not specialized for fake news. By integrating machine learning with text-based processing, we can detect fake news and build classifiers for news data. Text classification mainly focuses on extracting various features from text and then incorporating those features into classification. The main challenge in this area is the lack of an efficient way to distinguish fake from genuine news, owing to the unavailability of corpora. We applied the three classifiers to two publicly available datasets, and experimental analysis on these datasets indicates very encouraging, improved performance.
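The abstract does not say how the classifiers were implemented or which text features were used. As a minimal sketch, assuming plain bag-of-words counts and labels +1 (fake) and -1 (real), the Passive Aggressive classifier named above can be written with the standard PA-I update: whenever the hinge loss on an example is positive, the weights move just far enough toward the correct side, with the step capped by the aggressiveness parameter `C`. The helper names and toy headlines are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Hypothetical featurizer: lowercase whitespace tokens -> term counts."""
    return Counter(text.lower().split())

def score(weights, features):
    """Linear score w . x over sparse dict features."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items())

def train_passive_aggressive(samples, C=1.0, epochs=5):
    """PA-I training on (text, label) pairs with label in {+1, -1}."""
    weights = {}
    for _ in range(epochs):
        for text, y in samples:
            x = bag_of_words(text)
            if not x:
                continue  # skip empty texts (zero-norm feature vector)
            loss = max(0.0, 1.0 - y * score(weights, x))  # hinge loss
            if loss > 0.0:
                norm_sq = sum(v * v for v in x.values())
                tau = min(C, loss / norm_sq)  # PA-I step size, capped by C
                for term, count in x.items():
                    weights[term] = weights.get(term, 0.0) + tau * y * count
    return weights

def predict(weights, text):
    """Return +1 (fake) if the score is non-negative, else -1 (real)."""
    return 1 if score(weights, bag_of_words(text)) >= 0 else -1

if __name__ == "__main__":
    # Hypothetical toy headlines: +1 = fake, -1 = real.
    toy = [
        ("shocking miracle cure doctors hate", 1),
        ("you won't believe this shocking secret", 1),
        ("parliament passes annual budget bill", -1),
        ("central bank holds interest rates steady", -1),
    ]
    w = train_passive_aggressive(toy)
    print(predict(w, "shocking miracle secret"))  # 1
    print(predict(w, "bank passes budget"))       # -1
```

A library implementation with TF-IDF features would be the more practical choice; this version only illustrates the update rule that gives the classifier its name.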

Keywords: fake news detection, natural language processing, machine learning, classification techniques

Procedia PDF Downloads 31
2609 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan

Abstract:

Subsection identification, in the context of Electronic Health Records (EHRs), is the task of identifying the sections of a record that matter for downstream tasks such as auto-coding. In this work, we classify the text in EHRs according to the information it contains, using machine learning and deep learning techniques. We first briefly describe the problem and formulate it as a text classification problem. We then discuss methods from the literature. We try two approaches: traditional feature-extraction-based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature-extraction-based machine learning models.
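Formulating subsection identification as text classification means mapping each text segment of a record to a section label. The abstract does not name the traditional models it compares, so the sketch below uses a multinomial Naive Bayes classifier with add-one smoothing purely as an illustrative feature-extraction-based baseline; the section labels and example segments are hypothetical and are not drawn from the authors' private dataset.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Minimal feature extraction: lowercase whitespace tokens."""
    return text.lower().split()

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore out-of-vocabulary tokens
                # Smoothed log likelihood of the token given the section label.
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

if __name__ == "__main__":
    # Hypothetical EHR segments and section labels.
    texts = [
        "patient takes metformin 500 mg twice daily",
        "aspirin 81 mg daily and lisinopril 10 mg",
        "history of type 2 diabetes and hypertension",
        "past history of appendectomy in childhood",
    ]
    labels = ["medications", "medications", "history", "history"]
    model = MultinomialNB().fit(texts, labels)
    print(model.predict("metformin 500 mg"))         # medications
    print(model.predict("history of hypertension"))  # history
```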

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 41
2608 University Arabic/Foreign Language Teacher's Competences, Professionalism and the Challenges and Opportunities

Authors: Abeer Heider

Abstract:

The article considers the definitions of teachers’ competences and professionalism from the different perspectives of Arab and foreign scholars. Special attention is paid to the definition and classification of the stages and components of the professionalism of university Arabic/foreign language teachers. The results of a survey are presented and recommendations are given. This paper addresses only some of the problems of defining the professional competence and professionalism of university Arabic/foreign language teachers. The topic requires much more analysis and discussion, because the quality of training competitive, mobile students with a good knowledge of foreign languages depends directly on the teachers’ professional level.

Keywords: teacher’s professional competences, Arabic/foreign language teacher’s professionalism, teacher evaluation, teacher quality

Procedia PDF Downloads 306
2607 Validating the Arabic Communicative Development Inventory for Assessing the Development of Language in Arabic-Speaking Children

Authors: Alshaimaa Abdelwahab, Allegra Cattani, Caroline Floccia

Abstract:

Assessing children’s language is fundamental to changing their developmental outcomes, as it allows quick and early intervention with a suitable planning and monitoring program. The importance of language assessment lies in helping to find the right test that is fit for purpose, in addition to assessing achievement and proficiency. This study examines the validity of a new Arabic assessment tool, the Arabic Communicative Development Inventory (Arabic CDI), which assesses language development in Arabic-speaking children across different Arab countries and allows children with language delay to be detected. Concurrent validity was examined by comparing the Arabic CDI with the Arabic Language Test. Twenty-three healthy, typically developing Egyptian children and their mothers participated in this study. The children were aged 24 months (± two weeks), and the sample included 13 males and 10 females. Mothers completed the Arabic CDI either before or after the Arabic Language Test was administered to the child. The comprehension score on the Arabic CDI (M = 52.7, SD = 9.7) and words understood on the Arabic Language Test (M = 59.6, SD = 12.5) were strongly and positively correlated (r = .62, p = .002). Likewise, the production score on the Arabic CDI (M = 38.4, SD = 14.8) and words expressed on the Arabic Language Test (M = 52.1, SD = 16.3) were strongly and positively correlated (r = .82, p < .001). The new Arabic CDI is an adequate tool for assessing the development of comprehension and production in Arabic-speaking children, and it could also be used to detect children with language impairment. Standardization of the Arabic CDI across 18 different Arabic dialects in children aged 8 to 30 months is underway.
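The concurrent-validity figures above are Pearson correlation coefficients. The sketch below shows how r is computed from paired scores (for example, each child's Arabic CDI score and Arabic Language Test score), together with the usual t statistic for testing r against zero with n - 2 degrees of freedom. The function names are illustrative, and no study data are included.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Plugging the reported comprehension correlation (r = .62) and sample size (n = 23) into `t_statistic` gives t ≈ 3.62 on 21 degrees of freedom, which is consistent with the reported p = .002.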

Keywords: Arabic CDI, assessing children, language development, language impairment

Procedia PDF Downloads 315
2606 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real-world problems involve data that can be considered multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must adapt to change using limited time and memory. Classification is used to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification in which a set of labels is associated with a single instance. Multi-label classification is used in modern applications such as text classification, functional genomics, image classification, and music categorization. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. It also presents a comparative analysis of multi-label classification methods, first on the basis of a theoretical study and then through simulations on various datasets.
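Binary relevance, listed in the keywords, decomposes a multi-label problem into one independent binary problem per label, while Hamming loss and subset accuracy are two common evaluation measures for multi-label predictions. The abstract does not specify which methods or measures were compared, so the following is a generic sketch, assuming each instance's labels are given as a Python set; the label names are hypothetical. In a streaming setting, one simple option would be to apply these functions to each window of recent instances rather than to the whole dataset.

```python
def binary_relevance_split(instances, label_sets, labels):
    """Binary relevance: build one binary (instance, 0/1) dataset per label."""
    return {
        label: [(x, 1 if label in ys else 0) for x, ys in zip(instances, label_sets)]
        for label in labels
    }

def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of (instance, label) pairs that are predicted incorrectly."""
    labels = set(labels)
    # Symmetric difference = labels missed plus labels wrongly predicted.
    errors = sum(len((t ^ p) & labels) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(labels))

def subset_accuracy(true_sets, pred_sets):
    """Fraction of instances whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(true_sets, pred_sets)) / len(true_sets)

if __name__ == "__main__":
    labels = ["sports", "politics", "tech"]
    y_true = [{"sports"}, {"politics", "tech"}]
    y_pred = [{"sports", "tech"}, {"politics"}]
    print(hamming_loss(y_true, y_pred, labels))  # 2 of 6 label decisions wrong
    print(subset_accuracy(y_true, y_pred))       # 0.0
```

Binary relevance is simple and fast but ignores correlations between labels, which is one reason other multi-label methods exist.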

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 408
2605 A Syntactic Approach to Applied and Socio-Linguistics in Arabic Language in Modern Communications

Authors: Adeyemo Abduljeeel Taiwo

Abstract:

This research attempts to create a phonological and morphological compendium of the Arabic language in Modern Standard Arabic (MSA) for modern-day communications. Its chief aim is a grammatical analysis of two broad fields of Arabic linguistics, namely applied linguistics and sociolinguistics. It provides a pictorial record of applied linguistics and sociolinguistics in Arabic phonology and morphology. Thematically, it examines at length the theory of concord in contemporary Arabic language acquisition. It uses an analytical method and portrays Arabic as a Semitic language that promotes the study of linguistics and syntax among scholars in these fields.

Keywords: Arabic language, applied linguistics, socio-linguistics, modern communications

Procedia PDF Downloads 192