Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5401

Search results for: learning text

5401 A Study of Various Ontology Learning Systems from Text and a Look into Future

Authors: Fatima Al-Aswadi, Chan Yong


With the large volume of unstructured data that increases day by day on the web, the motivation of representing the knowledge in this data in the machine processable form is increased. Ontology is one of the major cornerstones of representing the information in a more meaningful way on the semantic Web. The goal of Ontology learning from text is to elicit and represent domain knowledge in the machine readable form. This paper aims to give a follow-up review on the ontology learning systems from text and some of their defects. Furthermore, it discusses how far the ontology learning process will enhance in the future.

Keywords: concept discovery, deep learning, ontology learning, semantic relation, semantic web

Procedia PDF Downloads 343
5400 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan


Subsection identification, in the context of Electronic Health Records (EHRs), is identifying the important sections for down-stream tasks like auto-coding. In this work, we classify the text present in EHRs according to their information, using machine learning and deep learning techniques. We initially describe briefly about the problem and formulate it as a text classification problem. Then, we discuss upon the methods from the literature. We try two approaches - traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based Machine Learning Models.

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 61
5399 The Challenges of Hyper-Textual Learning Approach for Religious Education

Authors: Elham Shirvani–Ghadikolaei, Seyed Mahdi Sajjadi


State of the art technology has the tremendous impact on our life, in this situation education system have been influenced as well as. In this paper, tried to compare two space of learning text and hypertext with each other, and some challenges of using hypertext in religious education. Regarding the fact that, hypertext is an undeniable part of learning in this world and it has highly beneficial for the education process from class to office and home. In this paper tried to solve this question: the consequences and challenges of applying hypertext in religious education. Also, the consequences of this survey demonstrate the role of curriculum designer and planner of education to solve this problem.

Keywords: Hyper-textual, learning, religious education, learning text

Procedia PDF Downloads 175
5398 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini


Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 40
5397 Evaluating the Effectiveness of Animated Videos in Learning Economics

Authors: J. Chow


In laboratory settings, this study measured and reported the effects of undergraduate students watching animated videos on learning microeconomics as compared with the effectiveness of reading written texts. The study described an experiment on learning microeconomics in higher education using two different types of learning materials. It reported the effectiveness on microeconomics learning of watching animated videos and reading written texts. Undergraduate students in the university were randomly assigned to either a ‘video group’ or a ‘text group’ in the experiment. Previously-validated multiple-choice questions on fundamental concepts of microeconomics were administered. Both groups showed improvement between the pre-test and post-test. The experience of learning using text and video materials was also assessed. After controlling the student characteristics variables, the analyses showed that both types of materials showed comparable level of perceived learning experience. The effect size and statistical significance of these results supported the hypothesis that animated video is an effective alternative to text materials as a learning tool for students. The findings suggest that such animated videos may support teaching microeconomics in higher education.

Keywords: animated videos for education, laboratory experiment, microeconomics education, undergraduate economics education

Procedia PDF Downloads 57
5396 Extraction of Text Subtitles in Multimedia Systems

Authors: Amarjit Singh


In this paper, a method for extraction of text subtitles in large video is proposed. The video data needs to be annotated for many multimedia applications. Text is incorporated in digital video for the motive of providing useful information about that video. So need arises to detect text present in video to understanding and video indexing. This is achieved in two steps. First step is text localization and the second step is text verification. The method of text detection can be extended to text recognition which finds applications in automatic video indexing; video annotation and content based video retrieval. The method has been tested on various types of videos.

Keywords: video, subtitles, extraction, annotation, frames

Procedia PDF Downloads 487
5395 Urdu Text Extraction Method from Images

Authors: Samabia Tehsin, Sumaira Kausar


Due to the vast increase in the multimedia data in recent years, efficient and robust retrieval techniques are needed to retrieve and index images/ videos. Text embedded in the images can serve as the strong retrieval tool for images. This is the reason that text extraction is an area of research with increasing attention. English text extraction is the focus of many researchers but very less work has been done on other languages like Urdu. This paper is focusing on Urdu text extraction from video frames. This paper presents a text detection feature set, which has the ability to deal up with most of the problems connected with the text extraction process. To test the validity of the method, it is tested on Urdu news dataset, which gives promising results.

Keywords: caption text, content-based image retrieval, document analysis, text extraction

Procedia PDF Downloads 383
5394 The Difference of Learning Outcomes in Reading Comprehension between Text and Film as The Media in Indonesian Language for Foreign Speaker in Intermediate Level

Authors: Siti Ayu Ningsih


This study aims to find the differences outcomes in learning reading comprehension with text and film as media on Indonesian Language for foreign speaker (BIPA) learning at intermediate level. By using quantitative and qualitative research methods, the respondent of this study is a single respondent from D'Royal Morocco Integrative Islamic School in grade nine from secondary level. Quantitative method used to calculate the learning outcomes that have been given the appropriate action cycle, whereas qualitative method used to translate the findings derived from quantitative methods to be described. The technique used in this study is the observation techniques and testing work. Based on the research, it is known that the use of the text media is more effective than the film for intermediate level of Indonesian Language for foreign speaker learner. This is because, when using film the learner does not have enough time to take note the difficult vocabulary and don't have enough time to look for the meaning of the vocabulary from the dictionary. While the use of media texts shows the better effectiveness because it does not require additional time to take note the difficult words. For the words that are difficult or strange, the learner can immediately find its meaning from the dictionary. The presence of the text is also very helpful for Indonesian Language for foreign speaker learner to find the answers according to the questions more easily. By matching the vocabulary of the question into the text references.

Keywords: Indonesian language for foreign speaker, learning outcome, media, reading comprehension

Procedia PDF Downloads 111
5393 Increasing the Ability of State Senior High School 12 Pekanbaru Students in Writing an Analytical Exposition Text through Comic Strips

Authors: Budiman Budiman


This research aimed at describing and testing whether the students’ ability in writing analytical exposition text is increased by using comic strips at SMAN 12 Pekanbaru. The respondents of this study were the second-grade students, especially XI Science 3 academic year 2011-2012. The total number of students in this class was forty-two (42) students. The quantitative and qualitative data was collected by using writing test and observation sheets. The research finding reveals that there is a significant increase of students’ writing ability in writing analytical exposition text through comic strips. It can be proved by the average score of pre-test was 43.7 and the average score of post-test was 65.37. Besides, the students’ interest and motivation in learning are also improved. These can be seen from the increasing of students’ awareness and activeness in learning process based on observation sheets. The findings draw attention to the use of comic strips in teaching and learning is beneficial for better learning outcome.

Keywords: analytical exposition, comic strips, secondary school students, writing ability

Procedia PDF Downloads 73
5392 Spontaneous Message Detection of Annoying Situation in Community Networks Using Mining Algorithm

Authors: P. Senthil Kumari


Main concerns in data mining investigation are social controls of data mining for handling ambiguity, noise, or incompleteness on text data. We describe an innovative approach for unplanned text data detection of community networks achieved by classification mechanism. In a tangible domain claim with humble secrecy backgrounds provided by community network for evading annoying content is presented on consumer message partition. To avoid this, mining methodology provides the capability to unswervingly switch the messages and similarly recover the superiority of ordering. Here we designated learning-centered mining approaches with pre-processing technique to complete this effort. Our involvement of work compact with rule-based personalization for automatic text categorization which was appropriate in many dissimilar frameworks and offers tolerance value for permits the background of comments conferring to a variety of conditions associated with the policy or rule arrangements processed by learning algorithm. Remarkably, we find that the choice of classifier has predicted the class labels for control of the inadequate documents on community network with great value of effect.

Keywords: text mining, data classification, community network, learning algorithm

Procedia PDF Downloads 405
5391 Fake News Detection for Korean News Using Machine Learning Techniques

Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn


Fake news is defined as the news articles that are intentionally and verifiably false, and could mislead readers. Spread of fake news may provoke anxiety, chaos, fear, or irrational decisions of the public. Thus, detecting fake news and preventing its spread has become very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible to identify it by a human. Under this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. But, there have been no prior studies proposed an automated fake news detection method for Korean news to our best knowledge. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed is convert to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). After that, in step 2, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as logistic regression, backpropagation network, support vector machine, and deep neural network can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news from Seoul National University’s FactCheck. which provides with detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.

Keywords: fake news detection, Korean news, machine learning, text mining

Procedia PDF Downloads 165
5390 Using the Dokeos Platform for Industrial E-Learning Solution

Authors: Kherafa Abdennasser


The application of Information and Communication Technologies (ICT) to the training area led to the creation of this new reality called E-learning. That last one is described like the marriage of multi- media (sound, image and text) and of the internet (diffusion on line, interactivity). Distance learning became an important totality for training and that last pass in particular by the setup of a distance learning platform. In our memory, we will use an open source platform named Dokeos for the management of a distance training of GPS called e-GPS. The learner is followed in all his training. In this system, trainers and learners communicate individually or in group, the administrator setup and make sure of this system maintenance.

Keywords: ICT, E-learning, learning plate-forme, Dokeos, GPS

Procedia PDF Downloads 369
5389 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert


This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. It describes automatic text recognition in images. OCR is necessary because optical input devices can only transmit raster graphics as a result. Text recognition describes the task of recognizing letters shown as such, to identify and assign them an assigned numerical value in accordance with the usual text encoding (ASCII, Unicode). The peculiarity of this study conducted by the authors using the example of the ABBYY FineReader, was confirmed and shown in practice, the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies

Procedia PDF Downloads 52
5388 Developing a Model of Teaching Writing Based On Reading Approach through Reflection Strategy for EFL Students of STKIP YPUP

Authors: Eny Syatriana, Ardiansyah


The purpose of recent study was to develop a learning model on writing, based on the reading texts which will be read by the students using reflection strategy. The strategy would allow the students to read the text and then they would write back the main idea and to develop the text by using their own sentences. So, the writing practice was begun by reading an interesting text, then the students would develop the text which has been read into their writing. The problem questions are (1) what kind of learning model that can develop the students writing ability? (2) what is the achievement of the students of STKIP YPUP through reflection strategy? (3) is the using of the strategy effective to develop students competence In writing? (4) in what level are the students interest toward the using of a strategy In writing subject? This development research consisted of some steps, they are (1) need analysis (2) model design (3) implementation (4) model evaluation. The need analysis was applied through discussion among the writing lecturers to create a learning model for writing subject. To see the effectiveness of the model, an experiment would be delivered for one class. The instrument and learning material would be validated by the experts. In every steps of material development, there was a learning process, where would be validated by an expert. The research used development design. These Principles and procedures or research design and development .This study, researcher would do need analysis, creating prototype, content validation, and limited empiric experiment to the sample. In each steps, there should be an assessment and revision to the drafts before continue to the next steps. The second year, the prototype would be tested empirically to four classes in STKIP YPUP for English department. Implementing the test greatly was done through the action research and followed by evaluation and validation from the experts.

Keywords: learning model, reflection, strategy, reading, writing, development

Procedia PDF Downloads 278
5387 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney


Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesce morphological evaluation and sentiment analysis for the task of classification of farmer suicide cases reported in Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens followed by the training of proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45% respectively. The overall accuracy of sentiment classification obtained using proposed framework on 275 Punjabi text documents is found to be 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 98
5386 Amharic Text News Classification Using Supervised Learning

Authors: Misrak Assefa


The Amharic language is the second most widely spoken Semitic language in the world. There are several new overloaded on the web. Searching some useful documents from the web on a specific topic, which is written in the Amharic language, is a challenging task. Hence, document categorization is required for managing and filtering important information. In the classification of Amharic text news, there is still a gap in the domain of information that needs to be launch. This study attempts to design an automatic Amharic news classification using a supervised learning mechanism on four un-touch classes. To achieve this research, 4,182 news articles were used. Naive Bayes (NB) and Decision tree (j48) algorithms were used to classify the given Amharic dataset. In this paper, k-fold cross-validation is used to estimate the accuracy of the classifier. As a result, it shows those algorithms can be applicable in Amharic news categorization. The best average accuracy result is achieved by j48 decision tree and naïve Bayes is 95.2345 %, and 94.6245 % respectively using three categories. This research indicated that a typical decision tree algorithm is more applicable to Amharic news categorization.

Keywords: text categorization, supervised machine learning, naive Bayes, decision tree

Procedia PDF Downloads 59
5385 Programmed Speech to Text Summarization Using Graph-Based Algorithm

Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba


Programmed Speech to Text and Text Summarization Using Graph-based Algorithms can be utilized in gatherings to get the short depiction of the gathering for future reference. This gives signature check utilizing Siamese neural organization to confirm the personality of the client and convert the client gave sound record which is in English into English text utilizing the discourse acknowledgment bundle given in python. At times just the outline of the gathering is required, the answer for this text rundown. Thus, the record is then summed up utilizing the regular language preparing approaches, for example, solo extractive text outline calculations

Keywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization

Procedia PDF Downloads 35
5384 A Rational Intelligent Agent to Promote Metacognition a Situation of Text Comprehension

Authors: Anass Hsissi, Hakim Allali, Abdelmajid Hajami


This article presents the results of a doctoral research which aims to integrate metacognitive dimension in the design of human learning computing environments (ILE). We conducted a detailed study on the relationship between metacognitive processes and learning, specifically their positive impact on the performance of learners in the area of reading comprehension. Our contribution is to implement methods, using an intelligent agent based on BDI paradigm to ensure intelligent and reliable support for low readers, in order to encourage regulation and a conscious and rational use of their metacognitive abilities.

Keywords: metacognition, text comprehension EIAH, autoregulation, BDI agent

Procedia PDF Downloads 228
5383 Reducing Accidents Using Text Stops

Authors: Benish Chaudhry


Most of the accidents these days are occurring because of the ‘text-and-drive’ concept. If we look at the structure of cities in UAE, there are great distances, because of which it is impossible to drive without using or merely checking the cellphone. Moreover, if we look at the road structure, it is almost impossible to stop at a point and text. With the introduction of TEXT STOPs, drivers will be able to stop different stops for a maximum of 1 and a half-minute in order to reply or write a message. They can be introduced at a distance of 10 minutes of driving on the average speed of the road, so the drivers can look forward to a stop and can reply to a text when needed. A user survey indicates that drivers are willing to NOT text-and-drive if they have such a facility available.

Keywords: transport, accidents, urban planning, road planning

Procedia PDF Downloads 248
5382 Study on the Focus of Attention of Special Education Students in Primary School

Authors: Tung-Kuang Wu, Hsing-Pei Hsieh, Ying-Ru Meng


Special Education in Taiwan has been facing difficulties including shortage of teachers and lack in resources. Some students need to receive special education are thus not identified or admitted. Fortunately, information technologies can be applied to relieve some of the difficulties. For example, on-line multimedia courseware can be used to assist the learning of special education students and take pretty much workload from special education teachers. However, there may exist cognitive variations between students in special or regular educations, which suggests the design of online courseware requires different considerations. This study aims to investigate the difference in focus of attention (FOA) between special and regular education students of primary school in viewing the computer screen. The study is essential as it helps courseware developers in determining where to put learning elements that matter the most on the right position of screen. It may also assist special education specialists to better understand the subtle differences among various subtypes of learning disabilities. This study involves 76 special education students (among them, 39 are students with mental retardation, MR, and 37 are students with learning disabilities, LDs) and 42 regular education students. The participants were asked to view a computer screen showing a picture partitioned into 3 × 3 areas with each area filled with text or icon. The subjects were then instructed to mark on the prior given paper sheets, which are also partitioned into 3 × 3 grids, the areas corresponding to the pictures on the computer screen that they first set their eyes on. The data are then collected and analyzed. Major findings are listed: 1. In both text and icon scenario, significant differences exist in the first preferred FOA between special and regular education students. The first FOA for the former is mainly on area 1 (upper left area, 53.8% / 51.3% for MR / LDs students in text scenario; and 53.8% / 56.8% for MR / LDs students in icons scenario), while the latter on area 5 (middle area, 50.0% and 57.1% in text and icons scenarios). 2. The second most preferred area in text scenario for students with MR and LDs are area 2 (upper-middle, 20.5%) and 5 (middle area, 24.3%). In icons scenario, the results are similar, but lesser in percentage. 3. Students with LDs that show similar preference (either in text or icons scenarios) in FOA to regular education students tend to be of some specific sub-type of learning disabilities. For instance, students with LDs that chose area 5 (middle area, either in text or icon scenario) as their FOA are mostly ones that have reading or writing disability. Also, three (out of 13) subjects in this category, after going through the rediagnosis process, were excluded from being learning disabilities. In summary, the findings suggest when designing multimedia courseware for students with MR and LDs, the essential learning elements should be placed on area 1, 2 and 5. In addition, FOV preference may also potentially be used as an indicator for diagnosing students with LDs.

Keywords: focus of attention, learning disabilities, mental retardation, on-line multimedia courseware, special education

Procedia PDF Downloads 71
5381 Structure Analysis of Text-Image Connection in Jalayrid Period Illustrated Manuscripts

Authors: Mahsa Khani Oushani


Text and image are two important elements in the field of Iranian art, the text component and the image component have always been manifested together. The image narrates the text and the text is the factor in the formation of the image and they are closely related to each other. The connection between text and image is an interactive and two-way connection in the tradition of Iranian manuscript arrangement. The interaction between the narrative description and the image scene is the result of a direct and close connection between the text and the image, which in addition to the decorative aspect, also has a descriptive aspect. In this article the connection between the text element and the image element and its adaptation to the theory of Roland Barthes, the structuralism theorist, in this regard will be discussed. This study tends to investigate the question of how the connection between text and image in illustrated manuscripts of the Jalayrid period is defined according to Barthes’ theory. And what kind of proportion has the artist created in the composition between text and image. Based on the results of reviewing the data of this study, it can be inferred that in the Jalayrid period, the image has a reference connection and although it is of major importance on the page, it also maintains a close connection with the text and is placed in a special proportion. It is not necessarily balanced and symmetrical and sometimes uses imbalance for composition. This research has been done by descriptive-analytical method, which has been done by library collection method.

Keywords: structure, text, image, Jalayrid, painter

Procedia PDF Downloads 85
5380 Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Authors: Faisal Alshuwaier, Ali Areshey


Text mining techniques are generally applied for classifying the text, finding fuzzy relations and structures in data sets. This research provides plenty text mining capabilities. One common application is text classification and event extraction, which encompass deducing specific knowledge concerning incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism, which is based on a text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and branch and bound method to simplify the texts.

Keywords: extraction, max-prod, fuzzy relations, text mining, memberships, classification, memberships, classification

Procedia PDF Downloads 457
5379 Anatomical Survey for Text Pattern Detection

Authors: S. Tehsin, S. Kausar


The ultimate aim of machine intelligence is to explore and materialize the human capabilities, one of which is the ability to detect various text objects within one or more images displayed on any canvas including prints, videos or electronic displays. Multimedia data has increased rapidly in past years. Textual information present in multimedia contains important information about the image/video content. However, it needs to technologically testify the commonly used human intelligence of detecting and differentiating the text within an image, for computers. Hence in this paper feature set based on anatomical study of human text detection system is proposed. Subsequent examination bears testimony to the fact that the features extracted proved instrumental to text detection.

Keywords: biologically inspired vision, content based retrieval, document analysis, text extraction

Procedia PDF Downloads 331
5378 Evolving Knowledge Extraction from Online Resources

Authors: Zhibo Xiao, Tharini Nayanika de Silva, Kezhi Mao


In this paper, we present an evolving knowledge extraction system named AKEOS (Automatic Knowledge Extraction from Online Sources). AKEOS consists of two modules, including a one-time learning module and an evolving learning module. The one-time learning module takes in user input query, and automatically harvests knowledge from online unstructured resources in an unsupervised way. The output of the one-time learning is a structured vector representing the harvested knowledge. The evolving learning module automatically schedules and performs repeated one-time learning to extract the newest information and track the development of an event. In addition, the evolving learning module summarizes the knowledge learned at different time points to produce a final knowledge vector about the event. With the evolving learning, we are able to visualize the key information of the event, discover the trends, and track the development of an event.

Keywords: evolving learning, knowledge extraction, knowledge graph, text mining

Procedia PDF Downloads 351
5377 Managing Cognitive Load in Accounting: An Analysis of Three Instructional Designs in Financial Accounting

Authors: Seedwell Sithole


One of the persistent problems in accounting education is how to effectively support students’ learning. A promising technique to this issue is to investigate the extent that learning is determined by the design of instructional material. This study examines the academic performance of students using three instructional designs in financial accounting. Student’s performance scores and reported mental effort ratings were used to determine the instructional effectiveness. The findings of this study show that accounting students prefer graph and text designs that are integrated. The results suggest that spatially separated graph and text presentations in accounting should be reorganized to align with the requirements of human cognitive architecture.

Keywords: accounting, cognitive load, education, instructional preferences, students

Procedia PDF Downloads 52
5376 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui


In this paper, we have presented a brief current state of the art for Arabic text representation and classification methods. We decomposed Arabic Task Classification into four categories. First we describe some algorithms applied to classification on Arabic text. Secondly, we cite all major works when comparing classification algorithms applied on Arabic text, after this, we mention some authors who proposing new classification methods and finally we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 297
5375 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni


The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 67
5374 Arabic Text Classification: Review Study

Authors: M. Hijazi, A. Zeki, A. Ismail


An enormous amount of valuable human knowledge is preserved in documents. The rapid growth in the number of machine-readable documents for public or private access requires the use of automatic text classification. Text classification can be defined as assigning or structuring documents into a defined set of classes known in advance. Arabic text classification methods have emerged as a natural result of the existence of a massive amount of varied textual information written in the Arabic language on the web. This paper presents a review on the published researches of Arabic Text Classification using classical data representation, Bag of words (BoW), and using conceptual data representation based on semantic resources such as Arabic WordNet and Wikipedia.

Keywords: Arabic text classification, Arabic WordNet, bag of words, conceptual representation, semantic relations

Procedia PDF Downloads 313
5373 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa


Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 334
5372 Applying Dictogloss Technique to Improve Auditory Learners’ Writing Skills in Second Language Learning

Authors: Aji Budi Rinekso


There are some common problems that are often faced by students in writing. The problems are related to macro and micro skills of writing, such as incorrect spellings, inappropriate diction, grammatical errors, random ideas, and irrelevant supporting sentences. Therefore, it is needed a teaching technique that can solve those problems. Dictogloss technique is a teaching technique that involves listening practices. So, it is a suitable teaching technique for students with auditory learning style. Dictogloss technique comprises of four basic steps; (1) warm up, (2) dictation, (3) reconstruction and (4) analysis and correction. Warm up is when students find out about topics and do some preparatory vocabulary works. Then, dictation is when the students listen to texts read at normal speed by a teacher. The text is read by the teacher twice where at the first reading the students only listen to the teacher and at the second reading the students listen to the teacher again and take notes. Next, reconstruction is when the students discuss the information from the text read by the teacher and start to write a text. Lastly, analysis and correction are when the students check their writings and revise them. Dictogloss offers some advantages in relation to the efforts of improving writing skills. Through the use of dictogloss technique, students can solve their problems both on macro skills and micro skills. Easier to generate ideas and better writing mechanics are the benefits of dictogloss.

Keywords: auditory learners, writing skills, dictogloss technique, second language learning

Procedia PDF Downloads 37