Search results for: Arabic text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3602

3182 Feature Extraction and Classification Based on the Bayes Test for Minimum Error

Authors: Nasar Aldian Ambark Shashoa

Abstract:

Classification with dimension reduction based on a Bayesian approach is proposed in this paper. The first step is to generate a sample (parameter) of the fault-free mode class and the faulty mode class. Second, in order to obtain good classification performance, important features are selected with the discrete Karhunen-Loève expansion. Next, the Bayes test for minimum error is used to classify the classes. Finally, results for simulated data demonstrate the capabilities of the proposed procedure.
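
The decision step can be illustrated with a minimal sketch, assuming a single extracted feature and Gaussian class models. The Bayes test for minimum error picks the class whose prior times likelihood is largest. The priors, means, and standard deviations below are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_min_error(x, classes):
    """Return the label whose prior * likelihood is largest at x.

    classes maps label -> (prior, mean, std); all values are illustrative.
    """
    def score(label):
        prior, mean, std = classes[label]
        return prior * gaussian_pdf(x, mean, std)

    return max(classes, key=score)


# Hypothetical class models for one extracted feature (assumption).
CLASSES = {
    "fault-free": (0.5, 0.0, 1.0),
    "faulty": (0.5, 3.0, 1.0),
}

print(bayes_min_error(0.4, CLASSES))
print(bayes_min_error(2.6, CLASSES))
```

With equal priors and equal variances, as assumed here, the rule reduces to a threshold halfway between the two class means.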

Keywords: analytical redundancy, fault detection, feature extraction, Bayesian approach

Procedia PDF Downloads 503
3181 Network Traffic Classification Scheme for Internet Network Based on Application Categorization for IPv6

Authors: Yaser Miaji, Mohammed Aloryani

Abstract:

The rise of applications in everyday use, such as videoconferencing, online recreation, and voice communication, creates a pressing need for novel mechanisms and policies to serve this steep growth in the applications themselves and in users' wants. This diversity in web traffic requires some classification and prioritization, since some traffic merits more attention, with less delay and loss, than other traffic. This research is intended to reinforce the mechanism by analysing application performance under the proposed mechanism. The mechanism used is quite direct and analytical. It is implemented by modifying the queue limit in the algorithm.

Keywords: traffic classification, IPv6, internet, application categorization

Procedia PDF Downloads 538
3180 Moral Wrongdoers: Evaluating the Value of Moral Actions Performed by War Criminals

Authors: Jean-Francois Caron

Abstract:

This text explores the value of moral acts performed by war criminals, and the extent to which they should alleviate the punishment these individuals ought to receive for violating the rules of war. Without neglecting the necessity of retribution in war crimes cases, it argues from an ethical perspective that we should not rule out the possibility of considering lesser punishments for war criminals who decide to perform a moral act, as it might produce significant positive moral outcomes. This text also analyzes how such a norm could be justified from a moral perspective.

Keywords: war criminals, pardon, amnesty, retribution

Procedia PDF Downloads 257
3179 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters, namely, 'Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties such as function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is quite dicey in nature, as its lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation.
In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains to which they actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of a huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational in specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.
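
The idea of matching lexical load across domains through word frequency counts can be sketched roughly as follows. This is only an illustration: the function name `domain_specific_words`, the ratio threshold, and the toy English tokens are assumptions, not the authors' procedure or data.

```python
from collections import Counter


def domain_specific_words(corpora, min_ratio=2.0):
    """Find words over-represented in one domain relative to all others.

    corpora maps a domain name to a list of tokens. A word counts as
    domain-specific when its relative frequency in the domain is at least
    min_ratio times its relative frequency in the pooled other domains.
    """
    counts = {domain: Counter(tokens) for domain, tokens in corpora.items()}
    result = {}
    for domain, own in counts.items():
        own_total = sum(own.values())
        rest = Counter()
        for other, other_counts in counts.items():
            if other != domain:
                rest.update(other_counts)
        rest_total = sum(rest.values())
        specific = []
        for word, n in own.items():
            own_rate = n / own_total
            rest_rate = rest[word] / rest_total if rest_total else 0.0
            if own_rate >= min_ratio * rest_rate:
                specific.append(word)
        result[domain] = sorted(specific)
    return result


# Toy token lists standing in for real corpus data (assumption).
SAMPLE = {
    "political": "the election the vote the minister".split(),
    "cultural": "the festival the music the dance".split(),
}

print(domain_specific_words(SAMPLE))
```

In this toy input the shared word "the" is filtered out because it is equally frequent in both domains, which mirrors the observation that the common lexical stock does not help to identify a domain.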

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 156
3178 On the Weightlessness of Vowel Lengthening: Insights from Arabic Dialect of Yemen and Contribution to Psychoneurolinguistics

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Montaha Al Yaari, Ayman Al Yaari, Aayah Al Yaari, Adham Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

Introduction: It is well established that lengthening (longer duration) is considered one of the correlates of lexical and phrasal prominence. However, it is unexplored whether the scope of vowel lengthening in the Arabic dialect of Yemen (ADY) is differently affected by educated and/or uneducated speakers from different dialectal backgrounds. Specifically, the research aims to examine whether or not linguistic background acquired through different educational channels makes a difference in the speech of the speaker and how that is reflected in related psychoneurolinguistic impairments. Methods: For the above-mentioned purpose, we conducted an articulatory experiment wherein a set of words from ADY was examined in the dialectal speech of one thousand seven hundred Yemeni educated and uneducated speakers aged 19-61 years who grew up in five regions of the country (northern, southern, eastern, western, and central) and were accordingly assigned to five dialectal groups. A seven-minute video clip was shown to the participants, who were asked to spontaneously describe the scene they had just watched; the researchers then linguistically and statistically analyzed the recordings to weigh vowel lengthening in the speech of the participants. Results: The results show that vowels (monophthongs and diphthongs) are lengthened by all participants. Unexpectedly, educated and uneducated speakers from northern and central dialects lengthen vowels. Compared with uneducated speakers from the same dialect, educated speakers lengthen fewer vowels in their dialectal speech. Conclusions: These findings support the notion that extensive exposure to dialects on account of standard language can cause changes to the patterns of dialects themselves, and this can be seen in the speech of educated and uneducated speakers of these dialects.
Further research is needed to clarify the phonemic distinctive features and frequency of lengthening in other open class systems (i.e., nouns, adjectives, and adverbs). Phonetic and phonological report measures are needed as well as validation of existing measures for assessing phonemic vowel length in the Arabic population in general and Arabic individuals with voice, speech, and language impairments in particular.

Keywords: vowel lengthening, Arabic dialect of Yemen, phonetics, phonology, impairment, distinctive features

Procedia PDF Downloads 14
3177 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth, largely because of the ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate the triples from the given data and derive the interesting patterns. The methodology also illustrates data integration using RDF for reliable, interesting patterns.

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 130
3176 Contextual Toxicity Detection with Data Augmentation

Authors: Julia Ive, Lucia Specia

Abstract:

Understanding and detecting toxicity is an important problem to support safer human interactions online. Our work focuses on the important problem of contextual toxicity detection, where automated classifiers are tasked with determining whether a short textual segment (usually a sentence) is toxic within its conversational context. We use “toxicity” as an umbrella term to denote a number of variants commonly named in the literature, including hate, abuse, offence, among others. Detecting toxicity in context is a non-trivial problem and has been addressed by very few previous studies. These previous studies have analysed the influence of conversational context in human perception of toxicity in controlled experiments and concluded that humans rarely change their judgements in the presence of context. They have also evaluated contextual detection models based on state-of-the-art Deep Learning and Natural Language Processing (NLP) techniques. Counterintuitively, they reached the general conclusion that computational models tend to suffer performance degradation in the presence of context. We challenge these empirical observations by devising better contextual predictive models that also rely on NLP data augmentation techniques to create larger and better data. In our study, we start by further analysing the human perception of toxicity in conversational data (i.e., tweets), in the absence versus presence of context, in this case, previous tweets in the same conversational thread. We observed that the conclusions of previous work on human perception are mainly due to data issues: The contextual data available does not provide sufficient evidence that context is indeed important (even for humans). The data problem is common in current toxicity datasets: cases labelled as toxic are either obviously toxic (i.e., overt toxicity with swear, racist, etc. 
words), and thus context is not needed for a decision, or are ambiguous, vague or unclear even in the presence of context; in addition, the data contains labeling inconsistencies. To address this problem, we propose to automatically generate contextual samples where toxicity is not obvious (i.e., covert cases) without context or where different contexts can lead to different toxicity judgements for the same tweet. We generate toxic and non-toxic utterances conditioned on the context or on target tweets using a range of techniques for controlled text generation (e.g., Generative Adversarial Networks and steering techniques). On the contextual detection models, we posit that their poor performance is due to limitations of both the data they are trained on (the same problems stated above) and the architectures they use, which are not able to leverage context in effective ways. To improve on that, we propose text classification architectures that take the hierarchy of conversational utterances into account. In experiments benchmarking ours against previous models on existing and automatically generated data, we show that both data and architectural choices are very important. Our model achieves substantial performance improvements as compared to the baselines that are non-contextual or contextual but agnostic of the conversation structure.

Keywords: contextual toxicity detection, data augmentation, hierarchical text classification models, natural language processing

Procedia PDF Downloads 142
3175 Comparison of the Classification of Cystic Renal Lesions Using the Bosniak Classification System with Contrast Enhanced Ultrasound and Magnetic Resonance Imaging to Computed Tomography: A Prospective Study

Authors: Dechen Tshering Vogel, Johannes T. Heverhagen, Bernard Kiss, Spyridon Arampatzis

Abstract:

In addition to computed tomography (CT), contrast enhanced ultrasound (CEUS) and magnetic resonance imaging (MRI) are being increasingly used for imaging of renal lesions. The aim of this prospective study was to compare the classification of complex cystic renal lesions using the Bosniak classification with CEUS and MRI to CT. Forty-eight patients with 65 cystic renal lesions were included in this study. All participants signed written informed consent. The Bosniak classifications of complex renal lesions (≥ BII-F) on CEUS and MRI were compared to those of CT, and the agreement was tested using Cohen’s Kappa. Sensitivity, specificity, positive and negative predictive values (PPV/NPV) and the accuracy of CEUS and MRI compared to CT in the detection of complex renal lesions were calculated. Twenty-nine (45%) out of 65 cystic renal lesions were classified as complex using CT. The agreement between CEUS and CT in the classification of complex cysts was fair (agreement 50.8%, Kappa 0.31), and was excellent between MRI and CT (agreement 93.9%, Kappa 0.88). Compared to CT, MRI had a sensitivity of 96.6%, specificity of 91.7%, a PPV of 90.3%, and an NPV of 97.1% with an accuracy of 93.8%. The corresponding values for CEUS were sensitivity 100.0%, specificity 33.3%, PPV 54.7%, and NPV 100.0% with an accuracy of 63.1%. The classification of complex renal cysts based on MRI and CT scans correlated well, and MRI can be used instead of CT for this purpose. CEUS can exclude complex lesions, but due to higher sensitivity, cystic lesions tend to be upgraded. However, it is useful for initial imaging, for follow up of lesions and in those patients with contraindications to CT and MRI.
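
As a worked illustration of how such figures follow from a 2x2 table against the CT reference, the sketch below computes the usual diagnostic metrics and Cohen's Kappa. The abstract does not report the underlying cell counts; the MRI counts used here (28 true positives, 1 false negative, 3 false positives, 33 true negatives) are an assumption chosen because they fit 29 complex lesions out of 65 and reproduce the stated sensitivity, specificity, and Kappa.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Diagnostic accuracy metrics and Cohen's Kappa for a 2x2 table.

    tp/fn/fp/tn are counts of the index test against the reference standard.
    """
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Chance agreement computed from the marginal totals of the table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Assumed counts for MRI vs CT (not reported in the abstract).
mri = diagnostic_metrics(tp=28, fn=1, fp=3, tn=33)
for name, value in mri.items():
    print(f"{name}: {value:.3f}")
```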

Keywords: Bosniak classification, computed tomography, contrast enhanced ultrasound, cystic renal lesions, magnetic resonance imaging

Procedia PDF Downloads 118
3174 Enhancement Method of Network Traffic Anomaly Detection Model Based on Adversarial Training With Category Tags

Authors: Zhang Shuqi, Liu Dan

Abstract:

To address problems in intelligent network anomaly traffic detection models, such as low detection accuracy caused by a lack of training samples and poor detection of small-sample attacks, a classification model enhancement method, F-ACGAN (Flow Auxiliary Classifier Generative Adversarial Network), which introduces a generative adversarial network and adversarial training, is proposed. Generating adversarial data with category labels can enhance the training effect and improve classification accuracy and model robustness. F-ACGAN consists of three steps. First, feature preprocessing is performed, which includes data type conversion, dimensionality reduction, normalization, etc. Second, a generative adversarial network model with feature learning ability is designed, and the sample generation effect of the model is improved through adversarial iterations between generator and discriminator. Third, an adversarial disturbance factor along the gradient direction of the classification model is added to improve the diversity and antagonism of generated data and to promote the model to learn from adversarial classification features. An experiment constructing a classification model with the UNSW-NB15 dataset shows that, with the F-ACGAN enhancement of the basic model, classification accuracy improved by 8.09% and the F1 score improved by 6.94%.

Keywords: data imbalance, GAN, ACGAN, anomaly detection, adversarial training, data augmentation

Procedia PDF Downloads 80
3173 International Classification of Primary Care as a Reference for Coding the Demand for Care in Primary Health Care

Authors: Souhir Chelly, Chahida Harizi, Aicha Hechaichi, Sihem Aissaoui, Leila Ben Ayed, Maha Bergaoui, Mohamed Kouni Chahed

Abstract:

Introduction: The International Classification of Primary Care (ICPC) is part of the morbidity classification system. It has 17 chapters, and each entry is coded by an alphanumeric code: the letter corresponds to the chapter, the number to a paragraph in the chapter. The objective of this study is to show the utility of this classification in the coding of the reasons for demand for care in primary health care (PHC), its advantages and limits. Methods: This is a cross-sectional descriptive study conducted in 4 PHC centers in Ariana district. Data on the demand for care during 2 days in the same week were collected. The coding of the information was done according to the ICPC. The data were entered and analyzed with the EPI Info 7 software. Results: A total of 523 demands for care were investigated. The patients who came for consultation are predominantly female (62.72%). Most of the patients consulting are young, with an average age of 35 ± 26 years. In the ICPC, there are 7 rubrics: 'infections' is the most common reason with 49.9%, 'other diagnoses' with 40.2%, 'symptoms and complaints' with 5.5%, 'trauma' with 2.1%, 'procedures' with 2.1% and 'neoplasm' with 0.3%. The main advantage of the ICPC is that it is a standardized tool. It is very suitable for classifying the reasons for demand for care in PHC according to their specificity, and it can be used in a computerized PHC medical file. Its current limitations are related to the difficulty of classifying some reasons for demand for care. Conclusion: The ICPC has been developed to provide healthcare with a coding reference that takes into account their specificity. The ICD (CIM in French) is in its 10th revision; the ICPC would likewise gain in efficiency from revision to revision, so that it can be generalized and used by PHC teams.
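
The letter-plus-number coding scheme described above can be sketched as a small parser. This assumes the common ICPC layout of one chapter letter followed by a two-digit number; the chapter titles are abbreviated and listed from memory, so they should be checked against the official classification before any real use. `parse_icpc` is a hypothetical helper name.

```python
import re

# The 17 ICPC chapter letters with abbreviated titles (assumption: verify
# against the official classification before relying on them).
ICPC_CHAPTERS = {
    "A": "General and unspecified",
    "B": "Blood and immune mechanism",
    "D": "Digestive",
    "F": "Eye",
    "H": "Ear",
    "K": "Circulatory",
    "L": "Musculoskeletal",
    "N": "Neurological",
    "P": "Psychological",
    "R": "Respiratory",
    "S": "Skin",
    "T": "Endocrine, metabolic and nutritional",
    "U": "Urological",
    "W": "Pregnancy and family planning",
    "X": "Female genital",
    "Y": "Male genital",
    "Z": "Social problems",
}

_CODE = re.compile(r"^([A-Z])(\d{2})$")


def parse_icpc(code):
    """Split a code such as 'R74' into (chapter letter, number, chapter title)."""
    match = _CODE.match(code.strip().upper())
    if not match or match.group(1) not in ICPC_CHAPTERS:
        raise ValueError(f"not a valid ICPC-style code: {code!r}")
    letter, number = match.groups()
    return letter, int(number), ICPC_CHAPTERS[letter]


print(parse_icpc("R74"))
print(len(ICPC_CHAPTERS))
```

A code whose letter is not a chapter, such as the ICD-10 style `I10`, is rejected with a `ValueError`.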

Keywords: international classification of primary care, medical file, primary health care, Tunisia

Procedia PDF Downloads 241
3172 Walking in the Steps of Poets: Evoking Past Poets in Sufi Poetry

Authors: Bilal Orfali

Abstract:

It is common practice in modern times to read mystical poetry and apply it to our mundane lives and loves. Sufis in the early period did the opposite. Their mystical hymns often spun out of the courtly poetic ghazal, panegyric, and wine songs. This paper highlights the relation of the Arabic courtly poetic canon to early Sufism. Sufi akhbār and poetry evoke past poets and their poetic heritage. They tend to quote or refer to eminent poets whose poetry must have been widely circulated and memorized. However, Sufism places this readily recognizable poetry in a new context that deliberately changes the past. It is a process of a metaphorization in which the reality of the pre-Islamic, Umayyad, and Abbasid models now acts as a device or metaphor for the Sufi poetics.

Keywords: Sufism, Arabic poetry, literature, Islamic literature, Abbasid

Procedia PDF Downloads 289
3171 Classification of Health Information Needs of Hypertensive Patients in the Online Health Community Based on Content Analysis

Authors: Aijing Luo, Zirui Xin, Yifeng Yuan

Abstract:

Background: With the rapid development of the online health community, more and more patients or families are seeking health information on the Internet. Objective: This study aimed to discuss how to fully reveal the health information needs expressed by hypertensive patients in their questions in the online environment. Methods: This study randomly selected 1,000 text records from the question data of hypertensive patients from 2008 to 2018 collected from the website www.haodf.com and constructed a classification system through literature research and content analysis. This paper identified the background characteristics and questioning intention of each hypertensive patient based on the patient’s question and used co-occurrence network analysis to explore the features of the health information needs of hypertensive patients. Results: The classification system for health information needs of patients with hypertension is composed of 9 parts: 355 kinds of drugs, 395 kinds of symptoms and signs, 545 kinds of tests and examinations, 526 kinds of demographic data, 80 kinds of diseases, 37 kinds of risk factors, 43 kinds of emotions, 6 kinds of lifestyles, 49 kinds of questions. The characteristics of the explored online health information needs of the hypertensive patients include: i) more than 49% of patients describe features such as drugs, symptoms and signs, tests and examinations, demographic data, diseases, etc.; ii) these groups are most concerned about treatment (77.8%), followed by diagnosis (32.3%); iii) 65.8% of hypertensive patients will ask doctors online several questions at the same time.
28.3% of the patients are very concerned about how to adjust the medication, and they will ask other treatment-related questions at the same time, including drug side effects, whether to take drugs, how to treat a disease, etc.; secondly, 17.6% of the patients will consult the doctors online about the causes of the clinical findings, including the relationship between the clinical findings and a disease, the treatment of a disease, medication, and examinations. Conclusion: In the online environment, the health information needs expressed by Chinese hypertensive patients to doctors are personalized; that is, patients with different background features express their questioning intentions to doctors. The classification system constructed in this study can guide health information service providers in the construction of online health resources, to help solve the problem of information asymmetry in communication between doctors and patients.
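
The co-occurrence analysis mentioned in the Methods can be illustrated with a minimal sketch that counts how often two labels appear in the same question; such pair counts are the edge weights of a co-occurrence network. The label names and records below are invented placeholders, not data from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each unordered pair of labels occurs in one record.

    records is an iterable of label collections, one per patient question.
    """
    pairs = Counter()
    for labels in records:
        # Sort so that each pair is stored under one canonical ordering.
        for a, b in combinations(sorted(set(labels)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical annotated questions (assumption, for illustration only).
RECORDS = [
    {"drug", "symptom", "treatment"},
    {"drug", "treatment"},
    {"symptom", "diagnosis"},
]

for pair, weight in cooccurrence_counts(RECORDS).most_common():
    print(pair, weight)
```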

Keywords: online health community, health information needs, hypertensive patients, doctor-patient communication

Procedia PDF Downloads 97
3170 Evaluation and Fault Classification for Healthcare Robot during Sit-To-Stand Performance through Center of Pressure

Authors: Tianyi Wang, Hieyong Jeong, An Guo, Yuko Ohno

Abstract:

Healthcare robots for assisting sit-to-stand (STS) performance have aroused considerable research interest. To the authors’ best knowledge, how to evaluate such healthcare robots is still unknown. A robot should be labeled as faulty if users find STS demanding when they are assisted by it. In this research, we propose a method to evaluate a sit-to-stand assist robot through center of pressure (CoP) and then classify different STS performances. Experiments were executed five times with ten healthy subjects under four conditions: two self-performed STSs with chair heights of 62 cm and 43 cm, and two robot-assisted STSs with a chair height of 43 cm and robot end-effector speeds of 2 s and 5 s. CoP was measured using a Wii Balance Board (WBB). Bayesian classification was utilized to classify STS performance. The results showed that faults occurred when the chair height was decreased and the robot assist speed was slowed. The proposed method for fault classification showed a high probability of distinguishing fault classes from others. It was concluded that faults of an STS assist robot can be detected by inspecting center of pressure and classified through the proposed classification algorithm.
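
For readers unfamiliar with how CoP is obtained from a balance board, the sketch below uses the standard four-corner load cell formula. It is not the authors' processing pipeline; the sensor spacing defaults (433 mm by 238 mm) are an assumption and should be replaced with the measured geometry of the actual board.

```python
def center_of_pressure(tl, tr, bl, br, width_mm=433.0, depth_mm=238.0):
    """Estimate CoP (x, y) in mm from four corner load readings.

    tl, tr, bl, br are the top-left, top-right, bottom-left and bottom-right
    sensor loads in any consistent unit. The origin is the board center,
    x points right and y points toward the top edge.
    """
    total = tl + tr + bl + br
    if total <= 0:
        raise ValueError("no load on the board")
    x = (width_mm / 2.0) * ((tr + br) - (tl + bl)) / total
    y = (depth_mm / 2.0) * ((tl + tr) - (bl + br)) / total
    return x, y


# Evenly distributed load puts the CoP at the board center.
print(center_of_pressure(10.0, 10.0, 10.0, 10.0))
# All load on the right-hand sensors moves the CoP to the right edge.
print(center_of_pressure(0.0, 20.0, 0.0, 20.0))
```

Tracking this point over the duration of a sit-to-stand movement gives the CoP trajectory from which features for a classifier could be derived.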

Keywords: center of pressure, fault classification, healthcare robot, sit-to-stand movement

Procedia PDF Downloads 172
3169 Semantic Differences between Bug Labeling of Different Repositories via Machine Learning

Authors: Pooja Khanal, Huaming Zhang

Abstract:

Labeling of issues/bugs, also known as bug classification, plays a vital role in software engineering. Some known labels/classes of bugs are 'User Interface', 'Security', and 'API'. Most of the time, when a reporter reports a bug, they try to assign some predefined label to it. Those issues are reported for a project, and each project is a repository in GitHub/GitLab, which contains multiple issues. There are many software project repositories, ranging from individual projects to commercial projects. The labels assigned for different repositories may depend on various factors like human instinct, generalization of labels, the label assignment policy followed by the reporter, etc. While the reporter of the issue may instinctively give that issue a label, another person reporting the same issue may label it differently. As a result, it is not known mathematically whether a label in one repository is similar to or different from the label in another repository. Hence, the primary goal of this research is to find the semantic differences between bug labeling of different repositories via machine learning. Independent optimal classifiers for individual repositories are built first using the text features from the reported issues. The optimal classifiers may include a combination of multiple classifiers stacked together. Then, those classifiers are used to cross-test other repositories, which allows the result to be deduced mathematically. The product of this ongoing research includes a formalized open-source GitHub issues database that is used to deduce the similarity of the labels pertaining to the different repositories.

Keywords: bug classification, bug labels, GitHub issues, semantic differences

Procedia PDF Downloads 174
3168 Isolation and Classification of Red Blood Cells in Anemic Microscopic Images

Authors: Jameela Ali Alkrimi, Abdul Rahim Ahmad, Azizah Suliman, Loay E. George

Abstract:

Red blood cells (RBCs) are among the most commonly and intensively studied types of blood cells in cell biology. A lack of RBCs is a condition characterized by a lower than normal hemoglobin level; this condition is referred to as 'anemia'. In this study, software was developed to isolate RBCs and, using a machine learning approach, to classify anemic RBCs in microscopic images. Several features of RBCs were extracted using image processing algorithms, including principal component analysis (PCA). With the proposed method, RBCs were isolated in 34 seconds from an image containing 18 to 27 cells. We also proposed that PCA could be performed to increase the speed and efficiency of classification. Our classifiers yielded accuracy rates of 100%, 99.99%, and 96.50% for the K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and artificial neural network (ANN), respectively. Classification was evaluated using sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained in a shorter time and more efficiently when PCA was used.

Keywords: red blood cells, pre-processing image algorithms, classification algorithms, principal component analysis PCA, confusion matrix, kappa statistical parameters, ROC

Procedia PDF Downloads 382
3167 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques

Authors: Mei-Yi Wu, Shang-Ming Huang

Abstract:

Online shoppers nowadays often search for product information on the Internet using product keywords. To use this kind of information searching model, shoppers should have a preliminary understanding of the products they are interested in and choose the correct keywords. However, if the products are encountered for the first time (for example, clothes or a backpack worn by a passer-by whose brand is unknown to the shopper), they cannot be retrieved due to insufficient information. In this paper, we discuss and study applications in E-commerce using image retrieval and text mining techniques. We design a reasonable E-commerce application system with a three-layer architecture to provide users with product information. The system can automatically search for and retrieve similar images and corresponding web pages on the Internet according to target pictures taken by users. Then text mining techniques are applied to extract important keywords from the retrieved web pages, and a web crawler searches for prices on different online shopping stores with these keywords. Finally, the users can obtain product information, including photos and prices, for their favorite products. The experiments show the efficiency of the proposed system.

Keywords: mobile image retrieval, text mining, product information service system, online marketing

Procedia PDF Downloads 334
3166 An Attempt at the Multi-Criterion Classification of Small Towns

Authors: Jerzy Banski

Abstract:

The basic aim of this study is to discuss and assess different classifications and research approaches to small towns that take their social and economic functions into account, as well as relations with surrounding areas. The subject literature typically includes three types of approaches to the classification of small towns: 1) the structural, 2) the location-related, and 3) the mixed. The structural approach allows for the grouping of towns from the point of view of the social, cultural and economic functions they discharge. The location-related approach draws on the idea of there being a continuum between the center and the periphery. A mixed classification making simultaneous use of the different approaches to research brings the most information to bear in regard to categories of the urban locality. Bearing in mind the approaches to classification, it is possible to propose a synthetic method for classifying small towns that takes account of economic structure, location and the relationship between the towns and their surroundings. In the case of economic structure, the small centers may be divided into two basic groups – those featuring a multi-branch structure and those that are specialized economically. A second element of the classification reflects the locations of urban centers. Two basic types can be identified – the small town within the range of impact of a large agglomeration, or else the town outside such areas, which is to say located peripherally. The third component of the classification arises out of small towns’ relations with their surroundings. In consequence, it is possible to indicate 8 types of small-town: from local centers enjoying good accessibility and a multi-branch economic structure to peripheral supra-local centers characterised by a specialized economic structure.
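
The count of 8 types follows from combining three two-way criteria (2 x 2 x 2). A minimal sketch of that enumeration is given below; the category labels are paraphrased from the abstract and are assumptions about how the author names each class, in particular the 'local' versus 'supra-local' reading of the third criterion.

```python
from itertools import product

# Labels paraphrased from the abstract (assumption).
STRUCTURE = ("multi-branch", "specialized")
LOCATION = ("within agglomeration range", "peripheral")
RELATIONS = ("local", "supra-local")


def small_town_types():
    """Enumerate every combination of the three classification criteria."""
    return list(product(STRUCTURE, LOCATION, RELATIONS))


for number, town_type in enumerate(small_town_types(), start=1):
    print(number, town_type)
```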

Keywords: small towns, classification, functional structure, localization

Procedia PDF Downloads 164
3165 The Application of a Hybrid Neural Network for Recognition of a Handwritten Kazakh Text

Authors: Almagul Assainova, Dariya Abykenova, Liudmila Goncharenko, Sergey Sybachin, Saule Rakhimova, Abay Aman

Abstract:

The recognition of handwritten Kazakh text is a relevant objective today for the digitization of materials. The study presents a model of a hybrid neural network for handwriting recognition, which includes a convolutional neural network and a multi-layer perceptron. Each network includes 1024 input neurons and 42 output neurons. The model is implemented in a program written in the Python programming language using the EMNIST database and the NumPy, Keras, and TensorFlow modules. The neural network was trained on letters specific to the Kazakh alphabet, such as ә, ғ, қ, ң, ө, ұ, ү, һ, і. The neural network model and the program created on its basis can be used in electronic document management systems to digitize Kazakh text.
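To make the stated layer sizes concrete, here is a minimal pure-Python sketch of a single fully connected layer mapping 1024 inputs to 42 softmax outputs. It is not the authors' hybrid model: the random weights, the reading of 1024 as a flattened 32 x 32 image, and all function names are assumptions for illustration only.

```python
import math
import random

N_INPUTS = 1024   # from the abstract; reading it as 32 * 32 pixels is an assumption
N_OUTPUTS = 42    # from the abstract: one output neuron per class


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_forward(pixels, weights, biases):
    """Apply one fully connected layer followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


rng = random.Random(0)
# Untrained random weights: the output is meaningless as a prediction and
# only demonstrates the input and output dimensions.
weights = [[rng.uniform(-0.01, 0.01) for _ in range(N_INPUTS)] for _ in range(N_OUTPUTS)]
biases = [0.0] * N_OUTPUTS
pixels = [rng.random() for _ in range(N_INPUTS)]  # stand-in for a flattened image

probs = dense_forward(pixels, weights, biases)
predicted_class = max(range(N_OUTPUTS), key=probs.__getitem__)
```

In a real system the weights would come from training, and the convolutional and perceptron branches would be built with a framework such as Keras rather than hand-written loops.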

Keywords: handwriting recognition system, image recognition, Kazakh font, machine learning, neural networks

Procedia PDF Downloads 232
3164 The Effects of Three Pre-Reading Activities (Text Summary, Vocabulary Definition, and Pre-Passage Questions) on the Reading Comprehension of Iranian EFL Learners

Authors: Leila Anjomshoa, Firooz Sadighi

Abstract:

This study investigated the effects of three types of pre-reading activities (vocabulary definitions, text summary and pre-passage questions) on EFL learners’ English reading comprehension. On the basis of the results of a placement test administered to two hundred and thirty English students at Kerman Azad University, 200 subjects (one hundred intermediate and one hundred advanced) were selected. Four texts, two at intermediate level and two at advanced level, were chosen. The data gathered were subjected to ANOVA. A close examination of the results through Tukey’s HSD showed that the experimental groups performed better than the control group, highlighting the effect of the treatment on them. Also, experimental group C (text summary) performed remarkably better than the other three groups (both experimental and control). Group B (vocabulary definitions) performed better than groups A and D. The pre-passage questions group (D) scored higher than the control group.
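The ANOVA step mentioned above compares between-group and within-group variance. The following sketch computes the one-way F statistic from scratch; the scores are made up for illustration and are not the study's data, and the function name is hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists."""
    scores = [x for group in groups for x in group]
    grand_mean = mean(scores)
    k, n = len(groups), len(scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up comprehension scores for three hypothetical groups.
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value = one_way_anova_f(example_groups)
print(f_value)
```

A large F value suggests that at least one group mean differs; a post hoc test such as Tukey's HSD is then needed to say which pairs of groups differ.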

Keywords: pre-reading activities, text summary, vocabulary definition, pre-passage questions, reading comprehension

Procedia PDF Downloads 322
3163 The Audio-Visual and Syntactic Priming Effect on Specific Language Impairment and Gender in Modern Standard Arabic

Authors: Mohammad Al-Dawoody

Abstract:

This study explores whether priming is affected by gender in Modern Standard Arabic and whether it is restricted solely to subjects with no specific language impairment (SLI). The sample consists of 74 subjects, between the ages of 11;1 and 11;10, distributed into (a) two SLI experimental groups of 38 subjects divided by gender into 18 females and 20 males and (b) two non-SLI control groups of 36 subjects divided by gender into 17 females and 19 males. Employing a mixed research design, the researcher conducted this study within the framework of relevance theory (RT), whose main assumption is that human beings are endowed with a biological ability to magnify the relevance of incoming stimuli. Each of the four groups was given two different priming stimuli: audio-visual priming (T1) and syntactic priming (T2). The results showed that the priming effect was clearly distinct among SLI participants, especially when retrieving typical responses (TR) in T1 and T2, with a slight superiority of males over females. The results also revealed that non-SLI females showed stronger original response (OR) priming in T1 than males and that non-SLI males outperformed females in OR priming in T2. Furthermore, the results suggested that audio-visual priming has a stronger effect on SLI females than on non-SLI females and that syntactic priming seems to have the same effect on the two groups (non-SLI and SLI females). The conclusion is that the priming effect varies according to gender and is not confined merely to non-SLI subjects.

Keywords: specific language impairment, relevance theory, audio-visual priming, syntactic priming, modern standard Arabic

Procedia PDF Downloads 148
3162 Object Recognition Approach Based on Generalized Hough Transform and Color Distribution Serving in Generating Arabic Sentences

Authors: Nada Farhani, Naim Terbeh, Mounir Zrigui

Abstract:

Recognizing the objects contained in images has always been a challenge in research because of the variability in the shape, position, and contrast of objects, among other factors. This paper addresses object recognition. The classical Hough Transform (HT) is a tool for detecting straight line segments in images. The HT technique has been generalized (GHT) to detect arbitrary shapes. With the GHT, the shapes sought are not necessarily defined analytically but rather by a particular silhouette. For more precision, we propose combining the results from the GHT with the results of a similarity calculation between the histograms and the spatiograms of the images. The main purpose of our work is to use the concepts obtained from recognition to generate sentences in Arabic that summarize the content of the image.

Keywords: recognition of shape, generalized hough transformation, histogram, spatiogram, learning

Procedia PDF Downloads 130
3161 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, clients who applied to a bank branch for a loan were analyzed through data mining. The data cover the period between 2010 and 2013 and include the amounts of loans received by personal and SME clients working with the bank branch, the number of installments, the number of delays in loan installments, payments available in other banks, and the number of banks to which the clients are in debt. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, five different types of customers were identified on the decision tree. These customer types were rated according to the risk they pose to the bank branch, and the customers were classified according to these risk ratings.
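CART grows a tree by repeatedly choosing the binary split that most reduces impurity, conventionally measured with the Gini index for classification. The sketch below finds the best threshold on a single numeric feature; the feature (number of delayed installments), the risk labels, and the data are invented for illustration and do not come from the study.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Returns (None, parent_gini) when no split improves on the unsplit node.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: delayed installments per client and an assigned risk label.
delays = [0, 1, 1, 4, 5, 6]
risk = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(delays, risk)
print(threshold, impurity)
```

A full CART implementation applies this search over every feature, recurses on the two child nodes, and prunes the resulting tree; the leaves then correspond to customer types such as those described in the abstract.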

Keywords: client classification, loan suitability, risk rating, CART analysis

Procedia PDF Downloads 319
3160 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Attribute or feature selection is one of the basic strategies to improve the performance of data classification tasks and, at the same time, to reduce the complexity of classifiers, and it is particularly fundamental when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed of a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to an a posteriori process in a multi-objective context), and testing. We compare the performance of the classifiers obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists who collected the data.
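With two objectives to maximize, candidate feature subsets are compared by Pareto dominance rather than by a single score. The sketch below filters a set of candidates down to its non-dominated front; it is a generic illustration, not ENORA or NSGA-II, and the (log-likelihood, accuracy) pairs are invented.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (clustering log-likelihood, classifier accuracy) pairs,
# one per candidate feature subset.
candidates = [(-120.0, 0.85), (-100.0, 0.75), (-130.0, 0.70), (-110.0, 0.80)]
front = pareto_front(candidates)
print(front)
```

The a posteriori decision step described in the abstract would then pick one solution from this front, since no member of the front is better than another on both objectives at once.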

Keywords: evolutionary computation, feature selection, classification, clustering

Procedia PDF Downloads 342
3159 Entropy in a Field of Emergence in an Aspect of Linguo-Culture

Authors: Nurvadi Albekov

Abstract:

The communicative situation is the basis that designates potential models of ‘constructed forms’, the motivated basis of a text, since a text can be regarded as a product of the communicative situation. It is within the field of emergence that the models of a text that can potentially be predicted in a given communicative situation are designated. Every text can be regarded as a conceptual system structured on the basis of a particular communicative situation. However, in the process of ‘structuring’ a particular model of the ‘conceptual system’, the recipient's consciousness can act only within the boundary of the field of emergence, because going beyond this boundary indicates a misunderstanding of the communicative situation. On the basis of the communicative situation we can observe an increment of meaning, where the synergy of the informative model of communication, formed using the invariant units of a language system, results from the verbalization of the communicative situation. The potential of the models of a text, predicted within the field of emergence, also depends on the communicative situation. The concept of ‘the field of emergence’ is interpreted as a unit of the language system with a poly-directed universal structure, implying the presence of a core, a center and a periphery, and including different levels of means of a functioning language system, in terms of both linguistic resources and extralinguistic factors, whose interaction results in the increment of a text. The concept of the ‘field of emergence’ is considered the most promising for the analysis of texts: oral, written, printed and electronic. As a unit of the language system, the field of emergence has several properties that predetermine its use in the study of a text at different levels. This work attempts an analysis of entropy in a text in terms of the lingua-cultural code, predicted within the model of the field of emergence. The article describes the problem of entropy in the field of emergence caused by the influence of extralinguistic factors. The increase in entropy is caused not only by the intrusion of language resources but also by the influence of the alien culture as a whole, and by the appearance in the field of emergence of symbols that are not typical of the culture in question. The borrowing of alien lingua-cultural symbols into the author's lingua-culture increases entropy when constructing a text, at the level of both meaning and structure. It is nothing but an artificial formatting of lexical units that violates the stylistic unity of a phrase. It is noted that one important characteristic that reduces entropy in the field of emergence is the typical similarity of the lexical and semantic resources of different lingua-cultures with respect to extralinguistic factors.

Keywords: communicative situation, field of emergence, lingua-culture, entropy

Procedia PDF Downloads 338
3158 Understanding the Qualitative Nature of Product Reviews by Integrating Text Processing Algorithm and Usability Feature Extraction

Authors: Cherry Yieng Siang Ling, Joong Hee Lee, Myung Hwan Yun

Abstract:

Usability has become a basic requirement from the consumer's perspective, and a product that fails to meet it ends up unused. Identifying usability issues by analyzing quantitative and qualitative data collected from usability testing and evaluation activities aids the process of product design, yet the lack of studies on methodologies for analyzing qualitative text data in the usability field limits the potential of these data for more useful applications. Meanwhile, the rapid development of data analysis fields, such as natural language processing for the computational understanding of human language and machine learning for predictive models and clustering tools, has opened up the possibility of analyzing qualitative text data. Therefore, this research aims to study the applicability of a text-processing algorithm to the analysis of qualitative text data collected from usability activities. The research used datasets collected from an LG neckband headset usability experiment, consisting of headset survey text data, subject data and product physical data. The analysis procedure, which integrates the text-processing algorithm, includes training the comments onto a vector space, labeling them with the subject and product physical feature data, and clustering to validate the result of the comment vector clustering. The result shows 'volume and music control button' as the usability feature that best matches the clusters of comment vectors: the centroid comments of one cluster placed more emphasis on button positions, while the centroid comments of the other cluster placed more emphasis on button interface issues. When the volume and music control buttons were designed separately, the participants experienced less confusion, and thus the comments mentioned only the buttons' positions. When the volume and music control buttons were designed as a single button, the participants experienced interface issues with the buttons, such as the operating methods of functions and confusion over which button performs which function. The relevance of the cluster centroid comments to the extracted feature demonstrates the capability of text-processing algorithms in analyzing qualitative text data from usability testing and evaluations.
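The idea of placing comments in a vector space and grouping similar ones can be illustrated with a simple stand-in. The sketch below uses bag-of-words counts and cosine similarity, which is a deliberately simpler technique than the trained embeddings the abstract implies; the comments and function names are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(comment):
    """Represent a comment as word counts (a very simple vector space)."""
    return Counter(comment.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical survey comments, not taken from the study's dataset.
comments = [
    "volume button position is hard to reach",
    "music button position feels awkward",
    "battery drains quickly",
]
vectors = [bag_of_words(c) for c in comments]
sim_buttons = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_buttons > sim_unrelated)
```

A clustering algorithm would group comments whose vectors are close under such a similarity measure, and the comments nearest each cluster centroid could then be inspected for the usability feature they share.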

Keywords: usability, qualitative data, text-processing algorithm, natural language processing

Procedia PDF Downloads 258
3157 Avoidance and Selectivity in the Acquisition of Arabic as a Second/Foreign Language

Authors: Abeer Heider

Abstract:

This paper explores and classifies the different kinds of avoidance that students commonly exhibit in the acquisition of Arabic as a second/foreign language, and suggests specific strategies to help students lessen their avoidance tendencies in the hope of streamlining the learning process. Students most commonly use avoidance strategies in grammar and word choice. These different types of strategies have different implications and naturally require different approaches. Thus the question remains as to the most effective way to help students improve their Arabic, and how teachers can efficiently utilize these techniques. It is hoped that this research will contribute to understanding the role of avoidance in the field of second language acquisition in general, and as a type of input. Yet some researchers also note that similarity between L1 and L2 may be problematic as well, since the learner may doubt that such similarity indeed exists and consequently avoid the identical constructions or elements (Jordens, 1977; Kellermann, 1977, 1978, 1986). In an effort to resolve this issue, a case study is being conducted. The present case study attempts to provide a broader analysis of what is acquired than is usually the case, analyzing the learners’ accomplishments in terms of the three-part framework of the components of communicative competence suggested by Michele Canale: grammatical competence, sociolinguistic competence and discourse competence. The subjects of this study are 15 students’ 22th year who came to study Arabic at Qatar University of Cairo. The 15 students are at the advanced level. They had completed the intermediate level in Arabic when they arrived in Qatar for the first time. The study uses a discourse analytic method to examine how the first language affects students’ production and output in the second language, and how and when students use avoidance methods in their learning. The study will be conducted through Fall 2015 by analyzing audio recordings made throughout the entire semester. There will be around 30 recorded clips. The students are using supplementary listening and speaking materials. The group will be tested at the end of the term to assess any measurable difference between the techniques. Questionnaires will be administered to teachers and students before and after the semester to assess any change in attitude toward avoidance and selectivity methods. Responses to these questionnaires are analyzed and discussed to assess the relative merits of the aforementioned strategies for avoidance and selectivity. Implications and recommendations for teacher training are proposed.

Keywords: second language acquisition, learning languages, selectivity, avoidance

Procedia PDF Downloads 261
3156 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in music has gained considerable momentum in recent years, with machine learning and data mining techniques and many audio features contributing to the analysis of the relation between mood and music. In this paper, we carry the same idea forward and build a system for the automatic recognition of the mood underlying audio song clips by mining their audio features. We evaluated several data classification algorithms to train and test a model describing the moods of these songs, and developed an open source framework. Before classification, preprocessing and feature extraction phases are necessary for removing noise and gathering features, respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 476
3155 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of discriminant analysis to select the evolutionary algorithm that best solves instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm among the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classifier was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. The discriminant analysis yielded a classification rate of 66.7%.

Keywords: Intelligent Transportation Systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning

Procedia PDF Downloads 443
3154 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, the categorization was performed manually. However, with manual categorization, not only can the accuracy not be guaranteed, but the categorization also requires a large amount of time and incurs huge costs. Many studies have been conducted towards the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because they assume that one document can be assigned to one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training on a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 247
3153 The Difference of Learning Outcomes in Reading Comprehension between Text and Film as The Media in Indonesian Language for Foreign Speaker in Intermediate Level

Authors: Siti Ayu Ningsih

Abstract:

This study aims to find the differences in learning outcomes in reading comprehension between text and film as media in Indonesian Language for Foreign Speakers (BIPA) learning at the intermediate level. Using quantitative and qualitative research methods, the study involved a single respondent, a ninth-grade secondary-level student from D'Royal Morocco Integrative Islamic School. The quantitative method was used to calculate the learning outcomes after the appropriate action cycle, whereas the qualitative method was used to interpret and describe the findings derived from the quantitative method. The techniques used in this study are observation and testing. The research shows that text is more effective than film for intermediate-level learners of Indonesian Language for Foreign Speakers. This is because, when using film, the learner does not have enough time to note down difficult vocabulary or to look up its meaning in the dictionary. Text, by contrast, is more effective because it does not require additional time to note down difficult words: the learner can immediately look up the meaning of difficult or unfamiliar words in the dictionary. The presence of the text also makes it easier for the learner to find the answers to the questions by matching the vocabulary of each question against the reference text.

Keywords: Indonesian language for foreign speaker, learning outcome, media, reading comprehension

Procedia PDF Downloads 179