Search results for: Arabic text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3590

3200 A Tool to Measure the Usability Guidelines for Arab E-Government Websites

Authors: Omyma Alosaimi, Asma Alsumait

Abstract:

Website developers and designers should follow usability guidelines to provide a user-friendly interface. Using tools to measure usability, an evaluator can automatically evaluate hundreds of links within a few minutes. Such tools have the advantage of detecting some violations that only machines can detect. For that reason, using a usability evaluation tool is important for finding as many violations as possible. There are many website usability testing tools, but none has been developed to measure the usability of e-government websites in general or of Arabic e-government websites in particular. To measure the usability of Arabic e-government websites, a tool is developed and tested in this paper. A comparison between using a tool specifically developed for e-government websites and using a general usability testing tool is presented.

Keywords: e-government, human computer interaction, usability evaluation, usability guidelines

Procedia PDF Downloads 396
3199 Literary Theatre and Embodied Theatre: A Practice-Based Research in Exploring the Authorship of a Performance

Authors: Rahul Bishnoi

Abstract:

Theatre, as Anne Ubersfeld calls it, is a paradox. At once, it is both a literary work and a physical representation. Theatre as a text is eternal, reproducible, and identical while as a performance, theatre is momentary and never identical to the previous performances. In this dual existence of theatre, who is the author? Is the author the playwright who writes the dramatic text, or the director who orchestrates the performance, or the actor who embodies the text? From the poststructuralist lens of Barthes, the author is dead. Barthes’ argument of discrete temporality, i.e. the author is the before, and the text is the after, does not hold true for theatre. A published literary work is written, edited, printed, distributed and then gets consumed by the reader. On the other hand, theatrical production is immediate; an actor performs and the audience witnesses it instantaneously. Time, so to speak, does not separate the author, the text, and the reader anymore. The question of authorship gets further complicated in Augusto Boal’s “Theatre of the Oppressed” movement where the audience is a direct participant like the actors in the performance. In this research, through an experimental performance, the duality of theatre is explored with the authorship discourse. And the conventional definition of authorship is subjected to additional complexity by erasing the distinction between an actor and the audience. The design/methodology of the experimental performance is as follows: The audience will be asked to produce a text under an anonymous virtual alias. The text, as it is being produced, will be read and performed by the actor. The audience who are also collectively “authoring” the text, will watch this performance and write further until everyone has contributed with one input each. The cycle of writing, reading, performing, witnessing, and writing will continue until the end.
The intention is to create a dynamic system of writing/reading with the embodiment of the text through the actor. The actor is giving up the power to the audience to write the spoken word, stage instruction and direction while still keeping the agency of interpreting that input and performing in the chosen manner. This rapid conversation between the actor and the audience also creates a conversion of authorship. The main conclusion of this study is a perspective on the nature of dynamic authorship of theatre containing a critical enquiry of the collaboratively produced text, an individually performed act, and a collectively witnessed event. Using practice as a methodology, this paper contests the poststructuralist notion of the author as merely a ‘scriptor’ and breaks it further by involving the audience in the authorship as well.

Keywords: practice based research, performance studies, post-humanism, Avant-garde art, theatre

Procedia PDF Downloads 75
3198 Automatic Classification Using Dynamic Fuzzy C Means Algorithm and Mathematical Morphology: Application in 3D MRI Image

Authors: Abdelkhalek Bakkari

Abstract:

Image segmentation is a critical step in image processing and pattern recognition. In this paper, we propose a new robust automatic image classification method based on a dynamic fuzzy c-means algorithm and mathematical morphology. The proposed segmentation algorithm (DFCM_MM) has been applied to MR perfusion images. The obtained results show the validity and robustness of the proposed approach.
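
For readers unfamiliar with the base technique, the following is a minimal sketch of standard fuzzy c-means on one-dimensional intensity values. It is not the DFCM_MM algorithm, whose dynamic and morphological steps the abstract does not describe. The function name, the evenly spaced initial centers, and the fuzzifier m = 2 are assumptions made for illustration.

```python
def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means on 1-D data (illustrative, not DFCM_MM).

    Returns (centers, memberships); memberships[k][i] is the degree to
    which values[k] belongs to cluster i. Requires n_clusters >= 2.
    """
    lo, hi = min(values), max(values)
    # Assumption: deterministic initial centers, evenly spaced over the range.
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to it.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(n_clusters)]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by membership ** m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Made-up intensities forming two obvious groups.
centers, memberships = fuzzy_c_means_1d([0.0, 0.1, 0.2, 0.9, 1.0, 1.1])
```

Unlike hard k-means, every value keeps a graded membership in each cluster, which is why the technique is often used where region boundaries are blurred.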

Keywords: segmentation, classification, dynamic, fuzzy c-means, MR image

Procedia PDF Downloads 446
3197 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets in themselves, and they can now be accepted as a primary intellectual output of research. The quality and usage of datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection has been produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to the other multi-label classification methods tested. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing the preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
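
Three of the measurements named above (Hamming Loss, Micro-F and Macro-F) can be sketched in a few lines. The sketch below uses the usual textbook definitions and represents each sample as a set of label indices; the function names and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
def per_label_counts(y_true, y_pred, n_labels):
    """Return (tp, fp, fn) for each label index."""
    counts = []
    for label in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        counts.append((tp, fp, fn))
    return counts


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true, y_pred, n_labels):
    """Fraction of (sample, label) pairs that are predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


def micro_f1(y_true, y_pred, n_labels):
    """Pool the counts over all labels, then compute a single F1."""
    counts = per_label_counts(y_true, y_pred, n_labels)
    return f1(sum(c[0] for c in counts),
              sum(c[1] for c in counts),
              sum(c[2] for c in counts))


def macro_f1(y_true, y_pred, n_labels):
    """Compute F1 per label, then take the unweighted mean."""
    counts = per_label_counts(y_true, y_pred, n_labels)
    return sum(f1(*c) for c in counts) / n_labels


# Toy example: three samples, three possible labels (made-up data).
y_true = [{0, 1}, {2}, {0}]
y_pred = [{0}, {2}, {0, 1}]
```

Micro-averaging lets frequent labels dominate the score, while macro-averaging gives every label equal weight, so the two can diverge sharply on imbalanced label sets.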

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

Procedia PDF Downloads 143
3196 Classification of Construction Projects

Authors: M. Safa, A. Sabet, S. MacGillivray, M. Davidson, K. Kaczmarczyk, C. T. Haas, G. E. Gibson, D. Rayside

Abstract:

To address construction project requirements and specifications, scholars and practitioners need to establish a taxonomy according to a scheme that best fits their need. While existing characterization methods are continuously being improved, new ones are devised to cover project properties which have not been previously addressed. One such method, the Project Definition Rating Index (PDRI), has received limited consideration strictly as a classification scheme. Developed by the Construction Industry Institute (CII) in 1996, the PDRI has been refined over the last two decades as a method for evaluating a project's scope definition completeness during front-end planning (FEP). The main contribution of this study is a review of practical project classification methods, and a discussion of how PDRI can be used to classify projects based on their readiness in the FEP phase. The proposed model has been applied to 59 construction projects in Ontario, and the results are discussed.

Keywords: project classification, project definition rating index (PDRI), risk, project goals alignment

Procedia PDF Downloads 653
3195 Text Mining Techniques for Prioritizing Pathogenic Mutations in Protein Families Known to Misfold or Aggregate

Authors: Khaleel Saleh Al-Rababah

Abstract:

Amyloid fibril forming regions, which are known as protein aggregates, in sequences of some protein families are associated with a number of diseases known as amyloidosis. Mutations play a role in forming fibrils by accelerating the fibril formation process. In this paper, we aim to extract the diseases that are caused by those mutations as a result of their impact on the structural and functional properties of the aggregated protein. We propose a text mining system to automatically extract mutations, diseases and the relations between mutations and diseases. We present a finite-state-based algorithm to cluster mutations found in the same sentence, since a single sentence could contain different mutations that cause different diseases. We also present a co-reference algorithm that enables cross-linking of sentences.

Keywords: amyloid, amyloidosis, co reference, protein, text mining

Procedia PDF Downloads 500
3194 The Application of Lesson Study Model in Writing Review Text in Junior High School

Authors: Sulastriningsih Djumingin

Abstract:

This study has three objectives. First, it aims to describe the ability of the second-grade students to write review text without applying the Lesson Study model at SMPN 18 Makassar. Second, it seeks to describe the ability of the second-grade students to write review text by applying the Lesson Study model at SMPN 18 Makassar. Third, it aims to test the effectiveness of the Lesson Study model in writing review text at SMPN 18 Makassar. This research used a true experimental design with a posttest-only group design involving two groups: one class as the control group and one class as the experimental group. The research population was all the second-grade students at SMPN 18 Makassar, amounting to 250 students in 8 classes. The sampling technique was purposive sampling. The control class was VIII2, consisting of 30 students, while the experimental class was VIII8, consisting of 30 students. The research instruments were observation and tests. The collected data were analyzed using descriptive and inferential statistical techniques with t-tests processed using SPSS 21 for Windows. The results show that: (1) of 30 students in the control class, only 14 (47%) obtain a score of more than 7.5, categorized as inadequate; (2) in the experimental class, 26 (87%) students obtain a score of 7.5, categorized as adequate; (3) the Lesson Study model is effective when applied to writing review text. The comparison of the ability of the control class and the experimental class indicates that the value of t-count is greater than the value of t-table (2.411 > 1.667). This means that the alternative hypothesis (H1) proposed by the researcher is accepted.
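
The decision rule used above (accept H1 when t-count exceeds t-table) can be sketched as follows. The abstract does not say which t-test variant SPSS was configured to run, so the pooled-variance independent-samples form is an assumption, and the function names and the scores in the example are made up for illustration.

```python
import math
import statistics


def pooled_t_statistic(control, experimental):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal population variances.
    """
    n1, n2 = len(control), len(experimental)
    var1 = statistics.variance(control)        # sample variance (n - 1)
    var2 = statistics.variance(experimental)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(experimental) - statistics.mean(control)) / std_err
    return t, n1 + n2 - 2


def reject_null(t_count, t_table):
    """One-tailed rule: H1 is accepted when t-count exceeds t-table."""
    return t_count > t_table


# Hypothetical scores, not the study's data.
t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])

# The comparison reported in the abstract.
accepted = reject_null(2.411, 1.667)
```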

Keywords: application, lesson study, review text, writing

Procedia PDF Downloads 179
3193 New Approach to Construct Phylogenetic Tree

Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui

Abstract:

Numerous scientific works present various methods to analyze data in several domains, especially the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by every method, based on the distances between the classification trees. The result of our approach took the form of a dendrogram containing the methods as a succession of connections. This approach is much needed in phylogeny analysis. This discipline is intended to analyze the sequences of biological macromolecules for information on the evolutionary history of living beings, including their relationships. The product of phylogeny analysis is a phylogenetic tree. In this paper, we recommend the use of a new method for constructing the phylogenetic tree based on a comparison of the different classifications obtained from different molecular genes.

Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis

Procedia PDF Downloads 477
3192 Translation, Cross-Cultural Adaption, and Validation of the Vividness of Movement Imagery Questionnaire 2 (VMIQ-2) to Classical Arabic Language

Authors: Majid Alenezi, Abdelbare Algamode, Amy Hayes, Gavin Lawrence, Nichola Callow

Abstract:

The purpose of this study was to translate and culturally adapt the Vividness of Movement Imagery Questionnaire-2 (VMIQ-2) from English to produce a new Arabic version (VMIQ-2A), and to evaluate the reliability and validity of the translated questionnaire. The questionnaire assesses how vividly and clearly individuals are able to imagine themselves performing everyday actions. Its purpose is to measure individuals’ ability to conduct movement imagery, which can be defined as “the cognitive rehearsal of a task in the absence of overt physical movement.” Movement imagery has been introduced in physiotherapy as a promising intervention technique, especially when physical exercise is not possible (e.g. pain, immobilisation.) Considerable evidence indicates movement imagery interventions improve physical function, but to maximize efficacy it is important to know the imagery abilities of the individuals being treated. Given the increase in the global sharing of knowledge it is desirable to use standard measures of imagery ability across language and cultures, thus motivating this project. The translation procedure followed guidelines from the Translation and Cultural Adaptation group of the International Society for Pharmacoeconomics and Outcomes Research and involved the following phases: Preparation; the original VMIQ-2 was adapted slightly to provide additional information and simplified grammar. Forward translation; three native speakers resident in Saudi Arabia translated the original VMIQ-2 from English to Arabic, following instruction to preserve meaning (not literal translation), and cultural relevance. Reconciliation; the project manager (first author), the primary translator and a physiotherapist reviewed the three independent translations to produce a reconciled first Arabic draft of VMIQ-2A. Backward translation; a fourth translator (native Arabic speaker fluent in English) translated literally the reconciled first Arabic draft to English. 
The project manager and two study authors compared the English back translation to the original VMIQ-2 and produced the second Arabic draft. Cognitive debriefing; to assess participants’ understanding of the second Arabic draft, 7 native Arabic speakers resident in the UK completed the questionnaire, and rated the clearness of the questions, specified difficult words or passages, and wrote in their own words their understanding of key terms. Following review of this feedback, a final Arabic version was created. 142 native Arabic speakers completed the questionnaire in community meeting places or at home; a subset of 44 participants completed the questionnaire a second time 1 week later. Results showed the translated questionnaire to be valid and reliable. Correlation coefficients indicated good test-retest reliability. Cronbach’s α indicated high internal consistency. Construct validity was tested in two ways. Imagery ability scores have been found to be invariant across gender; this result was replicated within the current study, assessed by independent-samples t-test. Additionally, experienced sports participants have higher imagery ability than those less experienced; this result was also replicated within the current study, assessed by analysis of variance, supporting construct validity. Results provide preliminary evidence that the VMIQ-2A is reliable and valid to be used with a general population who are native Arabic speakers. Future research will include validation of the VMIQ-2A in a larger sample, and testing validity in specific patient populations.

Keywords: motor imagery, physiotherapy, translation and validation, imagery ability

Procedia PDF Downloads 303
3191 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, a comparison of k-Nearest Neighbor (k-NN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of the 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for the balanced brain: index 3, index 4 and index 5. There are significant differences in the EEG signals according to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification achieved 88.46% accuracy. These results show that k-NN can be used for prediction in the brain balancing application.
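
As a rough illustration of the classifier named above, here is a minimal k-NN sketch using Euclidean distance and majority voting. The two-element feature vectors (standing in for maximum alpha and beta PSD values), the value of k, and the training points are all assumptions made for the example; the abstract does not give the authors' feature layout or parameters.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (alpha PSD, beta PSD) pairs labelled with a balancing index.
train_features = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
                  (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_labels = [3, 3, 3, 5, 5, 5]

predicted = knn_predict(train_features, train_labels, (1.1, 1.0))
```

With three possible indexes, votes can tie for some values of k; this sketch does not apply any special tie-breaking policy.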

Keywords: power spectral density, 3D EEG model, brain balancing, kNN

Procedia PDF Downloads 457
3190 Linguistic Features for Sentence Difficulty Prediction in Aspect-Based Sentiment Analysis

Authors: Adrian-Gabriel Chifu, Sebastien Fournier

Abstract:

One of the challenges of natural language understanding is to deal with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets: “Laptops”, “Restaurants”, and “MTSC” (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first one is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second one is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating the difficulty at sentence level.
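
The two difficulty definitions can be made concrete with a short sketch. The abstract leaves two details open, so both are assumptions here: the binary label treats a sentence as difficult only when every classifier fails, and the six-level scale counts failures so that 0 is easiest and 5 is hardest. The function names are hypothetical.

```python
def binary_difficulty(correct_flags):
    """True when no classifier predicted the polarity correctly.

    Assumption: "the classifiers fail" is read as "all of them fail".
    """
    return not any(correct_flags)


def difficulty_level(correct_flags):
    """Six-level scale (0..5) from the top five classifiers.

    Assumption: the level is the number of classifiers that failed,
    so 0 means all five were right and 5 means all five were wrong.
    """
    if len(correct_flags) != 5:
        raise ValueError("expected one flag per top-five classifier")
    return 5 - sum(correct_flags)


# One sentence on which two of the five classifiers were correct.
flags = [True, False, True, False, False]
level = difficulty_level(flags)
```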

Keywords: sentiment analysis, difficulty, classification, machine learning

Procedia PDF Downloads 49
3189 Floating Quantifiers in Hijazi Arabic

Authors: Tagreed Alzahrani

Abstract:

The syntax of quantifiers has received much attention from linguists, philosophers and logicians within different frameworks and in various languages. However, the syntax of Arabic quantifiers has received limited attention in the literature, especially in relation to floating quantifiers. There have been a few discussions of floating quantifiers in Modern Standard Arabic (henceforth, MSA), although analyses of the properties of their counterparts in Saudi dialects are rare. Therefore, the aim of the paper is to provide a clear description of floating quantifiers (FQs) in the Hijazi dialect (henceforth, HA) by utilising the following approaches: the adverbial approach and the derivational (stranding) analysis. For a long time, linguists have tried to explain the floating quantifier phenomenon, as exemplified in the following sentences: 1. All the friends have watched the movie. 2. The friends have all watched the movie. The adverbial approach assumes that the floating quantifier is a type of adverb, because it occupies the adverbial position next to the verb. Thus, the subject in the first example is all the friends, and the subject in the second example is the friends, with all becoming an adverb, as it is located in an adverbial position. However, in the stranding analysis, it is argued that the floating quantifier becomes stranded when its complement has moved to a higher position in the sentence [SPEC, TP]. Therefore, both sentences have the same subject, all the friends, although in the second example the friends has moved to a higher position and has stranded the quantifier all. The paper will investigate the floating quantifiers in HA using both approaches. The analysis will show that neither view is entirely successful in providing a unified account for FQs in HA.

Keywords: floating quantifier, adverbial analysis, stranding approach, universal quantifier

Procedia PDF Downloads 329
3188 Developed Text-Independent Speaker Verification System

Authors: Mohammed Arif, Abdessalam Kifouche

Abstract:

Speech is a very convenient way of communication between people and machines. It conveys information about the identity of the talker. Since speaker recognition technology is increasingly securing our everyday lives, the objective of this paper is to develop two automatic text-independent speaker verification systems (TI SV) using low-level spectral features and machine learning methods. (i) The first system is based on a support vector machine (SVM), which was widely used in voice signal processing with the aim of speaker recognition involving verifying the identity of the speaker based on its voice characteristics, and (ii) the second is based on Gaussian Mixture Model (GMM) and Universal Background Model (UBM) to combine different functions from different resources to implement the SVM based.

Keywords: speaker verification, text-independent, support vector machine, Gaussian mixture model, cepstral analysis

Procedia PDF Downloads 24
3187 The Sufi Madad in Arabic Literature and Translation

Authors: Riham Debian

Abstract:

This paper deals with the translational mystic in Arabic aesthetics and their linguistic and narrative revelation and mediation across textual spaces. The paper particularly engages with the nature of the Egyptian Sufi Madad, its relation to spaces/places, its intergenerational and intertextual manifestations, and its intersection with questions of identity: the historical spaces and geographical places one inhabits and embodies. Opening a repertoire between contextualized stylistics and poetics semiology (Boase-Beier 2011; Jakobson 1960), the paper reads in al-Ghitany’s Kitab al-Tagiliat (The Book of Revelation, 1983), Bassiouny’s Sabil Al-Ghareq (2018) and its translation (Fountain of the Drowning, 2022). The paper examines the stylistic and poetical encoding and recoding of the Sufi Madads from Ghitany to Bassiouny and their entanglement in the question of Egyptian identity-politics through the embodiment of historical places and geographical spaces. The paper argues for the intergenerational intertextuality of Arabic aesthetics that stylistically and poetically enacts the mysticism of Sufi Madad through historical and geographical semioticization of the Egyptian character continuity across time and space. Both Ghitany and Bassiouny engage with the historical novel as a form of delivery of their Egyptian mystical relation with time and place. Both novelist-historians are involved with the question of place and the life-worlds that spaces generate across time and gender.

Keywords: intertextuality, interdiscursivity, madad, Egyptian identity

Procedia PDF Downloads 72
3186 Teaching Synonyms for Non-Arabic Speakers

Authors: Loay Badran

Abstract:

This article on synonymy was written to meet the academic needs of students who specialize in this field. The article has two parts. The first part discusses the forms that authors of textbooks and dictionaries adopted when explaining a word, and examines the precision, or lack thereof, of such forms in delivering a clear and understandable meaning. The second part focuses on the application of synonymy and takes into consideration the point of view of others who dismissed synonymy in its minute details, especially Alaskari in his book “Linguistic Differences” “Al Forouq Alloqhawiyyah”. The author determined that collecting the most commonly used synonymous notions scattered throughout Alaskari’s book and compiling them in tables ordered according to the Arabic alphabet would be of great importance in easing lessons, while citing the corresponding pages in “Linguistic Differences”.

Keywords: synonymy, semantics, camel, teaching, non-native

Procedia PDF Downloads 46
3185 Evolutionary Methods in Cryptography

Authors: Wafa Slaibi Alsharafat

Abstract:

Genetic algorithms (GA) are randomized algorithms: random numbers generated during the operation of the algorithm determine what happens. This means that if a GA is applied twice to optimize exactly the same problem, it might produce two different answers. In this project, we propose an evolutionary algorithm and a Genetic Algorithm (GA) to be implemented in symmetric encryption and decryption. Here, the user's message and the user's secret information (key) represent the plain text to be transformed into cipher text.

Keywords: GA, encryption, decryption, crossover

Procedia PDF Downloads 417
3184 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Representing information as tables is a compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cell in a dataframe, which can be stored as comma-separated values, a database, Excel, and multiple other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 121
3183 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting

Authors: Yiannis G. Smirlis

Abstract:

The classification and prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large-scale problems or when new units frequently enter the set under assessment. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals, combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores without applying any DEA models.
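
The grid idea can be pictured with a small sketch that maps a unit's input and output values to the hyper-rectangle containing them. Equal-width intervals, the clamping of boundary values, and all names (`grid_cell`, `lows`, `highs`, `n_intervals`) are assumptions for illustration; the abstract does not specify how the intervals are chosen, and the Interval DEA models and the dominance relation are not reproduced here.

```python
def grid_cell(values, lows, highs, n_intervals):
    """Return the grid cell (one interval index per dimension) that
    contains `values`, assuming equal-width intervals on each axis."""
    cell = []
    for value, low, high in zip(values, lows, highs):
        width = (high - low) / n_intervals
        index = int((value - low) / width)
        # Clamp so that value == high falls into the last interval.
        cell.append(min(max(index, 0), n_intervals - 1))
    return tuple(cell)


# Hypothetical unit with one input in [0, 10] and one output in [0, 100],
# each range split into 4 intervals.
cell = grid_cell((5.0, 30.0), lows=(0.0, 0.0), highs=(10.0, 100.0),
                 n_intervals=4)
```

Units that fall into the same cell share the same interval bounds on every input and output, which is what would let a pre-processor reason about them as a group.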

Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction

Procedia PDF Downloads 144
3182 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in the spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors contain important geographical information that can be used for remote sensing applications such as region planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered a classification problem, so this task can be performed using machine-learning techniques. Among the many machine-learning algorithms available, the classification is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We propose a classification method that considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset has been created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrate the benefits of using knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
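
To illustrate what attributes built from a pixel and its neighbors might look like, the sketch below computes a pixel's own intensity and the mean over a square window around it. The window radius, the feature names, and the single-band image are assumptions; the abstract does not give the authors' exact attribute definitions.

```python
def neighborhood_features(image, row, col, radius=1):
    """Return the pixel intensity and the mean intensity of the
    (2 * radius + 1)-square window around it, clipped at the borders."""
    n_rows, n_cols = len(image), len(image[0])
    window = [
        image[r][c]
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]
    return {
        "intensity": image[row][col],
        "neighborhood_mean": sum(window) / len(window),
    }


# Tiny made-up single-band image.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
features = neighborhood_features(image, 1, 1)
```

Feature dictionaries of this kind, one per pixel, could then be turned into rows of a training table for a supervised classifier such as an SVM.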

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 315
3181 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neighbor, crime, data analysis, systematic literature review

Procedia PDF Downloads 35
3180 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed into the classifier algorithms. The dataset used in this study has been taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on the combination of subtractive clustering based feature weighting and a decision tree classifier has obtained a classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising for medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 305
3179 Linguistic Misinterpretation and the Dialogue of Civilizations

Authors: Oleg Redkin, Olga Bernikova

Abstract:

Globalization and migrations have made cross-cultural contacts more frequent and intensive. Sometimes, these contacts may lead to misunderstanding between partners of communication and misinterpretations of the verbal messages that some researchers tend to consider as the 'clash of civilizations'. In most cases, reasons for that may be found in cultural and linguistic differences and hence misinterpretations of intentions and behavior. The current research examines factors of verbal and non-verbal communication that should be taken into consideration in verbal and non-verbal contacts. Language is one of the most important manifestations of the cultural code, and it is often considered as one of the special features of a civilization. The Arabic language, in particular, is commonly associated with Islam and the Arab-Muslim civilization. It is one of the most important markers of self-identification for more than 200 million native speakers. Arabic is the language of the Quran and hence the symbol of religious affiliation for more than one billion Muslims around the globe. Adequate interpretation of Arabic texts requires profound knowledge of its grammar and of the semantics of its vocabulary. Communicating sides who belong to different cultural groups are guided by different models of behavior and hierarchies of values; besides that, the vocabulary each of them uses in the dialogue may convey different semantic realities and vary in connotations. In this context, direct, literal translation in most cases cannot adequately convey the meaning of the original message. Besides that, peculiarities and diversities of the extralinguistic information, such as body language, communicative etiquette, cultural background and religious affiliations, may make the dialogue even more difficult.
It is very likely that the so-called 'clash of civilizations' in most cases is due to misinterpretation of the counterpart's means of discourse, such as language, cultural codes, and models of behavior, rather than to basic contradictions between partners of communication. In the process of communication, one has to rely on universal values rather than focus on cultural or religious peculiarities, and to take into account the current linguistic and extralinguistic context.

Keywords: Arabic, civilization, discourse, language, linguistic

Procedia PDF Downloads 197
3178 Wasting Human and Computer Resources

Authors: Mária Csernoch, Piroska Biró

Abstract:

The legends about “user-friendly” and “easy-to-use” birotical tools (computer-related office tools) have been spreading and misleading end-users. This approach has led to an extremely high number of incorrect documents, causing serious financial losses in the creating, modifying, and retrieving processes. Our research proved that there are at least two sources of this underachievement: (1) The lack of a definition of correctly edited, formatted documents. Consequently, end-users do not know whether their methods and results are correct or not. They are not aware of their ignorance. They are so ignorant that their ignorance does not allow them to realize their lack of knowledge. (2) The end-users’ problem-solving methods. We have found that in non-traditional programming environments end-users apply, almost exclusively, surface approach metacognitive methods to carry out their computer related activities, which have proved less effective than deep approach methods. Based on these findings we have developed deep approach methods which are based on and adapted from traditional programming languages. In this study, we focus on the most popular type of birotical documents, the text-based documents. We have provided the definition of the correctly edited text, and based on this definition, adapted the debugging method known in programming. According to the method, before any text editing takes place, a thorough debugging of already existing texts and the categorization of errors are carried out. With this method, before real text editing, users learn the requirements of text-based documents and also of correctly formatted text. The method has proved much more effective than the previously applied surface approach methods.
The advantages of the method are that real text handling requires much less human and computer resources than clicking aimlessly in the GUI (Graphical User Interface), and that data retrieval is much more effective than from error-prone documents.

Keywords: deep approach metacognitive methods, error-prone birotical documents, financial losses, human and computer resources

Procedia PDF Downloads 363
3177 Feature Extraction and Classification Based on the Bayes Test for Minimum Error

Authors: Nasar Aldian Ambark Shashoa

Abstract:

Classification with dimension reduction based on a Bayesian approach is proposed in this paper. The first step is to generate a sample (parameter) of the fault-free mode class and the faulty mode class. Second, in order to obtain good classification performance, important features are selected with the discrete Karhunen-Loève expansion. Next, the Bayes test for minimum error is used to classify the classes. Finally, the results for simulated data demonstrate the capabilities of the proposed procedure.
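The decision step can be sketched as follows, assuming the Karhunen-Loève projection has already reduced each sample to a single feature and that each class is modelled by a 1-D Gaussian. Both choices are simplifying assumptions, the sample values are made up, and the names `fit_class`, `log_posterior`, and `classify` are hypothetical. The Bayes test for minimum error then picks the class with the largest posterior.

```python
import math
from statistics import mean, variance

def fit_class(samples, prior):
    """Estimate a 1-D Gaussian class model from training samples."""
    return {"prior": prior, "mu": mean(samples), "var": variance(samples)}

def log_posterior(x, model):
    """Unnormalised log posterior: log prior plus log Gaussian likelihood."""
    mu, var = model["mu"], model["var"]
    log_likelihood = -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return math.log(model["prior"]) + log_likelihood

def classify(x, models):
    """Bayes minimum-error rule: choose the class with the largest posterior."""
    return max(models, key=lambda name: log_posterior(x, models[name]))

# Hypothetical projected features for the two modes (not data from the paper).
models = {
    "fault_free": fit_class([0.0, 0.1, -0.1, 0.05, -0.05], prior=0.5),
    "faulty": fit_class([1.0, 1.1, 0.9, 1.05, 0.95], prior=0.5),
}
```

With equal priors and equal variances, as in this toy data, the rule reduces to a threshold halfway between the two class means.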

Keywords: analytical redundancy, fault detection, feature extraction, Bayesian approach

Procedia PDF Downloads 503
3176 Network Traffic Classification Scheme for Internet Network Based on Application Categorization for IPv6

Authors: Yaser Miaji, Mohammed Aloryani

Abstract:

The rise of recent applications in everyday use, such as videoconferencing, online recreation and voice communication, creates a pressing need for novel mechanisms and policies to serve this steep growth in the applications themselves and in users’ demands. This diversity in web traffic calls for some classification and prioritization, since some traffic merits more attention, with less delay and loss, than other traffic. This research is intended to reinforce the mechanism by analysing application performance under the proposed mechanism. The mechanism used is quite direct and analytical. It is implemented by modifying the queue limit in the algorithm.

Keywords: traffic classification, IPv6, internet, application categorization

Procedia PDF Downloads 535
3175 Moral Wrongdoers: Evaluating the Value of Moral Actions Performed by War Criminals

Authors: Jean-Francois Caron

Abstract:

This text explores the value of moral acts performed by war criminals, and the extent to which they should alleviate the punishment these individuals ought to receive for violating the rules of war. Without neglecting the necessity of retribution in war crimes cases, it argues from an ethical perspective that we should not rule out the possibility of considering lesser punishments for war criminals who decide to perform a moral act, as it might produce significant positive moral outcomes. This text also analyzes how such a norm could be justified from a moral perspective.

Keywords: war criminals, pardon, amnesty, retribution

Procedia PDF Downloads 256
3174 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across the domains. At first, the subject domains of the texts are classified on two parameters, namely, 'Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties such as function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is quite dicey in nature, as its lexicological identity does not have any bearing on the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation.
In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation to the domains they should actually belong to. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategies for language learners. The availability of a huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational in specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.
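The word-frequency comparison described above can be illustrated with a minimal sketch. It assumes whitespace tokenisation and treats a word as domain-specific when it is frequent in one domain and absent from all others. Both are simplifying assumptions rather than the authors' procedure, and the toy English corpus and the names `word_counts` and `domain_markers` are hypothetical.

```python
from collections import Counter

def word_counts(texts):
    """Count lower-cased, whitespace-separated tokens over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def domain_markers(corpus, min_count=2):
    """Per domain, list words frequent there and absent from every other domain."""
    counts = {domain: word_counts(texts) for domain, texts in corpus.items()}
    markers = {}
    for domain, own in counts.items():
        # Pool the counts of all other domains for comparison.
        others = Counter()
        for other_domain, other_counts in counts.items():
            if other_domain != domain:
                others.update(other_counts)
        markers[domain] = sorted(
            word for word, n in own.items() if n >= min_count and others[word] == 0
        )
    return markers

# Toy corpus standing in for the Bangla data (illustrative only).
corpus = {
    "political": ["the minister addressed the parliament",
                  "the parliament passed the bill"],
    "cultural": ["the festival opened with music",
                 "the music drew the crowd to the festival"],
}
```

In this toy corpus the shared word "the" is frequent in both domains and so marks neither, which mirrors the observation that the common lexical stock does not help to specify a subject domain.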

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 155
3173 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth, largely because of their ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using the semantic web Resource Description Framework (RDF), we generate triples from the given data and derive the interesting patterns. The methodology also illustrates data integration using RDF for reliable, interesting patterns.
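To make the notion of a triple concrete, here is a minimal sketch that represents facts as (subject, predicate, object) tuples and serialises them as N-Triples lines. The `http://example.org/` namespace, the sample facts, and the helper names are all hypothetical; the abstract does not describe the authors' pipeline at this level of detail.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the paper

def iri(name):
    """Wrap a simple name (no spaces) as an IRI in the example namespace."""
    return f"<{BASE}{name}>"

def literal(text):
    """Quote a single-line string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

def objects_of(triples, predicate):
    """Simple pattern query: every object linked by the given predicate."""
    return [o for _, p, o in triples if p == predicate]

# Illustrative facts only.
triples = [
    (iri("Alice"), iri("authorOf"), iri("Paper1")),
    (iri("Paper1"), iri("title"), literal("Interestingness in Data")),
]
```

A query such as `objects_of(triples, iri("authorOf"))` then returns the objects of every statement with that predicate, which is the simplest kind of pattern one might look for in the triple set.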

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 128
3172 Contextual Toxicity Detection with Data Augmentation

Authors: Julia Ive, Lucia Specia

Abstract:

Understanding and detecting toxicity is an important problem to support safer human interactions online. Our work focuses on the important problem of contextual toxicity detection, where automated classifiers are tasked with determining whether a short textual segment (usually a sentence) is toxic within its conversational context. We use “toxicity” as an umbrella term to denote a number of variants commonly named in the literature, including hate, abuse, offence, among others. Detecting toxicity in context is a non-trivial problem and has been addressed by very few previous studies. These previous studies have analysed the influence of conversational context in human perception of toxicity in controlled experiments and concluded that humans rarely change their judgements in the presence of context. They have also evaluated contextual detection models based on state-of-the-art Deep Learning and Natural Language Processing (NLP) techniques. Counterintuitively, they reached the general conclusion that computational models tend to suffer performance degradation in the presence of context. We challenge these empirical observations by devising better contextual predictive models that also rely on NLP data augmentation techniques to create larger and better data. In our study, we start by further analysing the human perception of toxicity in conversational data (i.e., tweets), in the absence versus presence of context, in this case, previous tweets in the same conversational thread. We observed that the conclusions of previous work on human perception are mainly due to data issues: The contextual data available does not provide sufficient evidence that context is indeed important (even for humans). The data problem is common in current toxicity datasets: cases labelled as toxic are either obviously toxic (i.e., overt toxicity with swear, racist, etc. 
words), and thus context is not needed for a decision, or are ambiguous, vague or unclear even in the presence of context; in addition, the data contains labeling inconsistencies. To address this problem, we propose to automatically generate contextual samples where toxicity is not obvious (i.e., covert cases) without context or where different contexts can lead to different toxicity judgements for the same tweet. We generate toxic and non-toxic utterances conditioned on the context or on target tweets using a range of techniques for controlled text generation (e.g., Generative Adversarial Networks and steering techniques). On the contextual detection models, we posit that their poor performance is due to limitations of both the data they are trained on (the same problems stated above) and the architectures they use, which are not able to leverage context in effective ways. To improve on that, we propose text classification architectures that take the hierarchy of conversational utterances into account. In experiments benchmarking ours against previous models on existing and automatically generated data, we show that both data and architectural choices are very important. Our model achieves substantial performance improvements as compared to the baselines that are non-contextual or contextual but agnostic of the conversation structure.

Keywords: contextual toxicity detection, data augmentation, hierarchical text classification models, natural language processing

Procedia PDF Downloads 141
3171 Comparison of the Classification of Cystic Renal Lesions Using the Bosniak Classification System with Contrast Enhanced Ultrasound and Magnetic Resonance Imaging to Computed Tomography: A Prospective Study

Authors: Dechen Tshering Vogel, Johannes T. Heverhagen, Bernard Kiss, Spyridon Arampatzis

Abstract:

In addition to computed tomography (CT), contrast enhanced ultrasound (CEUS) and magnetic resonance imaging (MRI) are being increasingly used for imaging of renal lesions. The aim of this prospective study was to compare the classification of complex cystic renal lesions using the Bosniak classification with CEUS and MRI to CT. Forty-eight patients with 65 cystic renal lesions were included in this study. All participants signed written informed consent. The agreement of the Bosniak classifications of complex renal lesions (≥ BII-F) on CEUS and MRI with that on CT was tested using Cohen’s Kappa. Sensitivity, specificity, positive and negative predictive values (PPV/NPV) and the accuracy of CEUS and MRI compared to CT in the detection of complex renal lesions were calculated. Twenty-nine (45%) out of 65 cystic renal lesions were classified as complex using CT. The agreement between CEUS and CT in the classification of complex cysts was fair (agreement 50.8%, Kappa 0.31), and was excellent between MRI and CT (agreement 93.9%, Kappa 0.88). Compared to CT, MRI had a sensitivity of 96.6%, a specificity of 91.7%, a PPV of 90.3%, and an NPV of 97.1%, with an accuracy of 93.8%. The corresponding values for CEUS were a sensitivity of 100.0%, a specificity of 33.3%, a PPV of 54.7%, and an NPV of 100.0%, with an accuracy of 63.1%. The classification of complex renal cysts based on MRI and CT scans correlated well, and MRI can be used instead of CT for this purpose. CEUS can exclude complex lesions, but due to higher sensitivity, cystic lesions tend to be upgraded. However, it is useful for initial imaging, for follow up of lesions and in those patients with contraindications to CT and MRI.
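For readers unfamiliar with the measures reported above, the sketch below shows how they follow from a 2x2 table comparing an index test against the reference standard. The function name `diagnostic_metrics` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard 2x2 diagnostic measures and Cohen's kappa.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives against the reference test.
    """
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # observed agreement, equal to accuracy
    # Agreement expected by chance, from the marginal totals of both tests.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }
```

For example, `diagnostic_metrics(tp=40, fp=10, fn=5, tn=45)` uses made-up counts for 100 lesions. Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two, which is a quick sanity check on any reported set of values.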

Keywords: Bosniak classification, computed tomography, contrast enhanced ultrasound, cystic renal lesions, magnetic resonance imaging

Procedia PDF Downloads 117