Search results for: Arabic text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3602

3212 3D Receiver Operator Characteristic Histogram

Authors: Xiaoli Zhang, Xiongfei Li, Yuncong Feng

Abstract:

ROC curves, a widely used evaluation tool in machine learning, depict the tradeoff between the true positive rate and the false positive rate. However, they are criticized for ignoring some vital information in the evaluation process, such as the amount of information about the target that each instance carries and the predicted score that each classification model assigns to each instance. Hence, in this paper, a new method for evaluating classification performance is proposed by extending Receiver Operator Characteristic (ROC) curves to 3D space, denoted the 3D ROC Histogram. In the histogram, the

Keywords: classification, performance evaluation, receiver operating characteristic histogram, hardness prediction
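
For orientation, the conventional 2D ROC curve that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not the authors' 3D ROC Histogram (the abstract is truncated before describing it); the function names `roc_points` and `auc` and the sample labels and scores are made up for the example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for a threshold sweep."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative instance")
    # Visit instances from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: 1 = positive class, 0 = negative class.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_curve = roc_points(example_labels, example_scores)
```

The curve uses only the ranking of the scores, so the score magnitudes themselves are discarded, which is one of the limitations the abstract points to.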

Procedia PDF Downloads 293
3211 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

Authors: Khaled M. Alhawiti

Abstract:

This paper aims to analyze the role of natural language processing (NLP). The paper will discuss this role in the context of automated data retrieval, automated question answering, and text structuring. NLP techniques are gaining wider acceptance in real-life applications and industrial concerns. Various complexities are involved in processing natural-language text in a way that satisfies the needs of decision makers. This paper begins with a description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.

Keywords: data retrieval, information retrieval, natural language processing, text structuring

Procedia PDF Downloads 317
3210 The Impact of Developing an Educational Unit in the Light of Twenty-First Century Skills in Developing Language Skills for Non-Arabic Speakers: A Proposed Program for Application to Students of Educational Series in Regular Schools

Authors: Erfan Abdeldaim Mohamed Ahmed Abdalla

Abstract:

The era of the knowledge explosion in which we live requires us to develop educational curricula quantitatively and qualitatively to adapt to the twenty-first-century skills of critical thinking, problem-solving, communication, cooperation, creativity, and innovation. The process of developing the curriculum is as significant as building it; in fact, the development of curricula may be more difficult than building them. Curriculum development includes analyzing needs, setting goals, designing the content and educational materials, creating language programmes, developing teachers, applying programmes in schools, monitoring and feedback, and then evaluating the language programme resulting from these processes. When we look back at the history of language teaching during the twentieth century, we find that developing the delivery method is the most crucial aspect of change in language teaching doctrines. The concept of delivery method in teaching is a systematic set of teaching practices based on a specific theory of language acquisition. This is a key consideration, as the process of development must include all the curriculum elements in its comprehensive sense, both linguistic and non-linguistic. The various Arabic curricula provide the student with a set of units, each unit consisting of a set of linguistic elements. These elements are often not logically arranged, and more importantly, they neglect essential points and highlight other less important ones. Moreover, the educational curricula entail a great deal of monotony in the presentation of content, which makes it hard for the teacher to select adequate content, so the teacher often navigates among diverse references to prepare a lesson and hardly finds a suitable one. Similarly, the student often gets bored when learning the Arabic language and fails to make considerable progress in it.
Therefore, the problem is not a lack of curricula but the development of the curriculum with all its linguistic and non-linguistic elements in accordance with contemporary challenges and standards for teaching foreign languages. The Arabic library suffers from a lack of references for curriculum development. In this paper, the researcher investigates the elements of development, such as the teacher, content, methods, objectives, evaluation, and activities. Hence, a set of general guidelines in the field of educational development was reached. The paper highlights the need to identify weaknesses in educational curricula, to decide which twenty-first-century skills must be employed in Arabic education curricula, and to employ foreign language teaching standards in current Arabic curricula. The researcher assumes that the series for teaching Arabic to speakers of other languages in regular schools do not address the skills of the twenty-first century, which is what the researcher tries to apply in the proposed unit. The study uses the experimental method, based on two groups: experimental and control. The development of an educational unit will help build suitable educational series for students of the Arabic language in regular schools, in which twenty-first-century skills and standards for teaching foreign languages will be addressed, making them more useful and attractive to students.

Keywords: curriculum, development, Arabic language, non-native, skills

Procedia PDF Downloads 48
3209 Combined Odd Pair Autoregressive Coefficients for Epileptic EEG Signals Classification by Radial Basis Function Neural Network

Authors: Boukari Nassim

Abstract:

This paper describes the use of odd pair autoregressive coefficients (Yule-Walker and Burg) for the feature extraction of electroencephalogram (EEG) signals. For classification, a radial basis function neural network (RBFNN) is employed. The RBFNN is described by its architecture and its characteristics; the RBF is defined by the spread, which is modified to improve the classification results. Five types of EEG signals are defined for this work: Set A and Set B for normal signals, Set C and Set D for interictal signals, and Set E for ictal signals (the data are available from the University of Bonn). As outputs, two classes are given for each pairing (AC, AD, AE, BC, BD, BE, CE, DE); the best accuracy, 99%, is obtained for the combined odd pair autoregressive coefficients. Our method is very effective for the diagnosis of epileptic EEG signals.
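
As a rough illustration of the Yule-Walker half of the feature extraction, the sketch below estimates autoregressive coefficients from a signal with the Levinson-Durbin recursion. It assumes the model x[t] = a1·x[t-1] + ... + ap·x[t-p] + e[t] and a biased autocorrelation estimate; the Burg estimator, the "odd pair" selection of coefficients, and the RBFNN classifier are not reproduced, and the function names are illustrative only.

```python
def autocorrelation(signal, max_lag):
    """Biased sample autocovariance r[0..max_lag] of a mean-centred signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def yule_walker(signal, order):
    """Solve the Yule-Walker equations with the Levinson-Durbin recursion.

    Returns (coeffs, error): coeffs[j] multiplies x[t - (j + 1)], and error is
    the estimated variance of the driving noise.
    """
    r = autocorrelation(signal, order)
    coeffs = []
    error = r[0]
    for k in range(1, order + 1):
        # Reflection coefficient for the step from order k-1 to order k.
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / error
        # Update the lower-order coefficients, then append the new one.
        coeffs = [
            coeffs[j] - reflection * coeffs[k - 2 - j] for j in range(k - 1)
        ] + [reflection]
        error *= 1.0 - reflection ** 2
    return coeffs, error
```

In a pipeline like the one described, the returned coefficients would serve as the feature vector for each EEG segment; the model order is a free parameter that the abstract does not state.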

Keywords: epilepsy, EEG signals classification, combined odd pair autoregressive coefficients, radial basis function neural network

Procedia PDF Downloads 325
3208 Procedures and Strategies in Translation: Two Marathi Translations of Train to Pakistan by Khushwant Singh

Authors: Manoj Gujar

Abstract:

The present paper is an attempt to interpret two Marathi translations of Khushwant Singh’s (1915-2014) novel Train to Pakistan (1956). The 20th century was branded as an era of Liberalization, Privatization and Globalization. Different countries and cultures have interacted with one another in an unprecedented manner. The world is becoming multilingual and multicultural. Democratic countries such as the U.S.A., the U.K., and India have become pivotal centers of interlingual and cross-cultural exchange. People belonging to different nationalities have shown keen interest in knowing the characteristic features of different languages and of their cultures. Here, ‘Translation’ plays an important role in such multilingual and multicultural contexts. Translation is not only translation of a language but a translation of a culture. In the act of translation, a translator makes use of such procedures as borrowing, definition, literal translation, substitution, lexical creation, omission, and addition, as well as their various combinations. For the translator, a text produced in one linguistic and cultural context can reach other linguistic and cultural contexts through these processes of translation. A worthy work of art appeals to many readers. Because India is a multilingual country, there are multiple translations of the same text in different Indian languages. Sometimes, however, the same text appeals to different ages and gets translated into the same language by two or more authors. In this context, the present paper is an attempt to study how different translations of the same text differ in terms of procedures and strategies during the process of the translation of culture. The source text is Khushwant Singh’s historical novel Train to Pakistan (1956). The novel was widely appreciated and so translated into different regional languages in India.
The novel has two Marathi translations: Agniratha (1972) by Hidayatkhan and Train to Pakistan (1980) by Anil Kinikar. This paper is an attempt to evaluate the strategies and procedures in translation to analyze these two Marathi translations. Hidayat Khan made a lot of omissions of significant details and distorted the original text to a large extent, whereas Anil Kinikar has done justice to the Source Text by rendering it in Marathi as faithfully as possible.

Keywords: culture, multilingual, procedures and strategies, translation

Procedia PDF Downloads 350
3207 Unraveling the Threads of Madness: Henry Russell’s 'The Maniac' as an Advocate for Deinstitutionalization in the Nineteenth Century

Authors: T. J. Laws-Nicola

Abstract:

Henry Russell was best known as a composer of more than 300 songs. Many of his compositions were popular for both their sentimental texts, as in ‘The Old Armchair,’ and those of a more political nature, such as ‘Woodsman, Spare That Tree!’ Indeed, Russell had written such songs of advocacy as those associated with abolitionism (‘The Slave Ship’) and environmentalism (‘Woodsman, Spare that Tree!’). ‘The Maniac’ is his only composition addressing the issue of institutionalization. The text is borrowed and adapted from the monodrama The Captive by M.G. ‘Monk’ Lewis. Through an analysis of form, harmony, melody, text, and thematic development and interactions between text and music we can approach a clearer understanding of ‘The Maniac’ and how the text and music interact. Select periodicals, such as The London Times, provide contemporary critical review for ‘The Maniac.’ Additional nineteenth century songs whose texts focus on madness and/or institutionalization will assist in building a stylistic and cultural context for ‘The Maniac.’ Through comparative analyses of ‘The Maniac’ with a body of songs that focus on similar topics, we can approach a clear understanding of the song as a vehicle for deinstitutionalization.

Keywords: 19th century song, institutionalization, M. G. Lewis, Henry Russell

Procedia PDF Downloads 505
3206 Literary Theatre and Embodied Theatre: A Practice-Based Research in Exploring the Authorship of a Performance

Authors: Rahul Bishnoi

Abstract:

Theatre, as Anne Ubersfeld calls it, is a paradox. At once, it is both a literary work and a physical representation. Theatre as a text is eternal, reproducible, and identical, while as a performance, theatre is momentary and never identical to the previous performances. In this dual existence of theatre, who is the author? Is the author the playwright who writes the dramatic text, or the director who orchestrates the performance, or the actor who embodies the text? From the poststructuralist lens of Barthes, the author is dead. Barthes’ argument of discrete temporality, i.e. the author is the before, and the text is the after, does not hold true for theatre. A published literary work is written, edited, printed, distributed and then consumed by the reader. On the other hand, theatrical production is immediate; an actor performs and the audience witnesses it instantaneously. Time, so to speak, does not separate the author, the text, and the reader anymore. The question of authorship gets further complicated in Augusto Boal’s “Theatre of the Oppressed” movement, where the audience is a direct participant like the actors in the performance. In this research, through an experimental performance, the duality of theatre is explored within the authorship discourse, and the conventional definition of authorship is subjected to additional complexity by erasing the distinction between an actor and the audience. The design/methodology of the experimental performance is as follows: The audience will be asked to produce a text under an anonymous virtual alias. The text, as it is being produced, will be read and performed by the actor. The audience, who are also collectively “authoring” the text, will watch this performance and write further until everyone has contributed one input each. The cycle of writing, reading, performing, witnessing, and writing will continue until the end.
The intention is to create a dynamic system of writing/reading with the embodiment of the text through the actor. The actor is giving up the power to the audience to write the spoken word, stage instruction and direction while still keeping the agency of interpreting that input and performing in the chosen manner. This rapid conversation between the actor and the audience also creates a conversion of authorship. The main conclusion of this study is a perspective on the nature of dynamic authorship of theatre containing a critical enquiry of the collaboratively produced text, an individually performed act, and a collectively witnessed event. Using practice as a methodology, this paper contests the poststructuralist notion of the author as merely a ‘scriptor’ and breaks it further by involving the audience in the authorship as well.

Keywords: practice based research, performance studies, post-humanism, Avant-garde art, theatre

Procedia PDF Downloads 79
3205 Automatic Classification Using Dynamic Fuzzy C Means Algorithm and Mathematical Morphology: Application in 3D MRI Image

Authors: Abdelkhalek Bakkari

Abstract:

Image segmentation is a critical step in image processing and pattern recognition. In this paper, we propose a new robust automatic image classification method based on a dynamic fuzzy c-means algorithm and mathematical morphology. The proposed segmentation algorithm (DFCM_MM) has been applied to MR perfusion images. The obtained results show the validity and robustness of the proposed approach.
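
To make the clustering step concrete, here is a minimal sketch of the standard fuzzy c-means iteration on one-dimensional intensity values. It is not the authors' DFCM_MM: the "dynamic" variant and the morphological post-processing are not specified in the abstract and are not modelled. The function name, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions made for the example.

```python
def fuzzy_c_means(values, initial_centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on 1-D values.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    values[i] belongs to cluster k, as computed in the last iteration.
    """
    centers = list(initial_centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for x in values:
            distances = [abs(x - c) for c in centers]
            if 0.0 in distances:
                # The value sits exactly on a center: share full membership
                # among the centers at zero distance.
                zeros = distances.count(0.0)
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
            else:
                row = [
                    1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
                    for d_i in distances
                ]
            memberships.append(row)
        # Center update: mean of the values weighted by membership ** m.
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                weighted_sum = sum(w * x for w, x in zip(weights, values))
                new_centers.append(weighted_sum / total)
            else:
                new_centers.append(centers[k])
        centers = new_centers
    return centers, memberships
```

For image segmentation, `values` would be pixel or voxel intensities, and each pixel would typically be assigned to the cluster with the highest membership.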

Keywords: segmentation, classification, dynamic, fuzzy c-means, MR image

Procedia PDF Downloads 448
3204 A Tool to Measure the Usability Guidelines for Arab E-Government Websites

Authors: Omyma Alosaimi, Asma Alsumait

Abstract:

Website developers and designers should follow usability guidelines to provide a user-friendly interface. Using tools to measure usability, an evaluator can automatically evaluate hundreds of links within a few minutes. Such tools also have the advantage of detecting some violations that only machines can detect. For that reason, using a usability evaluation tool is important for finding as many violations as possible. There are many website usability testing tools, but none has been developed to measure the usability of e-government websites in general or of Arabic e-government websites in particular. To measure the usability of Arabic e-government websites, a tool is developed and tested in this paper. A comparison between using a tool specifically developed for e-government websites and using a general usability testing tool is presented.

Keywords: e-government, human computer interaction, usability evaluation, usability guidelines

Procedia PDF Downloads 399
3203 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets in themselves and can now be accepted as a primary intellectual output of research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection was produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and the Ensemble of Pruned Sets achieved encouraging performance compared to the other multi-label classification methods tested. The Classifier Chains method showed the worst performance. To recap, the benchmark has achieved promising results by utilizing the preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
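
Three of the measurements named above can be sketched as follows, assuming each example's labels are represented as a Python set and that the F-measures are F1 scores (the abstract does not give its exact definitions). The function names and the tiny example data are illustrative and are not drawn from the ABET collection.

```python
def hamming_loss(true_sets, predicted_sets, labels):
    """Fraction of (example, label) decisions that are wrong.

    Assumes every set contains only members of `labels`.
    """
    mismatches = sum(len(t ^ p) for t, p in zip(true_sets, predicted_sets))
    return mismatches / (len(true_sets) * len(labels))


def label_counts(true_sets, predicted_sets, label):
    """True positives, false positives and false negatives for one label."""
    pairs = list(zip(true_sets, predicted_sets))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(true_sets, predicted_sets, labels):
    """Pool the counts over all labels, then compute a single F1."""
    counts = [label_counts(true_sets, predicted_sets, lab) for lab in labels]
    tp, fp, fn = (sum(column) for column in zip(*counts))
    return f1_from_counts(tp, fp, fn)


def macro_f1(true_sets, predicted_sets, labels):
    """Compute F1 per label, then average with equal weight per label."""
    scores = [
        f1_from_counts(*label_counts(true_sets, predicted_sets, lab))
        for lab in labels
    ]
    return sum(scores) / len(scores)
```

Micro-averaging favours frequent labels, while macro-averaging gives rare labels the same weight as common ones, which is why benchmarks of this kind usually report both.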

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

Procedia PDF Downloads 147
3202 Text Mining Techniques for Prioritizing Pathogenic Mutations in Protein Families Known to Misfold or Aggregate

Authors: Khaleel Saleh Al-Rababah

Abstract:

Amyloid fibril forming regions, which are known as protein aggregates, in sequences of some protein families are associated with a number of diseases known as amyloidosis. Mutations play a role in forming fibrils by accelerating the fibril formation process. In this paper, we want to extract diseases that are caused by those mutations as a result of the impact of the mutations on structural and functional properties of the aggregated protein. We propose a text mining system to automatically extract mutations, diseases, and relations between mutations and diseases. We present a finite-state-based algorithm to cluster mutations found in the same sentence, as a sentence could contain different mutations that cause different diseases. We also present a co-reference algorithm that enables cross-linking of sentences.

Keywords: amyloid, amyloidosis, co reference, protein, text mining

Procedia PDF Downloads 501
3201 Classification of Construction Projects

Authors: M. Safa, A. Sabet, S. MacGillivray, M. Davidson, K. Kaczmarczyk, C. T. Haas, G. E. Gibson, D. Rayside

Abstract:

To address construction project requirements and specifications, scholars and practitioners need to establish a taxonomy according to a scheme that best fits their need. While existing characterization methods are continuously being improved, new ones are devised to cover project properties which have not been previously addressed. One such method, the Project Definition Rating Index (PDRI), has received limited consideration strictly as a classification scheme. Developed by the Construction Industry Institute (CII) in 1996, the PDRI has been refined over the last two decades as a method for evaluating a project's scope definition completeness during front-end planning (FEP). The main contribution of this study is a review of practical project classification methods, and a discussion of how PDRI can be used to classify projects based on their readiness in the FEP phase. The proposed model has been applied to 59 construction projects in Ontario, and the results are discussed.

Keywords: project classification, project definition rating index (PDRI), risk, project goals alignment

Procedia PDF Downloads 654
3200 The Application of Lesson Study Model in Writing Review Text in Junior High School

Authors: Sulastriningsih Djumingin

Abstract:

This study has three objectives. First, it aims at describing the ability of the second-grade students to write review text without applying the Lesson Study model at SMPN 18 Makassar. Second, it seeks to describe the ability of the second-grade students to write review text by applying the Lesson Study model at SMPN 18 Makassar. Third, it aims at testing the effectiveness of the Lesson Study model in writing review text at SMPN 18 Makassar. This research used a true experimental design with a posttest-only group design involving two groups: one class as the control group and one class as the experimental group. The research population was all the second-grade students at SMPN 18 Makassar, amounting to 250 students in 8 classes. The sampling technique was purposive sampling. The control class was VIII2, consisting of 30 students, while the experimental class was VIII8, consisting of 30 students. The research instruments were observation and tests. The collected data were analyzed using descriptive and inferential statistical techniques, with t-tests processed using SPSS 21 for Windows. The results show that: (1) of 30 students in the control class, only 14 (47%) obtain a score of more than 7.5, categorized as inadequate; (2) in the experimental class, 26 (87%) students obtain a score of 7.5, categorized as adequate; (3) the Lesson Study model is effective when applied to writing review text. The comparison of the ability of the control class and the experimental class indicates that the value of t-count is greater than the value of t-table (2.411 > 1.667). This means that the alternative hypothesis (H1) proposed by the researcher is accepted.

Keywords: application, lesson study, review text, writing

Procedia PDF Downloads 180
3199 New Approach to Construct Phylogenetic Tree

Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui

Abstract:

Numerous scientific works present various methods for analyzing data in several domains, especially for the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by every method, based on the distances between the classification trees. The result of our approach took the form of a dendrogram containing the methods as a succession of connections. This approach is much needed in phylogeny analysis. This discipline is intended to analyze the sequences of biological macromolecules for information on the evolutionary history of living beings, including their relationships. The product of phylogeny analysis is a phylogenetic tree. In this paper, we recommend the use of a new method for constructing the phylogenetic tree based on the comparison of different classifications obtained from different molecular genes.

Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis

Procedia PDF Downloads 478
3198 Impact of Tryptic Limited Hydrolysis on Bambara Protein-Gum Arabic Soluble Complexes Formation

Authors: Abiola A. Ojesanmi, Eric O. Amonsou

Abstract:

The formation of soluble complexes usually occurs within a narrow pH range characterized by weak interactions. Moreover, the rigid conformation of globular proteins restricts the number of charged groups capable of interacting with polysaccharides, thereby limiting food applications. Hence, this study investigated the impact of limited tryptic hydrolysis on the formation of Bambara protein-gum arabic soluble complexes. The electrostatic interactions were monitored through turbidimetric analysis. The Bambara protein hydrolysates at specified degrees of hydrolysis (DH 2, 5, and 7.5) were characterized using size exclusion chromatography, zeta potential, surface hydrophobicity, and intrinsic fluorescence. The stability of the complexes was investigated using differential scanning calorimetry and rheometry. The limited tryptic hydrolysis significantly widened the pH range of the formation of soluble complexes, with DH 5 having a wider range (pH 7.0 - 4.3) compared to DH 2 and DH 7.5, while there was no notable difference in the optimum complexation pH of the insoluble complexes. Larger peptides (140, 118 kDa) were detected in DH 2 relative to 144, 70, and 61 kDa in DH 5, which were larger than 140, 118, 48, and 32 kDa in DH 7.5. An increase in net negative charge (-30 mV for DH 7.5) and a slight shift in the net neutrality (from pH 4.9 to 4.3) of the hydrolysates were observed, which consequently impacted the electrostatic interaction with gum arabic. There was up to 4-fold exposure of the hydrophobic amino acids in comparison with the isolate, and a red shift in maximum fluorescence wavelength in a DH-dependent manner following the hydrolysis. The denaturation temperature of the soluble complexes from the hydrolysates shifted to higher values, with DH 5 showing the maximum temperature (94.24 °C). A highly interconnected gel-like soluble complex network was formed, with DH 5 having a better structure relative to DH 2 and 7.5.
The study showed the use of limited tryptic hydrolysis at DH 5 as an effective approach to modify Bambara protein and provided a more stable and wider pH range of formation for soluble complex, thereby enhancing the food application.

Keywords: Bambara groundnut, gum arabic, interaction, soluble complex

Procedia PDF Downloads 0
3197 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, a comparison between k-Nearest Neighbor (k-NN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of the 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum power spectral density (PSD) values were extracted as features from the model. There are three indexes for the balanced brain: index 3, index 4 and index 5. There are significant differences in the EEG signals across the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification achieved 88.46% accuracy. These results show that k-NN can be used to predict the brain balancing index.
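
The k-NN step can be illustrated with a short sketch, assuming each subject is reduced to a small feature vector (for instance, maximum alpha and beta PSD values) and labelled with a brain balancing index of 3, 4 or 5. The function name and the default `k=3` are choices made for this example; the abstract does not state the value of k or the distance metric the authors used, so Euclidean distance is an assumption.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    ranked = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on ties, so the label of the
    # nearer neighbour wins when vote counts are equal.
    return votes.most_common(1)[0][0]
```

A small k makes the classifier sensitive to individual noisy recordings, while a large k blurs the boundaries between index classes, so k is normally tuned by cross-validation.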

Keywords: power spectral density, 3D EEG model, brain balancing, kNN

Procedia PDF Downloads 460
3196 Translation, Cross-Cultural Adaption, and Validation of the Vividness of Movement Imagery Questionnaire 2 (VMIQ-2) to Classical Arabic Language

Authors: Majid Alenezi, Abdelbare Algamode, Amy Hayes, Gavin Lawrence, Nichola Callow

Abstract:

The purpose of this study was to translate and culturally adapt the Vividness of Movement Imagery Questionnaire-2 (VMIQ-2) from English to produce a new Arabic version (VMIQ-2A), and to evaluate the reliability and validity of the translated questionnaire. The questionnaire assesses how vividly and clearly individuals are able to imagine themselves performing everyday actions. Its purpose is to measure individuals’ ability to conduct movement imagery, which can be defined as “the cognitive rehearsal of a task in the absence of overt physical movement.” Movement imagery has been introduced in physiotherapy as a promising intervention technique, especially when physical exercise is not possible (e.g. because of pain or immobilisation). Considerable evidence indicates movement imagery interventions improve physical function, but to maximize efficacy it is important to know the imagery abilities of the individuals being treated. Given the increase in the global sharing of knowledge, it is desirable to use standard measures of imagery ability across languages and cultures, thus motivating this project. The translation procedure followed guidelines from the Translation and Cultural Adaptation group of the International Society for Pharmacoeconomics and Outcomes Research and involved the following phases: Preparation; the original VMIQ-2 was adapted slightly to provide additional information and simplified grammar. Forward translation; three native speakers resident in Saudi Arabia translated the original VMIQ-2 from English to Arabic, following instructions to preserve meaning (not literal translation) and cultural relevance. Reconciliation; the project manager (first author), the primary translator and a physiotherapist reviewed the three independent translations to produce a reconciled first Arabic draft of VMIQ-2A. Backward translation; a fourth translator (native Arabic speaker fluent in English) translated literally the reconciled first Arabic draft to English.
The project manager and two study authors compared the English back translation to the original VMIQ-2 and produced the second Arabic draft. Cognitive debriefing; to assess participants’ understanding of the second Arabic draft, 7 native Arabic speakers resident in the UK completed the questionnaire, rated the clarity of the questions, specified difficult words or passages, and wrote in their own words their understanding of key terms. Following review of this feedback, a final Arabic version was created. 142 native Arabic speakers completed the questionnaire in community meeting places or at home; a subset of 44 participants completed the questionnaire a second time 1 week later. Results showed the translated questionnaire to be valid and reliable. Correlation coefficients indicated good test-retest reliability. Cronbach’s α indicated high internal consistency. Construct validity was tested in two ways. Imagery ability scores have been found to be invariant across gender; this result was replicated within the current study, assessed by independent-samples t-test. Additionally, experienced sports participants have higher imagery ability than those less experienced; this result was also replicated within the current study, assessed by analysis of variance, supporting construct validity. Results provide preliminary evidence that the VMIQ-2A is reliable and valid for use with a general population of native Arabic speakers. Future research will include validation of the VMIQ-2A in a larger sample, and testing validity in specific patient populations.
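
For readers unfamiliar with the internal-consistency statistic mentioned above, Cronbach's α for a questionnaire with k items is k/(k-1) × (1 - sum of item variances / variance of total scores). The sketch below computes it from a table of responses; the function name and the input layout (one list of item scores per participant) are assumptions for illustration, and no data from the study are reproduced.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for `responses`, a list with one row per participant,
    each row holding that participant's score on every questionnaire item."""
    item_count = len(responses[0])
    # Sample variance of each item across participants.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(item_count)
    ]
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in responses])
    return (item_count / (item_count - 1)) * (
        1 - sum(item_variances) / total_variance
    )
```

Values close to 1 indicate that the items vary together across participants, which is what "high internal consistency" refers to.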

Keywords: motor imagery, physiotherapy, translation and validation, imagery ability

Procedia PDF Downloads 304
3195 Linguistic Features for Sentence Difficulty Prediction in Aspect-Based Sentiment Analysis

Authors: Adrian-Gabriel Chifu, Sebastien Fournier

Abstract:

One of the challenges of natural language understanding is to deal with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets: “Laptops”, “Restaurants”, and “MTSC” (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first one is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second one is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating the difficulty at sentence level.
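
The two difficulty definitions can be sketched in a few lines. The abstract does not say which end of the six-level scale means "hard", nor whether the binary label requires every classifier to fail, so both choices below are assumptions: level 0 means all five classifiers were correct and level 5 means none were, and a sentence is "difficult" only when all classifiers miss. The function names are hypothetical.

```python
def difficulty_level(gold_polarity, predictions):
    """Six-level difficulty: the number of the five best classifiers that
    mispredict the polarity (0 = easiest, 5 = hardest; assumed direction)."""
    if len(predictions) != 5:
        raise ValueError("expected predictions from exactly five classifiers")
    return sum(1 for predicted in predictions if predicted != gold_polarity)


def is_difficult(gold_polarity, predictions):
    """Binary difficulty: True when every classifier fails (assumed reading)."""
    return all(predicted != gold_polarity for predicted in predictions)
```

With five classifiers the count of failures ranges from 0 to 5, which is where the six levels come from.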

Keywords: sentiment analysis, difficulty, classification, machine learning

Procedia PDF Downloads 52
3194 Floating Quantifiers in Hijazi Arabic

Authors: Tagreed Alzahrani

Abstract:

The syntax of quantifiers has received much attention from linguists, philosophers and logicians within different frameworks and in various languages. However, the syntax of Arabic quantifiers has received limited attention in the literature, especially in relation to floating quantifiers. There have been a few discussions of floating quantifiers in Modern Standard Arabic (henceforth, MSA), although analyses of the properties of their counterparts in Saudi dialects are rare. Therefore, the aim of the paper is to provide a clear description of floating quantifiers (FQs) in the Hijazi dialect (henceforth, HA) by utilising the following approaches: the adverbial approach and the derivational (stranding) analysis. For a long time, linguists have tried to explain the floating quantifier phenomenon, as exemplified in the following sentences: 1. All the friends have watched the movie. 2. The friends have all watched the movie. The adverbial approach assumes that the floating quantifier is a type of adverb, because it occupies the adverbial position next to the verb. Thus, the subject in the first example is 'all the friends' and the subject in the second example is 'the friends', with 'all' becoming an adverb, as it is located in an adverbial position. However, in the stranding analysis, it is argued that the floating quantifier becomes stranded when its complement has moved to a higher position in the sentence [SPEC, TP]. Therefore, both sentences have the same subject, 'all the friends', although in the second example 'the friends' has moved to a higher position and has stranded the quantifier 'all'. The paper will investigate the floating quantifiers in HA using both approaches. The analysis will show that neither view is entirely successful in providing a unified account for FQs in HA.

Keywords: floating quantifier, adverbial analysis, stranding approach, universal quantifier

Procedia PDF Downloads 331
3193 Developed Text-Independent Speaker Verification System

Authors: Mohammed Arif, Abdessalam Kifouche

Abstract:

Speech is a very convenient way of communication between people and machines. It conveys information about the identity of the talker. Since speaker recognition technology is increasingly securing our everyday lives, the objective of this paper is to develop two automatic text-independent speaker verification (TI-SV) systems using low-level spectral features and machine learning methods. (i) The first system is based on a support vector machine (SVM), which has been widely used in voice signal processing for speaker recognition, that is, verifying the identity of a speaker based on their voice characteristics, and (ii) the second is based on Gaussian Mixture Model (GMM) and Universal Background Model (UBM) to combine different functions from different resources to implement the SVM based.
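
In GMM-UBM verification, a claimed identity is typically accepted when the average log-likelihood ratio between the speaker model and the UBM exceeds a threshold. The sketch below illustrates that scoring step only. It assumes diagonal covariances and already-trained model parameters, all function names are hypothetical, and it is not the authors' system: feature extraction, training and threshold selection are omitted.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal-covariance GMM: (mixture weights, component means, component variances).
Gmm = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def gmm_log_likelihood(frame: Sequence[float], model: Gmm) -> float:
    """Log-density of one feature frame under a diagonal-covariance GMM."""
    weights, means, variances = model
    component_logs: List[float] = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def llr_score(frames: Sequence[Sequence[float]], speaker: Gmm, ubm: Gmm) -> float:
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(f, speaker) - gmm_log_likelihood(f, ubm) for f in frames
    )
    return total / len(frames)


if __name__ == "__main__":
    # Toy one-dimensional models (illustrative values only).
    speaker_model: Gmm = ([1.0], [[0.0]], [[1.0]])
    background_model: Gmm = ([1.0], [[5.0]], [[1.0]])
    score = llr_score([[0.1], [-0.2]], speaker_model, background_model)
    print("accept" if score > 0.0 else "reject")
```

A positive score means the frames are better explained by the speaker model than by the background model; the threshold of 0.0 in the usage example is a placeholder.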

Keywords: speaker verification, text-independent, support vector machine, Gaussian mixture model, cepstral analysis

Procedia PDF Downloads 25
3192 The Sufi Madad in Arabic Literature and Translation

Authors: Riham Debian

Abstract:

This paper deals with the translational mystic in Arabic aesthetics and their linguistic and narrative revelation and mediation across textual spaces. The paper particularly engages with the nature of the Egyptian Sufi Madad, its relation to spaces/places, its intergenerational and intertextual manifestations, and its intersection with questions of identity: the historical spaces and geographical places one inhabits and embodies. Opening a repertoire between contextualized stylistics and poetics semiology (Boase-Beier 2011; Jakobson 1960), the paper reads in al-Ghitany’s Kitab al-Tagiliat (The Book of Revelation, 1983), Bassiouny’s Sabil Al-Ghareq (2018) and its translation (Fountain of the Drowning, 2022). The paper examines the stylistic and poetical encoding and recoding of the Sufi Madads from Ghitany to Bassiouny and their entanglement in the question of Egyptian identity-politics through the embodiment of historical places and geographical spaces. The paper argues for the intergenerational intertextuality of Arabic aesthetics that stylistically and poetically enacts the mysticism of Sufi Madad through historical and geographical semioticization of the Egyptian character continuity across time and space. Both Ghitany and Bassiouny engage with the historical novel as a form of delivery of their Egyptian mystical relation with time and place. Both novelist-historians are involved with the question of place and the life-worlds that spaces generate across time and gender.

Keywords: intertextuality, interdiscursivity, madad, Egyptian identity

Procedia PDF Downloads 72
3191 Teaching Synonyms for Non-Arabic Speakers

Authors: Loay Badran

Abstract:

This article on synonymy came into existence to meet the academic needs of students who specialize in this field. The article has two parts: the first part discusses the forms that authors of textbooks and dictionaries have adopted when explaining a word, and the precision, or lack thereof, with which such forms deliver a clear and understandable meaning. The second part focuses on the application of synonymy, taking into consideration the point of view of those who dismissed synonymy in its minute details, especially Alaskari in his book “Linguistic Differences” (“Al Forouq Alloqhawiyyah”). The author determined that collecting the most commonly used synonymous notions scattered in Alaskari’s book and compiling them in tables, ordered according to the Arabic alphabet and citing the corresponding pages in “Linguistic Differences”, would be of great importance in easing lessons.

Keywords: synonymy, semantics, camel, teaching, non-native

Procedia PDF Downloads 48
3190 Evolutionary Methods in Cryptography

Authors: Wafa Slaibi Alsharafat

Abstract:

Genetic algorithms (GA) are randomized algorithms: random numbers generated during the operation of the algorithm determine what happens. This means that if a GA is applied twice to optimize exactly the same problem, it might produce two different answers. In this project, we propose an evolutionary algorithm and Genetic Algorithm (GA) to be implemented in symmetric encryption and decryption. Here, the user's message and the user's secret information (key) represent the plain text to be transformed into cipher text.

Keywords: GA, encryption, decryption, crossover

Procedia PDF Downloads 418
3189 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Information representation as tables is a compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cell in a dataframe, which can be stored as comma-separated values, a database, Excel, and multiple other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 121
3188 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting

Authors: Yiannis G. Smirlis

Abstract:

The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large-scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals, combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores without applying any DEA models.
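
The grid idea can be illustrated with a small sketch: each input and output range is cut into intervals, a unit is mapped to the hyper-rectangle (cell) containing it, and cells are compared with a coarse dominance relation. The function names, the breakpoints and the exact form of the dominance test are assumptions made for illustration; the abstract does not specify them, and the Interval DEA models themselves are not reproduced here.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def cell_index(values: Sequence[float],
               breakpoints: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Map a unit (inputs followed by outputs) to its grid cell.

    breakpoints[k] holds the sorted interval boundaries of dimension k.
    """
    return tuple(bisect_right(bp, v) for v, bp in zip(values, breakpoints))


def dominates(cell_a: Tuple[int, ...], cell_b: Tuple[int, ...],
              n_inputs: int) -> bool:
    """Coarse cell-level dominance (an assumed, simplified relation).

    Cell A dominates cell B when A lies in no higher interval on every input,
    in no lower interval on every output, and the two cells differ.
    """
    inputs_ok = all(a <= b for a, b in zip(cell_a[:n_inputs], cell_b[:n_inputs]))
    outputs_ok = all(a >= b for a, b in zip(cell_a[n_inputs:], cell_b[n_inputs:]))
    return inputs_ok and outputs_ok and cell_a != cell_b


if __name__ == "__main__":
    # One input split at 10 and 20, one output split at 5 and 15 (toy values).
    grid = [[10.0, 20.0], [5.0, 15.0]]
    unit_a = cell_index((5.0, 16.0), grid)   # low input, high output
    unit_b = cell_index((12.0, 3.0), grid)   # higher input, low output
    print(unit_a, unit_b, dominates(unit_a, unit_b, n_inputs=1))
```

Because the comparison works on cell indices rather than raw values, units that fall in the same interval on some dimension are treated as tied on it, which is the price of avoiding a full DEA model.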

Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction

Procedia PDF Downloads 149
3187 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as regional planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered a classification problem, so this task can be performed using machine-learning techniques. Although many machine-learning algorithms exist, the classification is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We proposed a classification method that considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset has been created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
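
As a rough illustration of neighborhood-based feature extraction, the sketch below builds two attributes per pixel: its own intensity and the mean intensity of the surrounding window. The window radius, the edge handling and the function name are assumptions; the abstract does not state how the authors define the neighborhood or which reflectance bands they use.

```python
from typing import Sequence, Tuple


def neighborhood_features(image: Sequence[Sequence[float]],
                          row: int, col: int,
                          radius: int = 1) -> Tuple[float, float]:
    """Return (pixel intensity, mean intensity of the surrounding window).

    The window is clipped at the image border, so corner pixels use fewer
    neighbors (an assumed edge-handling choice).
    """
    rows, cols = len(image), len(image[0])
    window = [
        image[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    return float(image[row][col]), sum(window) / len(window)


if __name__ == "__main__":
    toy_image = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ]
    print(neighborhood_features(toy_image, 1, 1))  # center pixel, full 3x3 window
    print(neighborhood_features(toy_image, 0, 0))  # corner pixel, clipped window
```

Feature pairs like these could then be passed to any supervised classifier, such as the SVM mentioned in the abstract.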

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 316
3186 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neighbor, crime, data analysis, systematic literature review

Procedia PDF Downloads 42
3185 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed into the classifier algorithms. The dataset used in this study has been taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods, including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on the combination of subtractive clustering based feature weighting and a decision tree classifier has obtained a classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in medical data set classification.
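
One way to weight features from clustering centers is sketched below. The abstract does not give the weighting formula, so the rule used here (each feature's weight is the ratio of its mean to the mean of its cluster centers) is an assumption, as are the function names. The cluster centers are taken as given and could come from any of the clustering methods named above.

```python
from typing import List, Sequence


def feature_weights(data: Sequence[Sequence[float]],
                    centers: Sequence[Sequence[float]]) -> List[float]:
    """Weight per feature: mean of the feature / mean of its cluster centers.

    Assumption: this ratio rule is illustrative; a zero center mean would
    need special handling and is not covered here.
    """
    n_features = len(data[0])
    weights = []
    for j in range(n_features):
        feature_mean = sum(row[j] for row in data) / len(data)
        center_mean = sum(center[j] for center in centers) / len(centers)
        weights.append(feature_mean / center_mean)
    return weights


def apply_weights(data: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[List[float]]:
    """Scale every feature column by its weight."""
    return [[x * w for x, w in zip(row, weights)] for row in data]


if __name__ == "__main__":
    samples = [[1.0, 10.0], [3.0, 30.0]]
    cluster_centers = [[1.0, 20.0], [1.0, 20.0]]  # toy centers, not from real data
    w = feature_weights(samples, cluster_centers)
    print(w)
    print(apply_weights(samples, w))
```

The weighted data would then replace the raw features as input to a classifier such as the decision tree named in the abstract.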

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 307
3184 Wasting Human and Computer Resources

Authors: Mária Csernoch, Piroska Biró

Abstract:

The legends about “user-friendly” and “easy-to-use” birotical tools (computer-related office tools) have been spreading and misleading end-users. This approach has led to an extremely high number of incorrect documents, causing serious financial losses in the creating, modifying, and retrieving processes. Our research proved that there are at least two sources of this underachievement: (1) The lack of a definition of correctly edited, formatted documents. Consequently, end-users do not know whether their methods and results are correct or not. They are not aware of their ignorance. They are so ignorant that their ignorance does not allow them to realize their lack of knowledge. (2) The end-users’ problem-solving methods. We have found that in non-traditional programming environments end-users apply, almost exclusively, surface approach metacognitive methods to carry out their computer-related activities, which have proved less effective than deep approach methods. Based on these findings, we have developed deep approach methods which are based on and adapted from traditional programming languages. In this study, we focus on the most popular type of birotical documents, the text-based documents. We have provided the definition of correctly edited text and, based on this definition, adapted the debugging method known in programming. According to the method, before any text editing takes place, a thorough debugging of already existing texts and a categorization of errors are carried out. With this method, before actual text editing, users learn the requirements of text-based documents and of correctly formatted text. The method has proved much more effective than the previously applied surface approach methods. 
The advantages of the method are that real text handling requires far fewer human and computer resources than clicking aimlessly in the GUI (Graphical User Interface), and that data retrieval is much more effective than from error-prone documents.

Keywords: deep approach metacognitive methods, error-prone birotical documents, financial losses, human and computer resources

Procedia PDF Downloads 364
3183 Linguistic Misinterpretation and the Dialogue of Civilizations

Authors: Oleg Redkin, Olga Bernikova

Abstract:

Globalization and migrations have made cross-cultural contacts more frequent and intensive. Sometimes, these contacts may lead to misunderstanding between partners of communication and misinterpretations of the verbal messages that some researchers tend to consider as the 'clash of civilizations'. In most cases, reasons for that may be found in cultural and linguistic differences and hence misinterpretations of intentions and behavior. The current research examines factors of verbal and non-verbal communication that should be taken into consideration in verbal and non-verbal contacts. Language is one of the most important manifestations of the cultural code, and it is often considered as one of the special features of a civilization. The Arabic language, in particular, is commonly associated with Islam and the Arab-Muslim civilization. It is one of the most important markers of self-identification for more than 200 million native speakers. Arabic is the language of the Quran and hence the symbol of religious affiliation for more than one billion Muslims around the globe. Adequate interpretation of Arabic texts requires profound knowledge of its grammar and of the semantics of its vocabulary. Communicating sides who belong to different cultural groups are guided by different models of behavior and hierarchies of values; moreover, the vocabulary each of them uses in the dialogue may convey different semantic realities and vary in connotations. In this context, direct, literal translation in most cases cannot adequately convey the meaning of the original message. In addition, the peculiarities and diversity of extralinguistic information, such as body language, communicative etiquette, cultural background and religious affiliation, may make the dialogue even more difficult. 
It is very likely that the so-called 'clash of civilizations' in most cases is due to misinterpretation of the counterpart's means of discourse, such as language, cultural codes, and models of behavior, rather than to basic contradictions between partners of communication. In the process of communication, one has to rely on universal values rather than focus on cultural or religious peculiarities, and to take into account the current linguistic and extralinguistic context.

Keywords: Arabic, civilization, discourse, language, linguistic

Procedia PDF Downloads 197