Search results for: Text classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3228

3018 Performance Evaluation of Contemporary Classifiers for Automatic Detection of Epileptic EEG

Authors: K. E. Ch. Vidyasagar, M. Moghavvemi, T. S. S. T. Prabhat

Abstract:

Epilepsy is a global problem, and with seizures eluding even the smartest of diagnoses, automatic detection using the electroencephalogram (EEG) would have a huge impact on the diagnosis of the disorder. Among the multitude of methods for automatic epilepsy detection, the best one should be identified on the basis of classification accuracy. This paper reasons out the best methods for this classification task. Accuracy depends on the classifier, so this paper compares classifiers such as quadratic discriminant analysis (QDA), classification and regression tree (CART), support vector machine (SVM), naive Bayes classifier (NBC), linear discriminant analysis (LDA), k-nearest neighbor (KNN) and artificial neural networks (ANN). Results show that ANN is the most accurate of all the above classifiers, with 97.7% accuracy, 97.25% specificity and 98.28% sensitivity to its merit. It is followed closely by SVM, whose results vary from ANN's by about 1%. These results should help researchers choose the best classifier for the detection of epilepsy.
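
As an illustration of the kind of comparison reported above, the following sketch benchmarks the same seven classifier families with scikit-learn; the feature matrix and labels are random placeholders standing in for extracted EEG features, not the authors' data.

```python
# A minimal sketch: cross-validated accuracy for the classifiers named above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.tree import DecisionTreeClassifier  # CART
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier  # a simple ANN stand-in

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))    # placeholder EEG features
y = rng.integers(0, 2, size=200)  # placeholder seizure / non-seizure labels

classifiers = {
    "QDA": QuadraticDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(),
    "SVM": SVC(kernel="rbf"),
    "NBC": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```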

Keywords: classification, seizure, KNN, SVM, LDA, ANN, epilepsy

Procedia PDF Downloads 489
3017 3D Receiver Operator Characteristic Histogram

Authors: Xiaoli Zhang, Xiongfei Li, Yuncong Feng

Abstract:

ROC curves, a widely used evaluation tool in the machine learning field, plot the tradeoff between the true positive rate and the false positive rate. However, they are criticized for ignoring some vital information in the evaluation process, such as the amount of information about the target that each instance carries and the predicted score given by each classification model to each instance. Hence, in this paper, a new classification performance method is proposed by extending Receiver Operator Characteristic (ROC) curves into 3D space, denoted as the 3D ROC Histogram. In the histogram, the

Keywords: classification, performance evaluation, receiver operating characteristic histogram, hardness prediction

Procedia PDF Downloads 289
3016 Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

Authors: Khaled M. Alhawiti

Abstract:

This paper aims to analyze the role of natural language processing (NLP) in the context of automated data retrieval, automated question answering, and text structuring. NLP techniques are gaining wider acceptance in real-life applications and industrial settings, but various complexities are involved in processing natural language text in a way that satisfies the needs of decision makers. This paper begins with a description of the qualities of NLP practices, then focuses on the challenges in natural language processing and discusses the major techniques of NLP. The last section describes opportunities and challenges for future research.

Keywords: data retrieval, information retrieval, natural language processing, text structuring

Procedia PDF Downloads 312
3015 Spatial Poetic Text throughout Samih al-Qasim's Poetry

Authors: Saleem Abu Jaber, Khaled Igbaria

Abstract:

For readers, space/place is one of the most significant references for revealing deep significances and indications in modern Arabic poetic texts. Generally, when poets evoke places and/or spaces, they do not mean to refer readers to detailed geographic or physical spaces, but to the symbolic significances and dimensions that those spaces carry, through which poets encourage spatial awareness in their readers. As a result, there has recently been a great deal of interest in research addressing spatial poetic texts and dimensions in modern Arabic poetry in general and in Palestinian poetry in particular. Samih al-Qasim is one of the most prominent recent Palestinian revolutionary poets. Al-Qasim has published six series of poems that are well known in the Arab world. Although several researchers have studied al-Qasim's poetry, to our knowledge no one has studied the aspect of spatial poetic text in his work. Therefore, this paper seeks to fill a gap in the scholarship that has not been addressed up to now. This article aims not only to demonstrate the presence of spatial poetic text and dimensions throughout al-Qasim's poetry, but also to investigate the purposes for which the poet uses them. Our theory is that the poet, consciously and significantly, uses spatial poetic texts to magnify the Palestinian identity of his Palestinian readers. Methodologically, we applied a descriptive analytic method, referencing al-Qasim's poetry and addressing spatial poetic texts practically rather than theoretically or statistically.

Keywords: spatial poetic text, Samih al-Qasim, space and identity, Palestinian poetry

Procedia PDF Downloads 286
3014 Combined Odd Pair Autoregressive Coefficients for Epileptic EEG Signals Classification by Radial Basis Function Neural Network

Authors: Boukari Nassim

Abstract:

This paper describes the use of odd-pair autoregressive coefficients (Yule-Walker and Burg) for feature extraction from electroencephalogram (EEG) signals. For classification, a radial basis function neural network (RBFNN) is employed. The RBFNN is described by its architecture and its characteristics: the RBF is defined by a spread parameter, which is tuned to improve the classification results. Five types of EEG signals are used in this work: sets A and B for normal signals, sets C and D for interictal signals, and set E for ictal signals (available from the University of Bonn database). At the output, two classes are distinguished for each pairing (AC, AD, AE, BC, BD, BE, CE, DE); the best accuracy reaches 99% for the combined odd-pair autoregressive coefficients. Our method is very effective for the diagnosis of epileptic EEG signals.
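
The abstract does not spell out the extraction step, so the sketch below shows one plausible reading, assuming "odd pair" means pairing the odd-ordered Yule-Walker and Burg coefficients; the EEG segment is a random placeholder.

```python
# Hedged sketch: AR coefficients from one EEG epoch via Yule-Walker and Burg.
import numpy as np
from statsmodels.regression.linear_model import yule_walker, burg

eeg_segment = np.random.default_rng(1).normal(size=4096)  # placeholder epoch

order = 8
rho_yw, _ = yule_walker(eeg_segment, order=order)
rho_burg, _ = burg(eeg_segment, order=order)

# Assumption: keep the odd-ordered (1st, 3rd, ...) coefficients from each
# estimator and concatenate them as the feature vector for the RBFNN.
odd_features = np.concatenate([rho_yw[::2], rho_burg[::2]])
print(odd_features)
```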

Keywords: epilepsy, EEG signals classification, combined odd pair autoregressive coefficients, radial basis function neural network

Procedia PDF Downloads 321
3013 Procedures and Strategies in Translation: Two Marathi Translations of Train to Pakistan by Khushwant Singh

Authors: Manoj Gujar

Abstract:

The present paper is an attempt to interpret two Marathi translations of Khushwant Singh's (1915-2014) novel Train to Pakistan (1956). The 20th century was branded as an era of Liberalization, Privatization and Globalization. Different countries and cultures have entered into interaction with one another in an unprecedented manner, and the world is becoming multilingual and multicultural. Democratic countries such as the U.S.A., the U.K., and India have become pivotal centers of interlingual and cross-cultural exchange. People belonging to different nationalities have shown keen interest in knowing the characteristic features of different languages and their cultures. Here, translation plays an important role in such multilingual and multicultural contexts. Translation is not only a translation of a language but a translation of a culture. In the act of translation, a translator makes use of such procedures as borrowing, definition, literal translation, substitution, lexical creation, omission, and addition, as well as their various combinations. Through these processes of translation, a text produced in one linguistic and cultural context can reach other linguistic and cultural contexts. A worthy work of art appeals to many readers. India being a multilingual country, multiple translations of the same text appear in different Indian languages. Sometimes the same text appeals to different ages and gets translated into the same language by two or more translators. In this light, the present paper is an attempt to study how different translations of the same text differ in terms of procedures and strategies during the process of translating culture. The source text is Khushwant Singh's historical novel Train to Pakistan (1956). The novel was widely appreciated and so was translated into different regional languages in India. The novel has two Marathi translations: Agniratha (1972) by Hidayatkhan and Train to Pakistan (1980) by Anil Kinikar. This paper evaluates the strategies and procedures in translation by analyzing these two Marathi translations. Hidayatkhan made many omissions of significant details and distorted the original text to a large extent, whereas Anil Kinikar has done justice to the source text by rendering it in Marathi as faithfully as possible.

Keywords: culture, multilingual, procedures and strategies, translation

Procedia PDF Downloads 346
3012 Unraveling the Threads of Madness: Henry Russell’s 'The Maniac' as an Advocate for Deinstitutionalization in the Nineteenth Century

Authors: T. J. Laws-Nicola

Abstract:

Henry Russell was best known as a composer of more than 300 songs. Many of his compositions were popular both for their sentimental texts, as in 'The Old Armchair,' and for texts of a more political nature, such as 'Woodman, Spare That Tree!' Indeed, Russell had written songs of advocacy associated with abolitionism ('The Slave Ship') and environmentalism ('Woodman, Spare That Tree!'). 'The Maniac' is his only composition addressing the issue of institutionalization. The text is borrowed and adapted from the monodrama The Captive by M. G. 'Monk' Lewis. Through an analysis of form, harmony, melody, text, thematic development, and the interactions between text and music, we can approach a clearer understanding of 'The Maniac.' Select periodicals, such as The London Times, provide contemporary critical reviews of 'The Maniac.' Additional nineteenth-century songs whose texts focus on madness and/or institutionalization will assist in building a stylistic and cultural context for 'The Maniac.' Through comparative analyses of 'The Maniac' with a body of songs on similar topics, we can approach a clear understanding of the song as a vehicle for deinstitutionalization.

Keywords: 19th century song, institutionalization, M. G. Lewis, Henry Russell

Procedia PDF Downloads 503
3011 Literary Theatre and Embodied Theatre: A Practice-Based Research in Exploring the Authorship of a Performance

Authors: Rahul Bishnoi

Abstract:

Theatre, as Anne Ubersfeld calls it, is a paradox: at once, it is both a literary work and a physical representation. Theatre as a text is eternal, reproducible, and identical, while as a performance, theatre is momentary and never identical to previous performances. In this dual existence of theatre, who is the author? Is the author the playwright who writes the dramatic text, the director who orchestrates the performance, or the actor who embodies the text? From the poststructuralist lens of Barthes, the author is dead. Barthes' argument of discrete temporality, i.e., the author is the before and the text is the after, does not hold true for theatre. A published literary work is written, edited, printed, distributed and then consumed by the reader. Theatrical production, on the other hand, is immediate: an actor performs and the audience witnesses it instantaneously. Time, so to speak, no longer separates the author, the text, and the reader. The question of authorship gets further complicated in Augusto Boal's “Theatre of the Oppressed” movement, where the audience is a direct participant in the performance, like the actors. In this research, through an experimental performance, the duality of theatre is explored alongside the authorship discourse, and the conventional definition of authorship is subjected to additional complexity by erasing the distinction between actor and audience. The design/methodology of the experimental performance is as follows: the audience will be asked to produce a text under an anonymous virtual alias. The text, as it is being produced, will be read and performed by the actor. The audience, who are also collectively “authoring” the text, will watch this performance and write further until everyone has contributed one input each. The cycle of writing, reading, performing, witnessing, and writing will continue until the end. The intention is to create a dynamic system of writing/reading with the embodiment of the text through the actor. The actor gives up to the audience the power to write the spoken word, stage instruction and direction, while still keeping the agency of interpreting that input and performing it in the chosen manner. This rapid conversation between the actor and the audience also creates a conversion of authorship. The main conclusion of this study is a perspective on the nature of the dynamic authorship of theatre, containing a critical enquiry into the collaboratively produced text, the individually performed act, and the collectively witnessed event. Using practice as a methodology, this paper contests the poststructuralist notion of the author as merely a 'scriptor' and breaks it further by involving the audience in the authorship as well.

Keywords: practice based research, performance studies, post-humanism, Avant-garde art, theatre

Procedia PDF Downloads 70
3010 Automatic Classification Using Dynamic Fuzzy C Means Algorithm and Mathematical Morphology: Application in 3D MRI Image

Authors: Abdelkhalek Bakkari

Abstract:

Image segmentation is a critical step in image processing and pattern recognition. In this paper, we propose a new robust automatic image classification method based on a dynamic fuzzy c-means algorithm and mathematical morphology. The proposed segmentation algorithm (DFCM_MM) has been applied to MR perfusion images. The obtained results show the validity and robustness of the proposed approach.
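
Since the abstract gives no algorithmic detail, the sketch below is only the plain fuzzy c-means iteration at the core of such a method; the dynamic variant and the mathematical-morphology post-processing are not reproduced, and the data are placeholders.

```python
# Hedged sketch: standard fuzzy c-means membership/center updates.
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, eps=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                 # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        U_new = d ** (-2 / (m - 1)) / np.sum(d ** (-2 / (m - 1)), axis=0)
        if np.abs(U_new - U).max() < eps:
            return centers, U_new
        U = U_new
    return centers, U

X = np.random.default_rng(1).normal(size=(500, 1))  # placeholder intensities
centers, U = fuzzy_c_means(X)
labels = U.argmax(axis=0)                           # hard segmentation labels
```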

Keywords: segmentation, classification, dynamic, fuzzy c-means, MR image

Procedia PDF Downloads 443
3009 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets in themselves, and they can now be accepted as a primary intellectual output of research. The quality and usage of a dataset depend mainly on the context under which it has been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of such data is a notoriously tedious, time-consuming process; in addition, it requires experts in the area, who are mostly not available. The paper describes the operational settings under which the collection has been produced. The collection has been cleansed and preprocessed, some features have been selected, and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measures, including Accuracy, Hamming Loss, Micro-F, and Macro-F. The Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared to the other multi-label classification methods, while the Classifier Chains method showed the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
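
As a concrete instance of one of the listed baselines, the sketch below runs Classifier Chains with scikit-learn and scores it with two of the named measures; the synthetic data stands in for the report features, which are not reproduced here.

```python
# Hedged sketch: a Classifier Chains baseline on placeholder multi-label data.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain
from sklearn.metrics import hamming_loss, f1_score

X, Y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

chain = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
chain.fit(X_tr, Y_tr)
Y_hat = chain.predict(X_te)

print("Hamming loss:", hamming_loss(Y_te, Y_hat))
print("Micro-F:", f1_score(Y_te, Y_hat, average="micro"))
print("Macro-F:", f1_score(Y_te, Y_hat, average="macro"))
```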

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-label classification, text mining

Procedia PDF Downloads 142
3008 Classification of Construction Projects

Authors: M. Safa, A. Sabet, S. MacGillivray, M. Davidson, K. Kaczmarczyk, C. T. Haas, G. E. Gibson, D. Rayside

Abstract:

To address construction project requirements and specifications, scholars and practitioners need to establish a taxonomy according to a scheme that best fits their need. While existing characterization methods are continuously being improved, new ones are devised to cover project properties which have not been previously addressed. One such method, the Project Definition Rating Index (PDRI), has received limited consideration strictly as a classification scheme. Developed by the Construction Industry Institute (CII) in 1996, the PDRI has been refined over the last two decades as a method for evaluating a project's scope definition completeness during front-end planning (FEP). The main contribution of this study is a review of practical project classification methods, and a discussion of how PDRI can be used to classify projects based on their readiness in the FEP phase. The proposed model has been applied to 59 construction projects in Ontario, and the results are discussed.

Keywords: project classification, project definition rating index (PDRI), risk, project goals alignment

Procedia PDF Downloads 650
3007 Text Mining Techniques for Prioritizing Pathogenic Mutations in Protein Families Known to Misfold or Aggregate

Authors: Khaleel Saleh Al-Rababah

Abstract:

Amyloid fibril-forming regions in the sequences of some protein families, known as protein aggregates, are associated with a number of diseases known as amyloidosis. Mutations play a role in fibril formation by accelerating the fibril formation process. In this paper, we want to extract the diseases caused by such mutations as a result of the mutations' impact on the structural and functional properties of the aggregated protein. We propose a text mining system to automatically extract mutations, diseases, and relations between mutations and diseases. We present a finite-state-based algorithm to cluster mutations found in the same sentence, since a single sentence may contain different mutations causing different diseases. We also present a coreference algorithm that enables cross-linking between sentences.
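
One ingredient such a system needs is a mutation mention detector; the sketch below uses a regular expression for the common one-letter point-mutation notation (e.g., A53T). This is an illustrative pattern, not the paper's grammar.

```python
# Hedged sketch: extracting protein point mutations of the form "A53T".
import re

MUTATION = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")

sentence = ("The A53T and E46K mutations in alpha-synuclein accelerate "
            "fibril formation in Parkinson's disease.")

for wt, pos, mut in MUTATION.findall(sentence):
    print(f"wild-type {wt} at position {pos} mutated to {mut}")
```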

Keywords: amyloid, amyloidosis, coreference, protein, text mining

Procedia PDF Downloads 497
3006 The Application of Lesson Study Model in Writing Review Text in Junior High School

Authors: Sulastriningsih Djumingin

Abstract:

This study has three objectives. First, it describes the ability of second-grade students at SMPN 18 Makassar to write review text without applying the Lesson Study model. Second, it describes their ability to write review text when the Lesson Study model is applied. Third, it tests the effectiveness of the Lesson Study model in writing review text at SMPN 18 Makassar. This research used a true experimental, posttest-only group design involving two groups: one control class and one experimental class. The research population was all second-grade students at SMPN 18 Makassar, amounting to 250 students in 8 classes; the sampling technique was purposive sampling. The control class was VIII2, consisting of 30 students, while the experimental class was VIII8, also consisting of 30 students. The research instruments were observation and tests. The collected data were analyzed using descriptive statistics and inferential statistics with t-tests, processed using SPSS 21 for Windows. The results show that: (1) of the 30 students in the control class, only 14 (47%) obtained a score above 7.5, categorized as inadequate; (2) in the experimental class, 26 (87%) students obtained a score of at least 7.5, categorized as adequate; (3) the Lesson Study model is effective when applied to writing review text. Based on the comparison between the control class and the experimental class, the value of t-count is greater than the value of t-table (2.411 > 1.667), which means that the alternative hypothesis (H1) proposed by the researcher is accepted.
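
For readers unfamiliar with the reported statistic, the sketch below runs the same kind of independent-samples t-test on placeholder score arrays (not the study's data).

```python
# Hedged sketch: independent-samples t-test on two groups of writing scores.
import numpy as np
from scipy import stats

control = np.array([6.5, 7.0, 7.5, 8.0, 6.0, 7.2, 7.8, 6.8])       # placeholder
experimental = np.array([8.0, 8.5, 7.5, 9.0, 8.2, 7.9, 8.8, 8.4])  # placeholder

t_stat, p_value = stats.ttest_ind(experimental, control)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# H1 is accepted when t exceeds the critical t-table value (p < 0.05).
```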

Keywords: application, lesson study, review text, writing

Procedia PDF Downloads 176
3005 New Approach to Construct Phylogenetic Tree

Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui

Abstract:

Numerous scientific works present various methods to analyze data in several domains, especially for the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by each method, based on the distances between the resulting classification trees. The result of our approach took the form of a dendrogram containing the methods as a succession of connections. This approach is much needed in phylogenetic analysis. This discipline analyzes the sequences of biological macromolecules for information on the evolutionary history of living beings, including their relationships. The product of a phylogenetic analysis is a phylogenetic tree. In this paper, we recommend a new method of constructing the phylogenetic tree based on the comparison of different classifications obtained from different molecular genes.
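
The generic mechanism behind such dendrograms is agglomerative clustering over a distance matrix; the sketch below shows it with SciPy on a toy matrix, which would be replaced by the distances between classifications or gene trees.

```python
# Hedged sketch: building a tree from pairwise distances (UPGMA-style linkage).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

D = np.array([[0.0, 0.3, 0.6, 0.7],   # placeholder pairwise distances
              [0.3, 0.0, 0.5, 0.8],
              [0.6, 0.5, 0.0, 0.2],
              [0.7, 0.8, 0.2, 0.0]])

Z = linkage(squareform(D), method="average")
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf order of the resulting tree
```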

Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis

Procedia PDF Downloads 474
3004 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, a comparison of k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of the 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images; maximum power spectral density (PSD) values were then extracted as features from the model. There are three indexes for the balanced brain: index 3, index 4 and index 5, and the EEG signals differ significantly across levels of the brain balancing index (BBI). The alpha (8-13 Hz) and beta (13-30 Hz) bands were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results show that k-NN can be used to predict the brain balancing index.
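
A minimal sketch of the pipeline described, assuming Welch's method for the PSD and random placeholders for the EEG epochs and BBI labels:

```python
# Hedged sketch: max-PSD features for the alpha and beta bands, then k-NN.
import numpy as np
from scipy.signal import welch
from sklearn.neighbors import KNeighborsClassifier

fs = 256  # assumed sampling rate
rng = np.random.default_rng(0)

def band_max_psd(signal, lo, hi):
    f, pxx = welch(signal, fs=fs, nperseg=512)
    band = (f >= lo) & (f <= hi)
    return pxx[band].max()

signals = rng.normal(size=(60, 4096))                        # placeholder epochs
X = np.array([[band_max_psd(s, 8, 13),                       # alpha feature
               band_max_psd(s, 13, 30)] for s in signals])   # beta feature
y = rng.integers(3, 6, size=60)                              # placeholder BBI (3-5)

knn = KNeighborsClassifier(n_neighbors=5).fit(X[:40], y[:40])
print("accuracy:", knn.score(X[40:], y[40:]))
```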

Keywords: power spectral density, 3D EEG model, brain balancing, kNN

Procedia PDF Downloads 452
3003 Linguistic Features for Sentence Difficulty Prediction in Aspect-Based Sentiment Analysis

Authors: Adrian-Gabriel Chifu, Sebastien Fournier

Abstract:

One of the challenges of natural language understanding is dealing with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets, "Laptops", "Restaurants", and "MTSC" (Multi-Target-dependent Sentiment Classification), as well as a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating difficulty at the sentence level.
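
The two difficulty definitions reduce to simple counting over classifier outputs; a sketch on placeholder predictions, with the binary definition read as "all classifiers fail":

```python
# Hedged sketch: difficulty labels from the predictions of five classifiers.
import numpy as np

gold = np.array([1, 0, 1, 1, 0])        # placeholder gold polarities
preds = np.array([[1, 0, 1, 0, 0],      # one row per classifier
                  [1, 0, 0, 0, 0],
                  [1, 1, 1, 0, 0],
                  [1, 0, 1, 0, 1],
                  [1, 0, 0, 0, 0]])

n_correct = (preds == gold).sum(axis=0)  # 0..5 correct classifiers per sentence
difficulty = 5 - n_correct               # six-level scale: 0 easy ... 5 hardest
binary_difficult = n_correct == 0        # one reading of the binary definition
print(difficulty, binary_difficult)
```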

Keywords: sentiment analysis, difficulty, classification, machine learning

Procedia PDF Downloads 44
3002 Developed Text-Independent Speaker Verification System

Authors: Mohammed Arif, Abdessalam Kifouche

Abstract:

Speech is a very convenient way of communication between people and machines, and it conveys information about the identity of the talker. Since speaker recognition technology is increasingly securing our everyday lives, the objective of this paper is to develop two automatic text-independent speaker verification (TI-SV) systems using low-level spectral features and machine learning methods: (i) the first system is based on a support vector machine (SVM), which has been widely used in voice signal processing for speaker recognition, i.e., verifying the identity of a speaker based on voice characteristics; and (ii) the second is based on a Gaussian Mixture Model with a Universal Background Model (GMM-UBM), which combines functions from different resources to complement the SVM-based approach.
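
A toy version of the GMM-UBM idea, scoring a test utterance by the log-likelihood ratio between a speaker model and the background model; the "MFCC" arrays are random placeholders and the adaptation step is a crude stand-in for MAP adaptation.

```python
# Hedged sketch: GMM-UBM verification via an average log-likelihood ratio.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ubm_feats = rng.normal(size=(2000, 13))              # pooled background "MFCCs"
spk_feats = rng.normal(loc=0.3, size=(300, 13))      # enrollment features
test_feats = rng.normal(loc=0.3, size=(100, 13))     # test utterance features

ubm = GaussianMixture(n_components=8, random_state=0).fit(ubm_feats)
spk = GaussianMixture(n_components=8, random_state=0,
                      means_init=ubm.means_).fit(spk_feats)  # crude adaptation

llr = spk.score(test_feats) - ubm.score(test_feats)
print("accept" if llr > 0.0 else "reject", llr)
```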

Keywords: speaker verification, text-independent, support vector machine, Gaussian mixture model, cepstral analysis

Procedia PDF Downloads 18
3001 Evolutionary Methods in Cryptography

Authors: Wafa Slaibi Alsharafat

Abstract:

Genetic algorithms (GA) are randomized algorithms: the random numbers generated during the operation of the algorithm determine what happens. This means that if a GA is applied twice to optimize exactly the same problem, it might produce two different answers. In this project, we propose an evolutionary algorithm and a genetic algorithm (GA) to be implemented in symmetric encryption and decryption. Here, the user's message and the user's secret information (key) represent the plain text to be transformed into cipher text.
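
The crossover named in the keywords is easy to show on bitstring candidates; a sketch with single-point crossover and bit-flip mutation (the toy keys are placeholders):

```python
# Hedged sketch: GA operators on bitstring keys.
import random

random.seed(0)

def crossover(a, b):
    point = random.randrange(1, len(a))  # single crossover point
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate=0.05):
    return "".join(c if random.random() > rate else str(1 - int(c)) for c in bits)

parent1, parent2 = "10110010", "01101100"  # placeholder candidate keys
child1, child2 = crossover(parent1, parent2)
print(mutate(child1), mutate(child2))
```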

Keywords: GA, encryption, decryption, crossover

Procedia PDF Downloads 414
3000 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Information representation as tables is a compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from an OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to the corresponding cell in a dataframe, which can be stored as comma-separated values, a database, Excel, and multiple other usable formats.
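
One classic building block for this kind of detector is the morphological extraction of table grid lines; a sketch with OpenCV, assuming a grayscale scan at a placeholder path:

```python
# Hedged sketch: find candidate table regions from horizontal/vertical lines.
import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # placeholder input
binary = cv2.adaptiveThreshold(~img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 15, -2)

horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                              cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))

mask = horizontal | vertical                      # candidate table grid
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours]   # candidate table regions
print(len(boxes), "candidate regions")
```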

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 117
2999 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting

Authors: Yiannis G. Smirlis

Abstract:

The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) are an important issue, especially in large-scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the ranges of values for the inputs and outputs. Such intervals, combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores without applying any DEA models.
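
One way to picture a dominance relation over grid cells: cell A certainly dominates cell B when even A's worst case (largest inputs, smallest outputs) beats B's best case. This conservative test is an illustrative assumption, not necessarily the paper's exact relation.

```python
# Hedged sketch: conservative dominance between hyper-rectangle cells.
from dataclasses import dataclass

@dataclass
class Cell:
    in_lo: tuple   # lower bounds of the input intervals
    in_hi: tuple   # upper bounds of the input intervals
    out_lo: tuple  # lower bounds of the output intervals
    out_hi: tuple  # upper bounds of the output intervals

def certainly_dominates(a, b):
    inputs_ok = all(ah <= bl for ah, bl in zip(a.in_hi, b.in_lo))
    outputs_ok = all(al >= bh for al, bh in zip(a.out_lo, b.out_hi))
    return inputs_ok and outputs_ok

a = Cell((0, 0), (1, 1), (8, 8), (9, 9))
b = Cell((2, 2), (3, 3), (4, 4), (5, 5))
print(certainly_dominates(a, b))  # True: fewer inputs, more outputs
```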

Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction

Procedia PDF Downloads 141
2998 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in the spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors contains important geographical information that can be used for remote sensing applications such as regional planning and disaster management. Spatial data classification and object recognition are important tasks for many applications; however, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered a classification problem, and this task can be performed using machine-learning techniques. Among the many machine-learning algorithms, the classification here is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We propose a classification method which considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset has been created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrate the benefits of using knowledge discovery and data-mining techniques on image data for accurate information extraction and classification from high-spatial-resolution remote sensing imagery.
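
A minimal sketch of neighborhood-aware features, assuming a 3x3 mean filter as the "neighboring pixels" attribute and a synthetic band in place of real imagery:

```python
# Hedged sketch: per-pixel intensity + neighborhood-mean features, then SVM.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.svm import SVC

rng = np.random.default_rng(0)
band = rng.random((64, 64))                 # placeholder reflectance band
labels = (band > 0.5).astype(int)           # placeholder ground truth

neigh_mean = uniform_filter(band, size=3)   # 3x3 neighborhood mean
X = np.stack([band.ravel(), neigh_mean.ravel()], axis=1)
y = labels.ravel()

clf = SVC(kernel="rbf").fit(X[:2000], y[:2000])
print("holdout accuracy:", clf.score(X[2000:], y[2000:]))
```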

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 313
2997 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve Bayes, k-means clustering, k-nearest neighbor, crime, data analysis, systematic literature review

Procedia PDF Downloads 29
2996 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance for diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed into the classifier algorithms. The dataset used in this study has been taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods, including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms, comprising multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes, have been used. The hybrid method based on the combination of subtractive clustering based feature weighting and the decision tree classifier obtained a classification accuracy of 100% in the screening of DR. These results demonstrate that the proposed hybrid scheme is very promising for medical data set classification.
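
The paper's exact weighting formulas are not given here; the sketch below shows one common cluster-center-based scheme (scale each feature by the ratio of its global mean to the mean of its cluster centers) on placeholder data.

```python
# Hedged sketch: cluster-center-based feature weighting before classification.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((150, 4))                    # placeholder DR features

centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_
weights = X.mean(axis=0) / (centers.mean(axis=0) + 1e-12)
X_weighted = X * weights                    # weighted features for the classifier
print(weights)
```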

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
2995 Wasting Human and Computer Resources

Authors: Mária Csernoch, Piroska Biró

Abstract:

The legends about “user-friendly” and “easy-to-use” birotical tools (computer-related office tools) have been spreading and misleading end-users. This approach has led to an extremely high number of incorrect documents, causing serious financial losses in the creating, modifying, and retrieving processes. Our research proved that there are at least two sources of this underachievement. (1) The lack of a definition of correctly edited, formatted documents. Consequently, end-users do not know whether their methods and results are correct, and they are so unaware of their ignorance that they cannot realize their lack of knowledge. (2) The end-users' problem-solving methods. We have found that in non-traditional programming environments end-users apply, almost exclusively, surface-approach metacognitive methods to carry out their computer-related activities, which are proved less effective than deep-approach methods. Based on these findings, we have developed deep-approach methods which are based on and adapted from traditional programming languages. In this study, we focus on the most popular type of birotical documents, text-based documents. We have provided the definition of correctly edited text and, based on this definition, adapted the debugging method known from programming. According to the method, before real text editing takes place, a thorough debugging of already existing texts and a categorization of errors are carried out. With this method, in advance of real text editing, users learn the requirements of text-based documents and of correctly formatted text. The method has proved much more effective than the previously applied surface-approach methods. The advantages of the method are that real text handling requires far fewer human and computer resources than clicking aimlessly in the GUI (Graphical User Interface), and that data retrieval is much more effective than from error-prone documents.

Keywords: deep approach metacognitive methods, error-prone birotical documents, financial losses, human and computer resources

Procedia PDF Downloads 359
2994 Feature Extraction and Classification Based on the Bayes Test for Minimum Error

Authors: Nasar Aldian Ambark Shashoa

Abstract:

Classification with dimension reduction based on a Bayesian approach is proposed in this paper. The first step is to generate a sample (parameter) of the fault-free mode class and the faulty mode class. Second, in order to obtain good classification performance, a selection of important features is done with the discrete Karhunen-Loève expansion. Next, the Bayes test for minimum error is used to classify the classes. Finally, the results for simulated data demonstrate the capabilities of the proposed procedure.
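
The Bayes minimum-error test reduces to picking the class with the larger posterior; a sketch for two one-dimensional Gaussian classes with placeholder parameters:

```python
# Hedged sketch: Bayes minimum-error decision for fault-free vs faulty modes.
from scipy.stats import norm

mu0, sd0, prior0 = 0.0, 1.0, 0.7  # placeholder fault-free model
mu1, sd1, prior1 = 2.0, 1.0, 0.3  # placeholder faulty model

def classify(x):
    post0 = norm.pdf(x, mu0, sd0) * prior0
    post1 = norm.pdf(x, mu1, sd1) * prior1
    return int(post1 > post0)     # 1 = faulty, 0 = fault-free

print([classify(x) for x in (-0.5, 1.0, 2.5)])
```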

Keywords: analytical redundancy, fault detection, feature extraction, Bayesian approach

Procedia PDF Downloads 501
2993 Network Traffic Classification Scheme for Internet Network Based on Application Categorization for Ipv6

Authors: Yaser Miaji, Mohammed Aloryani

Abstract:

The rise of applications in everyday use such as videoconferencing, online recreation and voice communication presses the need for novel mechanisms and policies to serve this steep growth within the applications themselves and in users' wants. This diversity in web traffic calls for classification and prioritization of the traffic, since some traffic merits more attention, with less delay and loss, than other traffic. This research is intended to reinforce the mechanism by analysing application performance under the proposed mechanism. The mechanism used is quite direct and analytical, and it is implemented by modifying the queue limit in the algorithm.

Keywords: traffic classification, IPv6, internet, application categorization

Procedia PDF Downloads 534
2992 Moral Wrongdoers: Evaluating the Value of Moral Actions Performed by War Criminals

Authors: Jean-Francois Caron

Abstract:

This text explores the value of moral acts performed by war criminals, and the extent to which they should alleviate the punishment these individuals ought to receive for violating the rules of war. Without neglecting the necessity of retribution in war crimes cases, it argues from an ethical perspective that we should not rule out the possibility of considering lesser punishments for war criminals who decide to perform a moral act, as it might produce significant positive moral outcomes. This text also analyzes how such a norm could be justified from a moral perspective.

Keywords: war criminals, pardon, amnesty, retribution

Procedia PDF Downloads 250
2991 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of register variation in three major text domains, namely social, cultural, and political texts, collected from a corpus of Bangla printed mass media texts. The study uses a moderately sized corpus of Bangla mass media text containing nearly one million words collected from different media sources like newspapers, magazines, advertisements, and periodicals. The analysis of the corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across the domains. At first, the subject domains of the texts are classified by two parameters, namely 'Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties, considering both function and content words. Here, the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency counts to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is quite dicey in nature, as its lexicological identity has no bearing on the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types and mark their relation to the domains to which they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors, which will give language users better insight into the methods they employ in text comprehension, as well as help construct a systemic frame for designing text identification strategies for language learners. The availability of a huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational in specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought processes of the speech community.
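
The comparative frequency counting described above can be pictured in a few lines of code; the toy English tokens below merely stand in for the tokenized Bangla corpus.

```python
# Hedged sketch: words unique to one domain as candidate domain markers.
from collections import Counter

domains = {
    "social":    "festival family village school family".split(),
    "cultural":  "film music theatre festival music".split(),
    "political": "election party minister parliament party".split(),
}

freq = {d: Counter(tokens) for d, tokens in domains.items()}
for d, counts in freq.items():
    others = set().union(*(freq[o] for o in freq if o != d))
    markers = [w for w in counts if w not in others]
    print(d, "->", markers)
```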

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 151
2990 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth, owing to their ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on interesting data, through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using the semantic web's Resource Description Framework (RDF), we generate triples from the given data and derive the interesting patterns. The methodology also illustrates data integration using RDF for reliable, interesting patterns.
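
A minimal example of triple generation with rdflib; the namespace and the fact itself are illustrative, not drawn from the paper's knowledge base.

```python
# Hedged sketch: building one RDF triple and serializing it as Turtle.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Ganges, EX.flowsThrough, Literal("India")))

print(g.serialize(format="turtle"))
```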

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 125
2989 Contextual Toxicity Detection with Data Augmentation

Authors: Julia Ive, Lucia Specia

Abstract:

Understanding and detecting toxicity is an important problem in supporting safer human interactions online. Our work focuses on the important problem of contextual toxicity detection, where automated classifiers are tasked with determining whether a short textual segment (usually a sentence) is toxic within its conversational context. We use “toxicity” as an umbrella term to denote a number of variants commonly named in the literature, including hate, abuse and offence, among others. Detecting toxicity in context is a non-trivial problem and has been addressed by very few previous studies. These previous studies have analysed the influence of conversational context on human perception of toxicity in controlled experiments and concluded that humans rarely change their judgements in the presence of context. They have also evaluated contextual detection models based on state-of-the-art Deep Learning and Natural Language Processing (NLP) techniques. Counterintuitively, they reached the general conclusion that computational models tend to suffer performance degradation in the presence of context. We challenge these empirical observations by devising better contextual predictive models that also rely on NLP data augmentation techniques to create larger and better data. In our study, we start by further analysing the human perception of toxicity in conversational data (i.e., tweets), in the absence versus presence of context, in this case, previous tweets in the same conversational thread. We observed that the conclusions of previous work on human perception are mainly due to data issues: the contextual data available do not provide sufficient evidence that context is indeed important (even for humans). The data problem is common in current toxicity datasets: cases labelled as toxic are either obviously toxic (i.e., overt toxicity with swear words, racist terms, etc.), so that context is not needed for a decision, or are ambiguous, vague or unclear even in the presence of context; in addition, the data contain labelling inconsistencies. To address this problem, we propose to automatically generate contextual samples where toxicity is not obvious (i.e., covert cases) without context, or where different contexts can lead to different toxicity judgements for the same tweet. We generate toxic and non-toxic utterances conditioned on the context or on target tweets using a range of techniques for controlled text generation (e.g., Generative Adversarial Networks and steering techniques). On the contextual detection models, we posit that their poor performance is due to limitations in both the data they are trained on (the same problems stated above) and the architectures they use, which are not able to leverage context in effective ways. To improve on that, we propose text classification architectures that take the hierarchy of conversational utterances into account. In experiments benchmarking our models against previous ones on existing and automatically generated data, we show that both data and architectural choices are very important. Our model achieves substantial performance improvements compared to baselines that are non-contextual, or contextual but agnostic of the conversation structure.
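
To make the architectural idea concrete, the sketch below encodes each utterance and then runs a second encoder over the sequence of utterance vectors; all sizes and layer choices are illustrative, not the authors' architecture.

```python
# Hedged sketch: a two-level (utterance, then conversation) classifier.
import torch
import torch.nn as nn

class HierarchicalToxicityClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb, padding_idx=0)
        self.utt_enc = nn.GRU(emb, hid, batch_first=True)  # word level
        self.ctx_enc = nn.GRU(hid, hid, batch_first=True)  # utterance level
        self.head = nn.Linear(hid, 2)                      # toxic / non-toxic

    def forward(self, thread):  # thread: (batch, n_utts, n_tokens) token ids
        b, u, t = thread.shape
        words = self.embed(thread.view(b * u, t))
        _, utt_vec = self.utt_enc(words)                   # (1, b*u, hid)
        utt_seq = utt_vec.squeeze(0).view(b, u, -1)
        _, ctx_vec = self.ctx_enc(utt_seq)                 # conversation state
        return self.head(ctx_vec.squeeze(0))               # logits

model = HierarchicalToxicityClassifier()
dummy = torch.randint(0, 10000, (2, 4, 12))  # 2 threads, 4 utterances, 12 tokens
print(model(dummy).shape)                    # torch.Size([2, 2])
```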

Keywords: contextual toxicity detection, data augmentation, hierarchical text classification models, natural language processing

Procedia PDF Downloads 137