Search results for: n-grams classification
2078 Experimental Study of Hyperparameter Tuning a Deep Learning Convolutional Recurrent Network for Text Classification
Authors: Bharatendra Rai
Abstract:
The sequence of words in text data has long-term dependencies and is known to suffer from vanishing gradient problems when developing deep learning models. Although recurrent networks such as long short-term memory networks help to overcome this problem, achieving high text classification performance is a challenging problem. Convolutional recurrent networks that combine the advantages of long short-term memory networks and convolutional neural networks can be useful for text classification performance improvements. However, arriving at suitable hyperparameter values for convolutional recurrent networks is still a challenging task where fitting a model requires significant computing resources. This paper illustrates the advantages of using convolutional recurrent networks for text classification with the help of statistically planned computer experiments for hyperparameter tuning.
Keywords: long short-term memory networks, convolutional recurrent networks, text classification, hyperparameter tuning, Tukey honest significant differences
Procedia PDF Downloads 128
2077 Performance Evaluation of Contemporary Classifiers for Automatic Detection of Epileptic EEG
Authors: K. E. Ch. Vidyasagar, M. Moghavvemi, T. S. S. T. Prabhat
Abstract:
Epilepsy is a global problem, and because seizures can elude even the most careful diagnosis, automatic detection using the electroencephalogram (EEG) would have a huge impact on diagnosis of the disorder. Among the multitude of methods for automatic epilepsy detection, the best method for classification should be chosen based on accuracy. This paper reasons out, and rationalizes, the best methods for classification. Accuracy depends on the classifier, and thus this paper discusses classifiers such as quadratic discriminant analysis (QDA), classification and regression tree (CART), support vector machine (SVM), naive Bayes classifier (NBC), linear discriminant analysis (LDA), K-nearest neighbor (KNN) and artificial neural networks (ANN). Results show that ANN is the most accurate of all the above stated classifiers, with 97.7% accuracy, 97.25% specificity and 98.28% sensitivity. This is followed closely by SVM with 1% variation in result. These results should help researchers choose the best classifier for detection of epilepsy.
Keywords: classification, seizure, KNN, SVM, LDA, ANN, epilepsy
Procedia PDF Downloads 519
2076 3D Receiver Operator Characteristic Histogram
Authors: Xiaoli Zhang, Xiongfei Li, Yuncong Feng
Abstract:
ROC curves, a widely used evaluation tool in the machine learning field, show the tradeoff between the true positive rate and the false positive rate. However, they are criticized for ignoring some vital information in the evaluation process, such as the amount of information about the target that each instance carries and the predicted score given by each classification model to each instance. Hence, in this paper, a new classification performance method is proposed by extending the Receiver Operator Characteristic (ROC) curves to 3D space, which is denoted as the 3D ROC Histogram. In the histogram, the
Keywords: classification, performance evaluation, receiver operating characteristic histogram, hardness prediction
Procedia PDF Downloads 312
2075 Combined Odd Pair Autoregressive Coefficients for Epileptic EEG Signals Classification by Radial Basis Function Neural Network
Authors: Boukari Nassim
Abstract:
This paper describes the use of odd pair autoregressive coefficients (Yule-Walker and Burg) for the feature extraction of electroencephalogram (EEG) signals. For the classification, the radial basis function neural network (RBFNN) is employed. The RBFNN is described by its architecture and its characteristics: the RBF is defined by the spread, which is modified to improve the results of the classification. Five types of EEG signals are defined for this work: Set A and Set B for normal signals, Set C and Set D for interictal signals, and Set E for ictal signals (available from the University of Bonn). In outputs, two classes are given (AC, AD, AE, BC, BD, BE, CE, DE); the best accuracy is calculated at 99% for the combined odd pair autoregressive coefficients. Our method is very effective for the diagnosis of epileptic EEG signals.
Keywords: epilepsy, EEG signals classification, combined odd pair autoregressive coefficients, radial basis function neural network
Procedia PDF Downloads 343
2074 Use of Segmentation and Color Adjustment for Skin Tone Classification in Dermatological Images
Authors: Fernando Duarte
Abstract:
The work aims to evaluate the use of classical image processing methodologies for skin tone classification in dermatological images. Skin tone is an important attribute when considering several factors for skin cancer diagnosis. Currently, there is a lack of clear methodologies to classify skin tone based only on the dermatological image. In this work, a recently released dataset with skin tone labels was used as a reference for the evaluation of classical methodologies for segmentation and adjustment of color space for classification of skin tone in dermatological images. It was noticed that even though the classical methodologies can work well for segmentation and color adjustment, classifying the skin tone without proper control of the acquisition of the sample images ended up being very unreliable.
Keywords: segmentation, classification, color space, skin tone, Fitzpatrick
Procedia PDF Downloads 34
2073 Automatic Classification Using Dynamic Fuzzy C Means Algorithm and Mathematical Morphology: Application in 3D MRI Image
Authors: Abdelkhalek Bakkari
Abstract:
Image segmentation is a critical step in image processing and pattern recognition. In this paper, we proposed a new robust automatic image classification based on a dynamic fuzzy c-means algorithm and mathematical morphology. The proposed segmentation algorithm (DFCM_MM) has been applied to MR perfusion images. The obtained results show the validity and robustness of the proposed approach.
Keywords: segmentation, classification, dynamic, fuzzy c-means, MR image
Procedia PDF Downloads 475
2072 Classification of Construction Projects
Authors: M. Safa, A. Sabet, S. MacGillivray, M. Davidson, K. Kaczmarczyk, C. T. Haas, G. E. Gibson, D. Rayside
Abstract:
To address construction project requirements and specifications, scholars and practitioners need to establish a taxonomy according to a scheme that best fits their need. While existing characterization methods are continuously being improved, new ones are devised to cover project properties which have not been previously addressed. One such method, the Project Definition Rating Index (PDRI), has received limited consideration strictly as a classification scheme. Developed by the Construction Industry Institute (CII) in 1996, the PDRI has been refined over the last two decades as a method for evaluating a project's scope definition completeness during front-end planning (FEP). The main contribution of this study is a review of practical project classification methods, and a discussion of how PDRI can be used to classify projects based on their readiness in the FEP phase. The proposed model has been applied to 59 construction projects in Ontario, and the results are discussed.
Keywords: project classification, project definition rating index (PDRI), risk, project goals alignment
Procedia PDF Downloads 676
2071 New Approach to Construct Phylogenetic Tree
Authors: Ouafae Baida, Najma Hamzaoui, Maha Akbib, Abdelfettah Sedqui, Abdelouahid Lyhyaoui
Abstract:
Numerous scientific works present various methods to analyze data in several domains, especially the comparison of classifications. In our recent work, we presented a new approach to help the user choose the best classification method from the results obtained by every method, based on the distances between the classification trees. The result of our approach was in the form of a dendrogram that contains methods as a succession of connections. This approach is much needed in phylogeny analysis. This discipline is intended to analyze the sequences of biological macromolecules for information on the evolutionary history of living beings, including their relationships. The product of phylogeny analysis is a phylogenetic tree. In this paper, we recommend the use of a new method of constructing the phylogenetic tree based on comparison of different classifications obtained from different molecular genes.
Keywords: hierarchical classification, classification methods, structure of tree, genes, phylogenetic analysis
Procedia PDF Downloads 508
2070 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique
Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan
Abstract:
In this paper, the comparison between k-Nearest Neighbor (k-NN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for the balanced brain: index 3, index 4 and index 5. There are significant differences in the EEG signals according to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results show that k-NN can be used to predict the brain balancing index.
Keywords: power spectral density, 3D EEG model, brain balancing, kNN
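The k-NN step described in this abstract can be illustrated with a minimal sketch. It assumes two hypothetical features per subject (maximum alpha and beta PSD values) and made-up training points; the feature values, the `knn_predict` helper and the choice of k = 3 are illustrative assumptions, not the authors' data or implementation.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical (max alpha PSD, max beta PSD) features with BBI labels.
train = [
    ((1.0, 1.1), "index 3"), ((1.2, 0.9), "index 3"),
    ((3.0, 3.2), "index 4"), ((3.1, 2.9), "index 4"),
    ((5.0, 5.1), "index 5"), ((5.2, 4.8), "index 5"),
]

prediction = knn_predict(train, (3.05, 3.0), k=3)
print(prediction)
```

With real data, k would normally be chosen by cross-validation, and the features would be scaled so that neither band dominates the distance.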
Procedia PDF Downloads 484
2069 Development of Fake News Model Using Machine Learning through Natural Language Processing
Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini
Abstract:
Fake news detection research is still at an early stage, as this is a relatively new phenomenon in terms of the interest raised by society. Machine learning helps to solve complex problems and to build AI systems, especially in cases where we have tacit knowledge or knowledge that is not known. We used machine learning algorithms for identification of fake news; we applied three classifiers: Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and then incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake news due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.
Keywords: fake news detection, natural language processing, machine learning, classification techniques
Procedia PDF Downloads 165
2068 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting
Authors: Yiannis G. Smirlis
Abstract:
The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals, combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores without applying any DEA models.
Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction
Procedia PDF Downloads 163
2067 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered a classification problem; this task can be performed using machine-learning techniques. Despite the many machine-learning algorithms available, the classification is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We proposed a classification method which considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset has been created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction
Procedia PDF Downloads 335
2066 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review
Authors: Faisal Muhibuddin, Ani Dijah Rahajoe
Abstract:
This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.
Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neighbor, crime, data analysis, systematic literature review
Procedia PDF Downloads 61
2065 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy
Authors: Kemal Polat
Abstract:
In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed into the classifier algorithms. The dataset used in this study has been taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on the combination of subtractive clustering based feature weighting and a decision tree classifier obtained a classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising for medical data set classification.
Keywords: machine learning, data weighting, classification, data mining
Procedia PDF Downloads 324
2064 Feature Extraction and Classification Based on the Bayes Test for Minimum Error
Authors: Nasar Aldian Ambark Shashoa
Abstract:
Classification with dimension reduction based on a Bayesian approach is proposed in this paper. The first step is to generate a sample (parameter) of the fault-free mode class and the faulty mode class. Second, in order to obtain good classification performance, a selection of important features is done with the discrete Karhunen-Loève expansion. Next, the Bayes test for minimum error is used to classify the classes. Finally, the results for simulated data demonstrate the capabilities of the proposed procedure.
Keywords: analytical redundancy, fault detection, feature extraction, Bayesian approach
Procedia PDF Downloads 525
2063 Network Traffic Classification Scheme for Internet Network Based on Application Categorization for Ipv6
Authors: Yaser Miaji, Mohammed Aloryani
Abstract:
The rise of recent applications in everyday use, such as videoconferencing, online recreation and voice communication, creates a pressing need for novel mechanisms and policies to serve this steep growth within the applications themselves and users' wants. This diversity in web traffic needs some classification and prioritization, since some traffic merits more attention, with less delay and loss, than other traffic. This research is intended to reinforce the mechanism by analysing application performance under the proposed mechanism. The mechanism used is quite direct and analytical. The mechanism is implemented by modifying the queue limit in the algorithm.
Keywords: traffic classification, IPv6, internet, application categorization
Procedia PDF Downloads 563
2062 A Lightweight Pretrained Encrypted Traffic Classification Method with Squeeze-and-Excitation Block and Sharpness-Aware Optimization
Authors: Zhiyan Meng, Dan Liu, Jintao Meng
Abstract:
Dependable encrypted traffic classification is crucial for improving cybersecurity and handling the growing amount of data. Large language models have shown that learning from large datasets can be effective, making pre-trained methods for encrypted traffic classification popular. However, attention-based pre-trained methods face two main issues: their large neural parameters are not suitable for low-computation environments like mobile devices and real-time applications, and they often overfit by getting stuck in local minima. To address these issues, we developed a lightweight transformer model, which reduces the computational parameters through lightweight vocabulary construction and a Squeeze-and-Excitation Block. We use sharpness-aware optimization to avoid local minima during pre-training and capture temporal features with relative positional embeddings. Our approach keeps the model's classification accuracy high for downstream tasks. We conducted experiments on four datasets: USTC-TFC2016, VPN 2016, Tor 2016, and CICIOT 2022. Even with fewer than 18 million parameters, our method achieves classification results similar to methods with ten times as many parameters.
Keywords: sharpness-aware optimization, encrypted traffic classification, squeeze-and-excitation block, pretrained model
Procedia PDF Downloads 28
2061 Comparison of the Classification of Cystic Renal Lesions Using the Bosniak Classification System with Contrast Enhanced Ultrasound and Magnetic Resonance Imaging to Computed Tomography: A Prospective Study
Authors: Dechen Tshering Vogel, Johannes T. Heverhagen, Bernard Kiss, Spyridon Arampatzis
Abstract:
In addition to computed tomography (CT), contrast enhanced ultrasound (CEUS) and magnetic resonance imaging (MRI) are being increasingly used for imaging of renal lesions. The aim of this prospective study was to compare the classification of complex cystic renal lesions using the Bosniak classification with CEUS and MRI to CT. Forty-eight patients with 65 cystic renal lesions were included in this study. All participants signed written informed consent. The Bosniak classifications of complex renal lesions (≥ BII-F) on CEUS and MRI were compared to that of CT, and agreement was tested using Cohen’s Kappa. Sensitivity, specificity, positive and negative predictive values (PPV/NPV) and the accuracy of CEUS and MRI compared to CT in the detection of complex renal lesions were calculated. Twenty-nine (45%) out of 65 cystic renal lesions were classified as complex using CT. The agreement between CEUS and CT in the classification of complex cysts was fair (agreement 50.8%, Kappa 0.31), and was excellent between MRI and CT (agreement 93.9%, Kappa 0.88). Compared to CT, MRI had a sensitivity of 96.6%, specificity of 91.7%, a PPV of 54.7%, and an NPV of 54.7% with an accuracy of 63.1%. The corresponding values for CEUS were sensitivity 100.0%, specificity 33.3%, PPV 90.3%, and NPV 97.1% with an accuracy of 93.8%. The classification of complex renal cysts based on MRI and CT scans correlated well, and MRI can be used instead of CT for this purpose. CEUS can exclude complex lesions, but due to higher sensitivity, cystic lesions tend to be upgraded. However, it is useful for initial imaging, for follow-up of lesions and in those patients with contraindications to CT and MRI.
Keywords: Bosniak classification, computed tomography, contrast enhanced ultrasound, cystic renal lesions, magnetic resonance imaging
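The agreement statistics named in this abstract (sensitivity, specificity, PPV, NPV, accuracy and Cohen's kappa) all follow from a 2×2 table of test result against reference result. The sketch below shows the arithmetic. The counts are hypothetical: they are one set of values consistent with 65 lesions, 29 of them complex on the reference, a sensitivity of 96.6% and a specificity of 91.7%, and they are not taken from the paper. The `diagnostic_metrics` helper is likewise an illustrative name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard 2x2 diagnostic statistics against a reference test."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # raw agreement (accuracy)
    # Chance agreement from the marginal totals, as used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts (assumption): 29 reference-positive, 36 reference-negative.
metrics = diagnostic_metrics(tp=28, fp=3, fn=1, tn=33)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it can be much lower than the accuracy when one class dominates.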
Procedia PDF Downloads 142
2060 Enhancement Method of Network Traffic Anomaly Detection Model Based on Adversarial Training With Category Tags
Authors: Zhang Shuqi, Liu Dan
Abstract:
To address problems in intelligent network anomaly traffic detection models, such as low detection accuracy caused by the lack of training samples and poor detection of small-sample attacks, a classification model enhancement method, F-ACGAN (Flow Auxiliary Classifier Generative Adversarial Network), which introduces a generative adversarial network and adversarial training, is proposed. Generating adversarial data with category labels can enhance the training effect and improve classification accuracy and model robustness. F-ACGAN consists of three steps. First, feature preprocessing is performed, which includes data type conversion, dimensionality reduction and normalization. Second, a generative adversarial network model with feature learning ability is designed, and the sample generation effect of the model is improved through adversarial iterations between generator and discriminator. Third, the adversarial disturbance factor of the gradient direction of the classification model is added to improve the diversity and antagonism of generated data and to promote the model to learn from adversarial classification features. The experiment of constructing a classification model with the UNSW-NB15 dataset shows that with the enhancement of F-ACGAN on the basic model, the classification accuracy has improved by 8.09%, and the F1 score has improved by 6.94%.
Keywords: data imbalance, GAN, ACGAN, anomaly detection, adversarial training, data augmentation
Procedia PDF Downloads 101
2059 International Classification of Primary Care as a Reference for Coding the Demand for Care in Primary Health Care
Authors: Souhir Chelly, Chahida Harizi, Aicha Hechaichi, Sihem Aissaoui, Leila Ben Ayed, Maha Bergaoui, Mohamed Kouni Chahed
Abstract:
Introduction: The International Classification of Primary Care (ICPC) is part of the morbidity classification system. It has 17 chapters, and each entry is coded by an alphanumeric code: the letter corresponds to the chapter, the number to a paragraph in the chapter. The objective of this study is to show the utility of this classification in the coding of the reasons for demand for care in primary health care (PHC), and its advantages and limits. Methods: This is a cross-sectional descriptive study conducted in 4 PHC centers in Ariana district. Data on the demand for care during 2 days in the same week were collected. The coding of the information was done according to the ICPC. The data were entered and analyzed with the EPI Info 7 software. Results: A total of 523 demands for care were investigated. The patients who came for consultation were predominantly female (62.72%). Most of the patients were young, with an average age of 35 ± 26 years. In the ICPC, there are 7 rubrics: 'infections' is the most common reason with 49.9%, 'other diagnoses' with 40.2%, 'symptoms and complaints' with 5.5%, 'trauma' with 2.1%, 'procedures' with 2.1% and 'neoplasm' with 0.3%. The main advantage of the ICPC is that it is a standardized tool. It is very suitable for classification of the reasons for demand for care in PHC according to their specificity, and it can be used in a computerized medical file of the PHC. Its current limitations are related to the difficulty of classifying some reasons for demand for care. Conclusion: The ICPC has been developed to provide healthcare with a coding reference that takes into account its specificity. The International Classification of Diseases (ICD) is in its 10th revision; it would gain from revision to revision by becoming more efficient, so that it can be generalized and used by PHC teams.
Keywords: international classification of primary care, medical file, primary health care, Tunisia
Procedia PDF Downloads 264
2058 A Quantitative Evaluation of Text Feature Selection Methods
Authors: B. S. Harish, M. B. Revanasiddappa
Abstract:
Due to the rapid growth of text documents in digital form, automated text classification has become an important research area in the last two decades. The major challenges of text document representations are high dimension, sparsity, volume and semantics. Since the terms are the only features that can be found in documents, selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tfrf), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.
Keywords: classifiers, feature selection, text classification
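As an illustration of the first method listed in this abstract, the following minimal sketch scores terms by tf-idf and keeps the highest-scoring ones. It assumes one common weighting (term frequency normalized by document length, multiplied by the natural log of N over document frequency); the paper's exact variant is not stated in the abstract, and the toy documents and the `tfidf` and `top_terms` helpers are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k):
    """Select the k terms with the highest tf-idf weight in any document."""
    best = {}
    for doc_weights in weights:
        for term, score in doc_weights.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


docs = [
    "the match ended in a draw".split(),
    "the election ended in a recount".split(),
    "the market closed higher".split(),
]
weights = tfidf(docs)
selected = top_terms(weights, 3)
print(selected)
```

A term that occurs in every document, such as "the" here, receives a weight of zero under this formula and is never selected, which is the behaviour that makes tf-idf usable as a simple feature filter.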
Procedia PDF Downloads 457
2057 Evaluation and Fault Classification for Healthcare Robot during Sit-To-Stand Performance through Center of Pressure
Authors: Tianyi Wang, Hieyong Jeong, An Guo, Yuko Ohno
Abstract:
Healthcare robots for assisting sit-to-stand (STS) performance have attracted considerable research interest. To the best of the authors' knowledge, how to evaluate a healthcare robot is still unknown. A robot should be labeled as faulty if users find STS demanding when they are assisted by the robot. In this research, we aim to propose a method to evaluate a sit-to-stand assist robot through center of pressure (CoP), and then classify different STS performance. Experiments were executed five times with ten healthy subjects under four conditions: two self-performed STSs with chair heights of 62 cm and 43 cm, and two robot-assisted STSs with a chair height of 43 cm and robot end-effect speed of 2 s and 5 s. CoP was measured using a Wii Balance Board (WBB). Bayesian classification was utilized to classify STS performance. The results showed that faults occurred when the chair height was decreased and the robot assist speed was slowed. The proposed method for fault classification showed a high probability of distinguishing fault classes from others. It was concluded that faults for an STS assist robot could be detected by inspecting center of pressure and be classified through the proposed classification algorithm.
Keywords: center of pressure, fault classification, healthcare robot, sit-to-stand movement
Procedia PDF Downloads 195
2056 Isolation and Classification of Red Blood Cells in Anemic Microscopic Images
Authors: Jameela Ali Alkrimi, Abdul Rahim Ahmad, Azizah Suliman, Loay E. George
Abstract:
Red blood cells (RBCs) are among the most commonly and intensively studied types of blood cells in cell biology. The lack of RBCs is a condition characterized by a lower than normal hemoglobin level; this condition is referred to as 'anemia'. In this study, software was developed to isolate RBCs by using a machine learning approach to classify anemic RBCs in microscopic images. Several features of RBCs were extracted using image processing algorithms, including principal component analysis (PCA). With the proposed method, RBCs were isolated in 34 seconds from an image containing 18 to 27 cells. We also proposed that PCA could be performed to increase the speed and efficiency of classification. Our classifier algorithm yielded accuracy rates of 100%, 99.99%, and 96.50% for the K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and artificial neural network (ANN), respectively. Classification was evaluated in terms of sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained in a shorter time and more efficiently when PCA was used.
Keywords: red blood cells, pre-processing image algorithms, classification algorithms, principal component analysis PCA, confusion matrix, kappa statistical parameters, ROC
Procedia PDF Downloads 401
2055 An Attempt at the Multi-Criterion Classification of Small Towns
Authors: Jerzy Banski
Abstract:
The basic aim of this study is to discuss and assess different classifications and research approaches to small towns that take their social and economic functions into account, as well as relations with surrounding areas. The subject literature typically includes three types of approaches to the classification of small towns: 1) the structural, 2) the location-related, and 3) the mixed. The structural approach allows for the grouping of towns from the point of view of the social, cultural and economic functions they discharge. The location-related approach draws on the idea of there being a continuum between the center and the periphery. A mixed classification making simultaneous use of the different approaches to research brings the most information to bear in regard to categories of the urban locality. Bearing in mind the approaches to classification, it is possible to propose a synthetic method for classifying small towns that takes account of economic structure, location and the relationship between the towns and their surroundings. In the case of economic structure, the small centers may be divided into two basic groups – those featuring a multi-branch structure and those that are specialized economically. A second element of the classification reflects the locations of urban centers. Two basic types can be identified – the small town within the range of impact of a large agglomeration, or else the town outside such areas, which is to say located peripherally. The third component of the classification arises out of small towns’ relations with their surroundings. In consequence, it is possible to indicate 8 types of small-town: from local centers enjoying good accessibility and a multi-branch economic structure to peripheral supra-local centers characterised by a specialized economic structure.Keywords: small towns, classification, functional structure, localization
Procedia PDF Downloads 179
2054 Multi-Class Text Classification Using Ensembles of Classifiers
Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari
Abstract:
Text classification is the task of assigning a given text to the appropriate category from a given set of categories. Achieving this requires a proper set of pre-processing, feature selection and classification techniques. In this paper, we used different ensemble techniques together with varying feature selection parameters to observe the change in overall accuracy and in per-class measures, including the precision of each individual text category. After pre-processing and feature selection, individual classifiers were tested first and then combined into ensembles to increase their accuracy. We also studied the impact of reducing the number of classification categories on overall accuracy. Text classification is widely used in sentiment analysis on social media sites such as Twitter to gauge people's opinions about a cause, and to analyze customer reviews of products or services. Opinion mining is a vital task in data mining, and text categorization is the backbone of opinion mining.
Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost
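The idea of combining individual classifiers into an ensemble can be sketched with plain majority voting. This is not the bagging or AdaBoost setup the keywords refer to, and the three lexicon-based classifiers and their word lists are hypothetical stand-ins chosen only to keep the example self-contained.

```python
from collections import Counter


def make_lexicon_classifier(positive, negative):
    """Build a toy classifier that scores a text by counting lexicon hits."""
    def classify(text):
        tokens = text.lower().split()
        score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
        return "positive" if score >= 0 else "negative"
    return classify


def majority_vote(classifiers, text):
    """Return the label predicted by the largest number of classifiers."""
    votes = [classify(text) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical base classifiers with different vocabularies.
CLASSIFIERS = [
    make_lexicon_classifier({"good", "great"}, {"bad", "poor"}),
    make_lexicon_classifier({"love", "great"}, {"hate", "slow"}),
    make_lexicon_classifier({"fast", "good"}, {"slow", "broken"}),
]

if __name__ == "__main__":
    print(majority_vote(CLASSIFIERS, "great phone but slow"))
    print(majority_vote(CLASSIFIERS, "slow and broken"))
```

With an odd number of base classifiers and two labels, a vote can never tie, which is why three classifiers are used here.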
Procedia PDF Downloads 229
2053 Determination of the Bank's Customer Risk Profile: Data Mining Applications
Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge
Abstract:
In this study, clients who applied to a bank branch for a loan were analyzed through data mining. The data covered personal and SME clients working with the bank branch between 2010 and 2013 and included the amounts of loans received, the number of installments, the number of delays in loan installments, payments available in other banks, and the number of banks to which the clients were in debt. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, five different types of customers were identified on the decision tree. These customer types were rated by the risk they pose to the bank branch, and the customers were classified according to these risk ratings.
Keywords: client classification, loan suitability, risk rating, CART analysis
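CART grows a decision tree by repeatedly choosing the split that most reduces impurity, commonly measured with the Gini index. The sketch below finds a single best threshold on one numeric feature. The delay counts and risk labels are invented for illustration and are not taken from the study.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    best_threshold, best_score = None, float("inf")
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical data: number of delayed installments and an assumed risk label.
DELAYS = [0, 0, 1, 1, 3, 4, 5, 6]
RISK = ["low", "low", "low", "low", "high", "high", "high", "high"]

if __name__ == "__main__":
    print(best_split(DELAYS, RISK))
```

A full CART implementation would apply this search recursively over all features and then prune the tree; only the core split criterion is shown here.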
Procedia PDF Downloads 337
2052 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children
Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco
Abstract:
Attribute or feature selection is one of the basic strategies for improving the performance of data classification tasks while reducing the complexity of classifiers, and it is particularly fundamental when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven to be a very effective choice for consistently reducing the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed of a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children, 2nd ed.) with two objectives: maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to an a posteriori process in a multi-objective context), and testing. We compare the performance of the classifiers obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists who collected the data.
Keywords: evolutionary computation, feature selection, classification, clustering
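With two objectives to maximize, there is usually no single best feature subset, only a set of non-dominated trade-offs (the Pareto front) from which one model is chosen afterwards. The sketch below filters such a front. It does not reproduce ENORA or NSGA-II, and the feature names and objective values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one.

    Both objective tuples are assumed to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objectives no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"]) for o in candidates)
    ]


# Hypothetical feature subsets with (clustering log-likelihood, classifier accuracy).
CANDIDATES = [
    {"features": ("f1", "f2"), "objectives": (-120.0, 0.80)},
    {"features": ("f1",), "objectives": (-150.0, 0.85)},
    {"features": ("f2", "f3"), "objectives": (-130.0, 0.78)},  # dominated by the first
    {"features": ("f1", "f3"), "objectives": (-110.0, 0.70)},
]

if __name__ == "__main__":
    for candidate in pareto_front(CANDIDATES):
        print(candidate["features"], candidate["objectives"])
```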
Procedia PDF Downloads 369
2051 Mood Recognition Using Indian Music
Authors: Vishwa Joshi
Abstract:
The study of mood recognition in music has gained considerable momentum in recent years, with machine learning and data mining techniques and many audio features contributing to the analysis and identification of the relation between mood and music. In this paper, we take this idea forward and build a system for the automatic recognition of the mood underlying audio song clips by mining their audio features. We evaluated several data classification algorithms to train and test the model describing the moods of these songs, and developed an open source framework. Before classification, preprocessing and feature extraction phases are necessary for removing noise and gathering features, respectively.
Keywords: music, mood, features, classification
Procedia PDF Downloads 493
2050 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System
Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García
Abstract:
In this paper, we present the use of discriminant analysis to select the evolutionary algorithms that best solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm among the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. The discriminant analysis achieved a classification rate of 66.7%.
Keywords: Intelligent Transportation Systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning
Procedia PDF Downloads 470
2049 Air Classification of Dust from Steel Converter Secondary De-dusting for Zinc Enrichment
Authors: C. Lanzerstorfer
Abstract:
The off-gas from the basic oxygen furnace (BOF), where pig iron is converted into steel, is treated in the primary ventilation system. This system is in full operation only during oxygen-blowing, when the BOF converter vessel is in a vertical position. When pig iron and scrap are charged into the BOF and when slag or steel is tapped, the vessel is tilted. The emissions generated during charging and tapping cannot be captured by the primary off-gas system. To capture these emissions, a secondary ventilation system is usually installed. The emissions are captured by a canopy hood installed just above the converter mouth in the tilted position. The aim of this study was to investigate how the content of Zn and other components depends on the particle size of BOF secondary ventilation dust. Because of the high temperature of the BOF process, it can be expected that Zn will be enriched in the fine dust fractions. If Zn is enriched in the fine fractions, classification could be applied to split the dust into two size fractions with different Zn contents. For this purpose, air classification experiments with dust from the secondary ventilation system of a BOF were performed. The results show that Zn and Pb are highly enriched in the finest dust fraction. For Cd, Cu and Sb, the enrichment is less pronounced. In contrast, the non-volatile metals Al, Fe, Mn and Ti were depleted in the fine fractions. Thus, air classification could be considered for the treatment of dust from secondary BOF off-gas cleaning.
Keywords: air classification, converter dust, recycling, zinc
Procedia PDF Downloads 424