Search results for: supervised classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2252

Search results for: supervised classification

2042 An Attempt at the Multi-Criterion Classification of Small Towns

Authors: Jerzy Banski

Abstract:

The basic aim of this study is to discuss and assess different classifications and research approaches to small towns that take their social and economic functions into account, as well as relations with surrounding areas. The subject literature typically includes three types of approaches to the classification of small towns: 1) the structural, 2) the location-related, and 3) the mixed. The structural approach allows for the grouping of towns from the point of view of the social, cultural and economic functions they discharge. The location-related approach draws on the idea of there being a continuum between the center and the periphery. A mixed classification making simultaneous use of the different approaches to research brings the most information to bear in regard to categories of the urban locality. Bearing in mind the approaches to classification, it is possible to propose a synthetic method for classifying small towns that takes account of economic structure, location and the relationship between the towns and their surroundings. In the case of economic structure, the small centers may be divided into two basic groups – those featuring a multi-branch structure and those that are specialized economically. A second element of the classification reflects the locations of urban centers. Two basic types can be identified – the small town within the range of impact of a large agglomeration, or else the town outside such areas, which is to say located peripherally. The third component of the classification arises out of small towns’ relations with their surroundings. In consequence, it is possible to indicate 8 types of small-town: from local centers enjoying good accessibility and a multi-branch economic structure to peripheral supra-local centers characterised by a specialized economic structure.

Keywords: small towns, classification, functional structure, localization

Procedia PDF Downloads 159
2041 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text Classification is the methodology to classify any given text into the respective category from a given set of categories. It is highly important and vital to use proper set of pre-processing , feature selection and classification techniques to achieve this purpose. In this paper we have used different ensemble techniques along with variance in feature selection parameters to see the change in overall accuracy of the result and also on some other individual class based features which include precision value of each individual category of the text. After subjecting our data through pre-processing and feature selection techniques , different individual classifiers were tested first and after that classifiers were combined to form ensembles to increase their accuracy. Later we also studied the impact of decreasing the classification categories on over all accuracy of data. Text classification is highly used in sentiment analysis on social media sites such as twitter for realizing people’s opinions about any cause or it is also used to analyze customer’s reviews about certain products or services. Opinion mining is a vital task in data mining and text categorization is a back-bone to opinion mining.

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 204
2040 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.

Keywords: client classification, loan suitability, risk rating, CART analysis

Procedia PDF Downloads 317
2039 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

Keywords: evolutionary computation, feature selection, classification, clustering

Procedia PDF Downloads 339
2038 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 473
2037 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.

Keywords: Intelligent Transportation Systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning

Procedia PDF Downloads 434
2036 The Application of Video Segmentation Methods for the Purpose of Action Detection in Videos

Authors: Nassima Noufail, Sara Bouhali

Abstract:

In this work, we develop a semi-supervised solution for the purpose of action detection in videos and propose an efficient algorithm for video segmentation. The approach is divided into video segmentation, feature extraction, and classification. In the first part, a video is segmented into clips, and we used the K-means algorithm for this segmentation; our goal is to find groups based on similarity in the video. The application of k-means clustering into all the frames is time-consuming; therefore, we started by the identification of transition frames where the scene in the video changes significantly, and then we applied K-means clustering into these transition frames. We used two image filters, the gaussian filter and the Laplacian of Gaussian. Each filter extracts a set of features from the frames. The Gaussian filter blurs the image and omits the higher frequencies, and the Laplacian of gaussian detects regions of rapid intensity changes; we then used this vector of filter responses as an input to our k-means algorithm. The output is a set of cluster centers. Each video frame pixel is then mapped to the nearest cluster center and painted with a corresponding color to form a visual map. The resulting visual map had similar pixels grouped. We then computed a cluster score indicating how clusters are near each other and plotted a signal representing frame number vs. clustering score. Our hypothesis was that the evolution of the signal would not change if semantically related events were happening in the scene. We marked the breakpoints at which the root mean square level of the signal changes significantly, and each breakpoint is an indication of the beginning of a new video segment. In the second part, for each segment from part 1, we randomly selected a 16-frame clip, then we extracted spatiotemporal features using convolutional 3D network C3D for every 16 frames using a pre-trained model. The C3D final output is a 512-feature vector dimension; hence we used principal component analysis (PCA) for dimensionality reduction. The final part is the classification. The C3D feature vectors are used as input to a multi-class linear support vector machine (SVM) for the training model, and we used a multi-classifier to detect the action. We evaluated our experiment on the UCF101 dataset, which consists of 101 human action categories, and we achieved an accuracy that outperforms the state of art by 1.2%.

Keywords: video segmentation, action detection, classification, Kmeans, C3D

Procedia PDF Downloads 49
2035 Air Classification of Dust from Steel Converter Secondary De-dusting for Zinc Enrichment

Authors: C. Lanzerstorfer

Abstract:

The off-gas from the basic oxygen furnace (BOF), where pig iron is converted into steel, is treated in the primary ventilation system. This system is in full operation only during oxygen-blowing when the BOF converter vessel is in a vertical position. When pig iron and scrap are charged into the BOF and when slag or steel are tapped, the vessel is tilted. The generated emissions during charging and tapping cannot be captured by the primary off-gas system. To capture these emissions, a secondary ventilation system is usually installed. The emissions are captured by a canopy hood installed just above the converter mouth in tilted position. The aim of this study was to investigate the dependence of Zn and other components on the particle size of BOF secondary ventilation dust. Because of the high temperature of the BOF process it can be expected that Zn will be enriched in the fine dust fractions. If Zn is enriched in the fine fractions, classification could be applied to split the dust into two size fractions with a different content of Zn. For this air classification experiments with dust from the secondary ventilation system of a BOF were performed. The results show that Zn and Pb are highly enriched in the finest dust fraction. For Cd, Cu and Sb the enrichment is less. In contrast, the non-volatile metals Al, Fe, Mn and Ti were depleted in the fine fractions. Thus, air classification could be considered for the treatment of dust from secondary BOF off-gas cleaning.

Keywords: air classification, converter dust, recycling, zinc

Procedia PDF Downloads 405
2034 3D Reconstruction of Human Body Based on Gender Classification

Authors: Jiahe Liu, Hongyang Yu, Feng Qian, Miao Luo

Abstract:

SMPL-X was a powerful parametric human body model that included male, neutral, and female models, with significant gender differences between these three models. During the process of 3D human body reconstruction, the correct selection of standard templates was crucial for obtaining accurate results. To address this issue, we developed an efficient gender classification algorithm to automatically select the appropriate template for 3D human body reconstruction. The key to this gender classification algorithm was the precise analysis of human body features. By using the SMPL-X model, the algorithm could detect and identify gender features of the human body, thereby determining which standard template should be used. The accuracy of this algorithm made the 3D reconstruction process more accurate and reliable, as it could adjust model parameters based on individual gender differences. SMPL-X and the related gender classification algorithm have brought important advancements to the field of 3D human body reconstruction. By accurately selecting standard templates, they have improved the accuracy of reconstruction and have broad potential in various application fields. These technologies continue to drive the development of the 3D reconstruction field, providing us with more realistic and accurate human body models.

Keywords: gender classification, joint detection, SMPL-X, 3D reconstruction

Procedia PDF Downloads 36
2033 Satellite Imagery Classification Based on Deep Convolution Network

Authors: Zhong Ma, Zhuping Wang, Congxin Liu, Xiangzeng Liu

Abstract:

Satellite imagery classification is a challenging problem with many practical applications. In this paper, we designed a deep convolution neural network (DCNN) to classify the satellite imagery. The contributions of this paper are twofold — First, to cope with the large-scale variance in the satellite image, we introduced the inception module, which has multiple filters with different size at the same level, as the building block to build our DCNN model. Second, we proposed a genetic algorithm based method to efficiently search the best hyper-parameters of the DCNN in a large search space. The proposed method is evaluated on the benchmark database. The results of the proposed hyper-parameters search method show it will guide the search towards better regions of the parameter space. Based on the found hyper-parameters, we built our DCNN models, and evaluated its performance on satellite imagery classification, the results show the classification accuracy of proposed models outperform the state of the art method.

Keywords: satellite imagery classification, deep convolution network, genetic algorithm, hyper-parameter optimization

Procedia PDF Downloads 270
2032 The Role of Inventory Classification in Supply Chain Responsiveness in a Build-to-Order and Build-To-Forecast Manufacturing Environment: A Comparative Analysis

Authors: Qamar Iqbal

Abstract:

Companies strive to improve their forecasting methods to predict the fluctuations in customer demand. These fluctuation and variation in demand affect the manufacturing operations and can limit a company’s ability to fulfill customer demand on time. Companies keep the inventory buffer and maintain the stocking levels to reduce the impact of demand variation. A mid-size company deals with thousands of stock keeping units (skus). It is neither easy and nor efficient to control and manage each sku. Inventory classification provides a tool to the management to increase their ability to support customer demand. The paper presents a framework that shows how inventory classification can play a role to increase supply chain responsiveness. A case study will be presented to further elaborate the method both for build-to-order and build-to-forecast manufacturing environments. Results will be compared that will show which manufacturing setting has advantage over another under different circumstances. The outcome of this study is very useful to the management because this will give them an insight on how inventory classification can be used to increase their ability to respond to changing customer needs.

Keywords: inventory classification, supply chain responsiveness, forecast, manufacturing environment

Procedia PDF Downloads 570
2031 On the Cyclic Property of Groups of Prime Order

Authors: Ying Yi Wu

Abstract:

The study of finite groups is a central topic in algebraic structures, and one of the most fundamental questions in this field is the classification of finite groups up to isomorphism. In this paper, we investigate the cyclic property of groups of prime order, which is a crucial result in the classification of finite abelian groups. We prove the following statement: If p is a prime, then every group G of order p is cyclic. Our proof utilizes the properties of group actions and the class equation, which provide a powerful tool for studying the structure of finite groups. In particular, we first show that any non-identity element of G generates a cyclic subgroup of G. Then, we establish the existence of an element of order p, which implies that G is generated by a single element. Finally, we demonstrate that any two generators of G are conjugate, which shows that G is a cyclic group. Our result has significant implications in the classification of finite groups, as it implies that any group of prime order is isomorphic to the cyclic group of the same order. Moreover, it provides a useful tool for understanding the structure of more complicated finite groups, as any finite abelian group can be decomposed into a direct product of cyclic groups. Our proof technique can also be extended to other areas of group theory, such as the classification of finite p-groups, where p is a prime. Therefore, our work has implications beyond the specific result we prove and can contribute to further research in algebraic structures.

Keywords: group theory, finite groups, cyclic groups, prime order, classification.

Procedia PDF Downloads 52
2030 Sentiment Analysis on the East Timor Accession Process to the ASEAN

Authors: Marcelino Caetano Noronha, Vosco Pereira, Jose Soares Pinto, Ferdinando Da C. Saores

Abstract:

One particularly popular social media platform is Youtube. It’s a video-sharing platform where users can submit videos, and other users can like, dislike or comment on the videos. In this study, we conduct a binary classification task on YouTube’s video comments and review from the users regarding the accession process of Timor Leste to become the eleventh member of the Association of South East Asian Nations (ASEAN). We scrape the data directly from the public YouTube video and apply several pre-processing and weighting techniques. Before conducting the classification, we categorized the data into two classes, namely positive and negative. In the classification part, we apply Support Vector Machine (SVM) algorithm. By comparing with Naïve Bayes Algorithm, the experiment showed SVM achieved 84.1% of Accuracy, 94.5% of Precision, and Recall 73.8% simultaneously.

Keywords: classification, YouTube, sentiment analysis, support sector machine

Procedia PDF Downloads 67
2029 On the Network Packet Loss Tolerance of SVM Based Activity Recognition

Authors: Gamze Uslu, Sebnem Baydere, Alper K. Demir

Abstract:

In this study, data loss tolerance of Support Vector Machines (SVM) based activity recognition model and multi activity classification performance when data are received over a lossy wireless sensor network is examined. Initially, the classification algorithm we use is evaluated in terms of resilience to random data loss with 3D acceleration sensor data for sitting, lying, walking and standing actions. The results show that the proposed classification method can recognize these activities successfully despite high data loss. Secondly, the effect of differentiated quality of service performance on activity recognition success is measured with activity data acquired from a multi hop wireless sensor network, which introduces high data loss. The effect of number of nodes on the reliability and multi activity classification success is demonstrated in simulation environment. To the best of our knowledge, the effect of data loss in a wireless sensor network on activity detection success rate of an SVM based classification algorithm has not been studied before.

Keywords: activity recognition, support vector machines, acceleration sensor, wireless sensor networks, packet loss

Procedia PDF Downloads 443
2028 Geoinformation Technology of Agricultural Monitoring Using Multi-Temporal Satellite Imagery

Authors: Olena Kavats, Dmitry Khramov, Kateryna Sergieieva, Vladimir Vasyliev, Iurii Kavats

Abstract:

Geoinformation technologies of space agromonitoring are a means of operative decision making support in the tasks of managing the agricultural sector of the economy. Existing technologies use satellite images in the optical range of electromagnetic spectrum. Time series of optical images often contain gaps due to the presence of clouds and haze. A geoinformation technology is created. It allows to fill gaps in time series of optical images (Sentinel-2, Landsat-8, PROBA-V, MODIS) with radar survey data (Sentinel-1) and use information about agrometeorological conditions of the growing season for individual monitoring years. The technology allows to perform crop classification and mapping for spring-summer (winter and spring crops) and autumn-winter (winter crops) periods of vegetation, monitoring the dynamics of crop state seasonal changes, crop yield forecasting. Crop classification is based on supervised classification algorithms, takes into account the peculiarities of crop growth at different vegetation stages (dates of sowing, emergence, active vegetation, and harvesting) and agriculture land state characteristics (row spacing, seedling density, etc.). A catalog of samples of the main agricultural crops (Ukraine) is created and crop spectral signatures are calculated with the preliminary removal of row spacing, cloud cover, and cloud shadows in order to construct time series of crop growth characteristics. The obtained data is used in grain crop growth tracking and in timely detection of growth trends deviations from reference samples of a given crop for a selected date. Statistical models of crop yield forecast are created in the forms of linear and nonlinear interconnections between crop yield indicators and crop state characteristics (temperature, precipitation, vegetation indices, etc.). Predicted values of grain crop yield are evaluated with an accuracy up to 95%. The developed technology was used for agricultural areas monitoring in a number of Great Britain and Ukraine regions using EOS Crop Monitoring Platform (https://crop-monitoring.eos.com). The obtained results allow to conclude that joint use of Sentinel-1 and Sentinel-2 images improve separation of winter crops (rapeseed, wheat, barley) in the early stages of vegetation (October-December). It allows to separate successfully the soybean, corn, and sunflower sowing areas that are quite similar in their spectral characteristics.

Keywords: geoinformation technology, crop classification, crop yield prediction, agricultural monitoring, EOS Crop Monitoring Platform

Procedia PDF Downloads 402
2027 Automated Detection of Women Dehumanization in English Text

Authors: Maha Wiss, Wael Khreich

Abstract:

Animals, objects, foods, plants, and other non-human terms are commonly used as a source of metaphors to describe females in formal and slang language. Comparing women to non-human items not only reflects cultural views that might conceptualize women as subordinates or in a lower position than humans, yet it conveys this degradation to the listeners. Moreover, the dehumanizing representation of females in the language normalizes the derogation and even encourages sexism and aggressiveness against women. Although dehumanization has been a popular research topic for decades, according to our knowledge, no studies have linked women's dehumanizing language to the machine learning field. Therefore, we introduce our research work as one of the first attempts to create a tool for the automated detection of the dehumanizing depiction of females in English texts. We also present the first labeled dataset on the charted topic, which is used for training supervised machine learning algorithms to build an accurate classification model. The importance of this work is that it accomplishes the first step toward mitigating dehumanizing language against females.

Keywords: gender bias, machine learning, NLP, women dehumanization

Procedia PDF Downloads 55
2026 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: eXtreme gradient boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impair, multiclass classification, ADNI, support vector machine, random forest

Procedia PDF Downloads 158
2025 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts

Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel

Abstract:

We proposes in this paper a Deep Learning based approach to classify text in order to enrich an Arabic ontology based on the objects classes of Gaston Gross. Those object classes are defined by taking into account the syntactic and semantic features of the treated language. Thus, our proposed approach is a hybrid one. In fact, it is based on the one hand on the object classes that represents a knowledge based-approach on classification of text and in the other hand it uses the deep learning approach that use the word embedding-based-approach to classify text. We have applied our proposed approach on a corpus constructed from an Arabic dictionary. The obtained semantic classification of text will enrich the Arabic objects classes ontology. In fact, new classes can be added to the ontology or an expansion of the features that characterizes each object class can be updated. The obtained results are compared to a similar work that treats the same object with a classical linguistic approach for the semantic classification of text. This comparison highlight our hybrid proposed approach that can be ameliorated by broaden the dataset used in the deep learning process.

Keywords: deep-learning approach, object-classes, semantic classification, Arabic

Procedia PDF Downloads 40
2024 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis

Procedia PDF Downloads 677
2023 A Study on the Performance of 2-PC-D Classification Model

Authors: Nurul Aini Abdul Wahab, Nor Syamim Halidin, Sayidatina Aisah Masnan, Nur Izzati Romli

Abstract:

There are many applications of principle component method for reducing the large set of variables in various fields. Fisher’s Discriminant function is also a popular tool for classification. In this research, the researcher focuses on studying the performance of Principle Component-Fisher’s Discriminant function in helping to classify rice kernels to their defined classes. The data were collected on the smells or odour of the rice kernel using odour-detection sensor, Cyranose. 32 variables were captured by this electronic nose (e-nose). The objective of this research is to measure how well a combination model, between principle component and linear discriminant, to be as a classification model. Principle component method was used to reduce all 32 variables to a smaller and manageable set of components. Then, the reduced components were used to develop the Fisher’s Discriminant function. In this research, there are 4 defined classes of rice kernel which are Aromatic, Brown, Ordinary and Others. Based on the output from principle component method, the 32 variables were reduced to only 2 components. Based on the output of classification table from the discriminant analysis, 40.76% from the total observations were correctly classified into their classes by the PC-Discriminant function. Indirectly, it gives an idea that the classification model developed has committed to more than 50% of misclassifying the observations. As a conclusion, the Fisher’s Discriminant function that was built on a 2-component from PCA (2-PC-D) is not satisfying to classify the rice kernels into its defined classes.

Keywords: classification model, discriminant function, principle component analysis, variable reduction

Procedia PDF Downloads 305
2022 Self-Supervised Attributed Graph Clustering with Dual Contrastive Loss Constraints

Authors: Lijuan Zhou, Mengqi Wu, Changyong Niu

Abstract:

Attributed graph clustering can utilize the graph topology and node attributes to uncover hidden community structures and patterns in complex networks, aiding in the understanding and analysis of complex systems. Utilizing contrastive learning for attributed graph clustering can effectively exploit meaningful implicit relationships between data. However, existing attributed graph clustering methods based on contrastive learning suffer from the following drawbacks: 1) Complex data augmentation increases computational cost, and inappropriate data augmentation may lead to semantic drift. 2) The selection of positive and negative samples neglects the intrinsic cluster structure learned from graph topology and node attributes. Therefore, this paper proposes a method called self-supervised Attributed Graph Clustering with Dual Contrastive Loss constraints (AGC-DCL). Firstly, Siamese Multilayer Perceptron (MLP) encoders are employed to generate two views separately to avoid complex data augmentation. Secondly, the neighborhood contrastive loss is introduced to constrain node representation using local topological structure while effectively embedding attribute information through attribute reconstruction. Additionally, clustering-oriented contrastive loss is applied to fully utilize clustering information in global semantics for discriminative node representations, regarding the cluster centers from two views as negative samples to fully leverage effective clustering information from different views. Comparative clustering results with existing attributed graph clustering algorithms on six datasets demonstrate the superiority of the proposed method.

Keywords: attributed graph clustering, contrastive learning, clustering-oriented, self-supervised learning

Procedia PDF Downloads 13
2021 The Design of the Multi-Agent Classification System (MACS)

Authors: Mohamed R. Mhereeg

Abstract:

The paper discusses the design of a .NET Windows Service based agent system called MACS (Multi-Agent Classification System). MACS is a system aims to accurately classify spread-sheet developers competency over a network. It is designed to automatically and autonomously monitor spread-sheet users and gather their development activities based on the utilization of the software Multi-Agent Technology (MAS). This is accomplished in such a way that makes management capable to efficiently allow for precise tailor training activities for future spread-sheet development. The monitoring agents of MACS are intended to be distributed over the WWW in order to satisfy the monitoring and classification of the multiple developer aspect. The Prometheus methodology is used for the design of the agents of MACS. Prometheus has been used to undertake this phase of the system design because it is developed specifically for specifying and designing agent-oriented systems. Additionally, Prometheus specifies also the communication needed between the agents in order to coordinate to achieve their delegated tasks.

Keywords: classification, design, MACS, MAS, prometheus

Procedia PDF Downloads 369
2020 Monitoring the Rate of Expansion of Agricultural Fields in Mwekera Forest Reserve Using Remote Sensing and Geographic Information Systems

Authors: K. Kanja, M. Mweemba, K. Malungwa

Abstract:

Due to the rampant population growth coupled with retrenchments currently going on in the Copper mines in Zambia, a number of people are resorting to land clearing for agriculture, illegal settlements as well as charcoal production among other vices. This study aims at assessing the rate of expansion of agricultural fields and illegal settlements in protected areas using remote sensing and Geographic Information System. Zambia’s Mwekera National Forest Reserve was used as a case study. Iterative Self-Organizing Data Analysis Technique (ISODATA), as well as maximum likelihood, supervised classification on four Landsat images as well as an accuracy assessment of the classifications was performed. Over the period under observation, results indicate annual percentage changes to be -0.03, -0.49 and 1.26 for agriculture, forests and settlement respectively indicating a higher conversion of forests into human settlements and agriculture.

Keywords: geographic information system, land cover change, Landsat TM and ETM+, Mwekera forest reserve, remote sensing

Procedia PDF Downloads 117
2019 DeepNIC a Method to Transform Each Tabular Variable into an Independant Image Analyzable by Basic CNNs

Authors: Nguyen J. M., Lucas G., Ruan S., Digonnet H., Antonioli D.

Abstract:

Introduction: Deep Learning (DL) is a very powerful tool for analyzing image data. But for tabular data, it cannot compete with machine learning methods like XGBoost. The research question becomes: can tabular data be transformed into images that can be analyzed by simple CNNs (Convolutional Neuron Networks)? Will DL be the absolute tool for data classification? All current solutions consist in repositioning the variables in a 2x2 matrix using their correlation proximity. In doing so, it obtains an image whose pixels are the variables. We implement a technology, DeepNIC, that offers the possibility of obtaining an image for each variable, which can be analyzed by simple CNNs. Material and method: The 'ROP' (Regression OPtimized) model is a binary and atypical decision tree whose nodes are managed by a new artificial neuron, the Neurop. By positioning an artificial neuron in each node of the decision trees, it is possible to make an adjustment on a theoretically infinite number of variables at each node. From this new decision tree whose nodes are artificial neurons, we created the concept of a 'Random Forest of Perfect Trees' (RFPT), which disobeys Breiman's concepts by assembling very large numbers of small trees with no classification errors. From the results of the RFPT, we developed a family of 10 statistical information criteria, Nguyen Information Criterion (NICs), which evaluates in 3 dimensions the predictive quality of a variable: Performance, Complexity and Multiplicity of solution. A NIC is a probability that can be transformed into a grey level. The value of a NIC depends essentially on 2 super parameters used in Neurops. By varying these 2 super parameters, we obtain a 2x2 matrix of probabilities for each NIC. We can combine these 10 NICs with the functions AND, OR, and XOR. The total number of combinations is greater than 100,000. In total, we obtain for each variable an image of at least 1166x1167 pixels. The intensity of the pixels is proportional to the probability of the associated NIC. The color depends on the associated NIC. This image actually contains considerable information about the ability of the variable to make the prediction of Y, depending on the presence or absence of other variables. A basic CNNs model was trained for supervised classification. Results: The first results are impressive. Using the GSE22513 public data (Omic data set of markers of Taxane Sensitivity in Breast Cancer), DEEPNic outperformed other statistical methods, including XGBoost. We still need to generalize the comparison on several databases. Conclusion: The ability to transform any tabular variable into an image offers the possibility of merging image and tabular information in the same format. This opens up great perspectives in the analysis of metadata.

Keywords: tabular data, CNNs, NICs, DeepNICs, random forest of perfect trees, classification

Procedia PDF Downloads 72
2018 Land Use Change Detection Using Remote Sensing and GIS

Authors: Naser Ahmadi Sani, Karim Solaimani, Lida Razaghnia, Jalal Zandi

Abstract:

In recent decades, rapid and incorrect changes in land-use have been associated with consequences such as natural resources degradation and environmental pollution. Detecting changes in land-use is one of the tools for natural resource management and assessment of changes in ecosystems. The target of this research is studying the land-use changes in Haraz basin with an area of 677000 hectares in a 15 years period (1996 to 2011) using LANDSAT data. Therefore, the quality of the images was first evaluated. Various enhancement methods for creating synthetic bonds were used in the analysis. Separate training sites were selected for each image. Then the images of each period were classified in 9 classes using supervised classification method and the maximum likelihood algorithm. Finally, the changes were extracted in GIS environment. The results showed that these changes are an alarm for the HARAZ basin status in future. The reason is that 27% of the area has been changed, which is related to changing the range lands to bare land and dry farming and also changing the dense forest to sparse forest, horticulture, farming land and residential area.

Keywords: Haraz basin, change detection, land-use, satellite data

Procedia PDF Downloads 375
2017 Evaluation of Robust Feature Descriptors for Texture Classification

Authors: Jia-Hong Lee, Mei-Yi Wu, Hsien-Tsung Kuo

Abstract:

Texture is an important characteristic in real and synthetic scenes. Texture analysis plays a critical role in inspecting surfaces and provides important techniques in a variety of applications. Although several descriptors have been presented to extract texture features, the development of object recognition is still a difficult task due to the complex aspects of texture. Recently, many robust and scaling-invariant image features such as SIFT, SURF and ORB have been successfully used in image retrieval and object recognition. In this paper, we have tried to compare the performance for texture classification using these feature descriptors with k-means clustering. Different classifiers including K-NN, Naive Bayes, Back Propagation Neural Network , Decision Tree and Kstar were applied in three texture image sets - UIUCTex, KTH-TIPS and Brodatz, respectively. Experimental results reveal SIFTS as the best average accuracy rate holder in UIUCTex, KTH-TIPS and SURF is advantaged in Brodatz texture set. BP neuro network works best in the test set classification among all used classifiers.

Keywords: texture classification, texture descriptor, SIFT, SURF, ORB

Procedia PDF Downloads 333
2016 A Hierarchical Method for Multi-Class Probabilistic Classification Vector Machines

Authors: P. Byrnes, F. A. DiazDelaO

Abstract:

The Support Vector Machine (SVM) has become widely recognised as one of the leading algorithms in machine learning for both regression and binary classification. It expresses predictions in terms of a linear combination of kernel functions, referred to as support vectors. Despite its popularity amongst practitioners, SVM has some limitations, with the most significant being the generation of point prediction as opposed to predictive distributions. Stemming from this issue, a probabilistic model namely, Probabilistic Classification Vector Machines (PCVM), has been proposed which respects the original functional form of SVM whilst also providing a predictive distribution. As physical system designs become more complex, an increasing number of classification tasks involving industrial applications consist of more than two classes. Consequently, this research proposes a framework which allows for the extension of PCVM to a multi class setting. Additionally, the original PCVM framework relies on the use of type II maximum likelihood to provide estimates for both the kernel hyperparameters and model evidence. In a high dimensional multi class setting, however, this approach has been shown to be ineffective due to bad scaling as the number of classes increases. Accordingly, we propose the application of Markov Chain Monte Carlo (MCMC) based methods to provide a posterior distribution over both parameters and hyperparameters. The proposed framework will be validated against current multi class classifiers through synthetic and real life implementations.

Keywords: probabilistic classification vector machines, multi class classification, MCMC, support vector machines

Procedia PDF Downloads 201
2015 Neuro-Fuzzy Based Model for Phrase Level Emotion Understanding

Authors: Vadivel Ayyasamy

Abstract:

The present approach deals with the identification of Emotions and classification of Emotional patterns at Phrase-level with respect to Positive and Negative Orientation. The proposed approach considers emotion triggered terms, its co-occurrence terms and also associated sentences for recognizing emotions. The proposed approach uses Part of Speech Tagging and Emotion Actifiers for classification. Here sentence patterns are broken into phrases and Neuro-Fuzzy model is used to classify which results in 16 patterns of emotional phrases. Suitable intensities are assigned for capturing the degree of emotion contents that exist in semantics of patterns. These emotional phrases are assigned weights which supports in deciding the Positive and Negative Orientation of emotions. The approach uses web documents for experimental purpose and the proposed classification approach performs well and achieves good F-Scores.

Keywords: emotions, sentences, phrases, classification, patterns, fuzzy, positive orientation, negative orientation

Procedia PDF Downloads 350
2014 Comparison of Different Methods to Produce Fuzzy Tolerance Relations for Rainfall Data Classification in the Region of Central Greece

Authors: N. Samarinas, C. Evangelides, C. Vrekos

Abstract:

The aim of this paper is the comparison of three different methods, in order to produce fuzzy tolerance relations for rainfall data classification. More specifically, the three methods are correlation coefficient, cosine amplitude and max-min method. The data were obtained from seven rainfall stations in the region of central Greece and refers to 20-year time series of monthly rainfall height average. Three methods were used to express these data as a fuzzy relation. This specific fuzzy tolerance relation is reformed into an equivalence relation with max-min composition for all three methods. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations. Stations with high similarity can be utilized in water resource management scenarios interchangeably or to augment data from one to another. Due to the complexity of calculations, it is important to find out which of the methods is computationally simpler and needs fewer compositions in order to give reliable results.

Keywords: classification, fuzzy logic, tolerance relations, rainfall data

Procedia PDF Downloads 278
2013 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 88