Search results for: clusterization and classification algorithms
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3738

Search results for: clusterization and classification algorithms

3348 A Nonlinear Feature Selection Method for Hyperspectral Image Classification

Authors: Pei-Jyun Hsieh, Cheng-Hsuan Li, Bor-Chen Kuo

Abstract:

For hyperspectral image classification, feature reduction is an important pre-processing for avoiding the Hughes phenomena due to the difficulty for collecting training samples. Hence, lots of researches developed feature selection methods such as F-score, HSIC (Hilbert-Schmidt Independence Criterion), and etc., to improve hyperspectral image classification. However, most of them only consider the class separability in the original space, i.e., a linear class separability. In this study, we proposed a nonlinear class separability measure based on kernel trick for selecting an appropriate feature subset. The proposed nonlinear class separability was formed by a generalized RBF kernel with different bandwidths with respect to different features. Moreover, it considered the within-class separability and the between-class separability. A genetic algorithm was applied to tune these bandwidths such that the smallest with-class separability and the largest between-class separability simultaneously. This indicates the corresponding feature space is more suitable for classification. In addition, the corresponding nonlinear classification boundary can separate classes very well. These optimal bandwidths also show the importance of bands for hyperspectral image classification. The reciprocals of these bandwidths can be viewed as weights of bands. The smaller bandwidth, the larger weight of the band, and the more importance for classification. Hence, the descending order of the reciprocals of the bands gives an order for selecting the appropriate feature subsets. In the experiments, three hyperspectral image data sets, the Indian Pine Site data set, the PAVIA data set, and the Salinas A data set, were used to demonstrate the selected feature subsets by the proposed nonlinear feature selection method are more appropriate for hyperspectral image classification. Only ten percent of samples were randomly selected to form the training dataset. All non-background samples were used to form the testing dataset. The support vector machine was applied to classify these testing samples based on selected feature subsets. According to the experiments on the Indian Pine Site data set with 220 bands, the highest accuracies by applying the proposed method, F-score, and HSIC are 0.8795, 0.8795, and 0.87404, respectively. However, the proposed method selects 158 features. F-score and HSIC select 168 features and 217 features, respectively. Moreover, the classification accuracies increase dramatically only using first few features. The classification accuracies with respect to feature subsets of 10 features, 20 features, 50 features, and 110 features are 0.69587, 0.7348, 0.79217, and 0.84164, respectively. Furthermore, only using half selected features (110 features) of the proposed method, the corresponding classification accuracy (0.84168) is approximate to the highest classification accuracy, 0.8795. For other two hyperspectral image data sets, the PAVIA data set and Salinas A data set, we can obtain the similar results. These results illustrate our proposed method can efficiently find feature subsets to improve hyperspectral image classification. One can apply the proposed method to determine the suitable feature subset first according to specific purposes. Then researchers can only use the corresponding sensors to obtain the hyperspectral image and classify the samples. This can not only improve the classification performance but also reduce the cost for obtaining hyperspectral images.

Keywords: hyperspectral image classification, nonlinear feature selection, kernel trick, support vector machine

Procedia PDF Downloads 244
3347 An Advanced Automated Brain Tumor Diagnostics Approach

Authors: Berkan Ural, Arif Eser, Sinan Apaydin

Abstract:

Medical image processing is generally become a challenging task nowadays. Indeed, processing of brain MRI images is one of the difficult parts of this area. This study proposes a hybrid well-defined approach which is consisted from tumor detection, extraction and analyzing steps. This approach is mainly consisted from a computer aided diagnostics system for identifying and detecting the tumor formation in any region of the brain and this system is commonly used for early prediction of brain tumor using advanced image processing and probabilistic neural network methods, respectively. For this approach, generally, some advanced noise removal functions, image processing methods such as automatic segmentation and morphological operations are used to detect the brain tumor boundaries and to obtain the important feature parameters of the tumor region. All stages of the approach are done specifically with using MATLAB software. Generally, for this approach, firstly tumor is successfully detected and the tumor area is contoured with a specific colored circle by the computer aided diagnostics program. Then, the tumor is segmented and some morphological processes are achieved to increase the visibility of the tumor area. Moreover, while this process continues, the tumor area and important shape based features are also calculated. Finally, with using the probabilistic neural network method and with using some advanced classification steps, tumor area and the type of the tumor are clearly obtained. Also, the future aim of this study is to detect the severity of lesions through classes of brain tumor which is achieved through advanced multi classification and neural network stages and creating a user friendly environment using GUI in MATLAB. In the experimental part of the study, generally, 100 images are used to train the diagnostics system and 100 out of sample images are also used to test and to check the whole results. The preliminary results demonstrate the high classification accuracy for the neural network structure. Finally, according to the results, this situation also motivates us to extend this framework to detect and localize the tumors in the other organs.

Keywords: image processing algorithms, magnetic resonance imaging, neural network, pattern recognition

Procedia PDF Downloads 386
3346 Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao

Abstract:

Recently, the rapid development of deep learning makes artificial intelligence (AI) penetrate into many fields, replacing manual work there. In particular, AI systems also become a research focus in the field of automatic office. To meet real needs in automatic officiating, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data for training. We conduct a series of experiments to test my systems and the results show that our system can achieve better classification results.

Keywords: artificial intelligence and office, NLP, deep learning, text classification

Procedia PDF Downloads 155
3345 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects

Authors: Behnam Tavakkol

Abstract:

Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.

Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data

Procedia PDF Downloads 179
3344 Comprehensive Feature Extraction for Optimized Condition Assessment of Fuel Pumps

Authors: Ugochukwu Ejike Akpudo, Jank-Wook Hur

Abstract:

The increasing demand for improved productivity, maintainability, and reliability has prompted rapidly increasing research studies on the emerging condition-based maintenance concept- Prognostics and health management (PHM). Varieties of fuel pumps serve critical functions in several hydraulic systems; hence, their failure can have daunting effects on productivity, safety, etc. The need for condition monitoring and assessment of these pumps cannot be overemphasized, and this has led to the uproar in research studies on standard feature extraction techniques for optimized condition assessment of fuel pumps. By extracting time-based, frequency-based and the more robust time-frequency based features from these vibrational signals, a more comprehensive feature assessment (and selection) can be achieved for a more accurate and reliable condition assessment of these pumps. With the aid of emerging deep classification and regression algorithms like the locally linear embedding (LLE), we propose a method for comprehensive condition assessment of electromagnetic fuel pumps (EMFPs). Results show that the LLE as a comprehensive feature extraction technique yields better feature fusion/dimensionality reduction results for condition assessment of EMFPs against the use of single features. Also, unlike other feature fusion techniques, its capabilities as a fault classification technique were explored, and the results show an acceptable accuracy level using standard performance metrics for evaluation.

Keywords: electromagnetic fuel pumps, comprehensive feature extraction, condition assessment, locally linear embedding, feature fusion

Procedia PDF Downloads 93
3343 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation

Procedia PDF Downloads 203
3342 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 105
3341 Iterative Segmentation and Application of Hausdorff Dilation Distance in Defect Detection

Authors: S. Shankar Bharathi

Abstract:

Inspection of surface defects on metallic components has always been challenging due to its specular property. Occurrences of defects such as scratches, rust, pitting are very common in metallic surfaces during the manufacturing process. These defects if unchecked can hamper the performance and reduce the life time of such component. Many of the conventional image processing algorithms in detecting the surface defects generally involve segmentation techniques, based on thresholding, edge detection, watershed segmentation and textural segmentation. They later employ other suitable algorithms based on morphology, region growing, shape analysis, neural networks for classification purpose. In this paper the work has been focused only towards detecting scratches. Global and other thresholding techniques were used to extract the defects, but it proved to be inaccurate in extracting the defects alone. However, this paper does not focus on comparison of different segmentation techniques, but rather describes a novel approach towards segmentation combined with hausdorff dilation distance. The proposed algorithm is based on the distribution of the intensity levels, that is, whether a certain gray level is concentrated or evenly distributed. The algorithm is based on extraction of such concentrated pixels. Defective images showed higher level of concentration of some gray level, whereas in non-defective image, there seemed to be no concentration, but were evenly distributed. This formed the basis in detecting the defects in the proposed algorithm. Hausdorff dilation distance based on mathematical morphology was used to strengthen the segmentation of the defects.

Keywords: metallic surface, scratches, segmentation, hausdorff dilation distance, machine vision

Procedia PDF Downloads 391
3340 Land Use/Land Cover Mapping Using Landsat 8 and Sentinel-2 in a Mediterranean Landscape

Authors: Moschos Vogiatzis, K. Perakis

Abstract:

Spatial-explicit and up-to-date land use/land cover information is fundamental for spatial planning, land management, sustainable development, and sound decision-making. In the last decade, many satellite-derived land cover products at different spatial, spectral, and temporal resolutions have been developed, such as the European Copernicus Land Cover product. However, more efficient and detailed information for land use/land cover is required at the regional or local scale. A typical Mediterranean basin with a complex landscape comprised of various forest types, crops, artificial surfaces, and wetlands was selected to test and develop our approach. In this study, we investigate the improvement of Copernicus Land Cover product (CLC2018) using Landsat 8 and Sentinel-2 pixel-based classification based on all available existing geospatial data (Forest Maps, LPIS, Natura2000 habitats, cadastral parcels, etc.). We examined and compared the performance of the Random Forest classifier for land use/land cover mapping. In total, 10 land use/land cover categories were recognized in Landsat 8 and 11 in Sentinel-2A. A comparison of the overall classification accuracies for 2018 shows that Landsat 8 classification accuracy was slightly higher than Sentinel-2A (82,99% vs. 80,30%). We concluded that the main land use/land cover types of CLC2018, even within a heterogeneous area, can be successfully mapped and updated according to CLC nomenclature. Future research should be oriented toward integrating spatiotemporal information from seasonal bands and spectral indexes in the classification process.

Keywords: classification, land use/land cover, mapping, random forest

Procedia PDF Downloads 101
3339 Algorithms Inspired from Human Behavior Applied to Optimization of a Complex Process

Authors: S. Curteanu, F. Leon, M. Gavrilescu, S. A. Floria

Abstract:

Optimization algorithms inspired from human behavior were applied in this approach, associated with neural networks models. The algorithms belong to human behaviors of learning and cooperation and human competitive behavior classes. For the first class, the main strategies include: random learning, individual learning, and social learning, and the selected algorithms are: simplified human learning optimization (SHLO), social learning optimization (SLO), and teaching-learning based optimization (TLBO). For the second class, the concept of learning is associated with competitiveness, and the selected algorithms are sports-inspired algorithms (with Football Game Algorithm, FGA and Volleyball Premier League, VPL) and Imperialist Competitive Algorithm (ICA). A real process, the synthesis of polyacrylamide-based multicomponent hydrogels, where some parameters are difficult to obtain experimentally, is considered as a case study. Reaction yield and swelling degree are predicted as a function of reaction conditions (acrylamide concentration, initiator concentration, crosslinking agent concentration, temperature, reaction time, and amount of inclusion polymer, which could be starch, poly(vinyl alcohol) or gelatin). The experimental results contain 175 data. Artificial neural networks are obtained in optimal form with biologically inspired algorithm; the optimization being perform at two level: structural and parametric. Feedforward neural networks with one or two hidden layers and no more than 25 neurons in intermediate layers were obtained with values of correlation coefficient in the validation phase over 0.90. The best results were obtained with TLBO algorithm, correlation coefficient being 0.94 for an MLP(6:9:20:2) – a feedforward neural network with two hidden layers and 9 and 20, respectively, intermediate neurons. Good results obtained prove the efficiency of the optimization algorithms. More than the good results, what is important in this approach is the simulation methodology, including neural networks and optimization biologically inspired algorithms, which provide satisfactory results. In addition, the methodology developed in this approach is general and has flexibility so that it can be easily adapted to other processes in association with different types of models.

Keywords: artificial neural networks, human behaviors of learning and cooperation, human competitive behavior, optimization algorithms

Procedia PDF Downloads 87
3338 Deep Learning Based Unsupervised Sport Scene Recognition and Highlights Generation

Authors: Ksenia Meshkova

Abstract:

With increasing amount of multimedia data, it is very important to automate and speed up the process of obtaining meta. This process means not just recognition of some object or its movement, but recognition of the entire scene versus separate frames and having timeline segmentation as a final result. Labeling datasets is time consuming, besides, attributing characteristics to particular scenes is clearly difficult due to their nature. In this article, we will consider autoencoders application to unsupervised scene recognition and clusterization based on interpretable features. Further, we will focus on particular types of auto encoders that relevant to our study. We will take a look at the specificity of deep learning related to information theory and rate-distortion theory and describe the solutions empowering poor interpretability of deep learning in media content processing. As a conclusion, we will present the results of the work of custom framework, based on autoencoders, capable of scene recognition as was deeply studied above, with highlights generation resulted out of this recognition. We will not describe in detail the mathematical description of neural networks work but will clarify the necessary concepts and pay attention to important nuances.

Keywords: neural networks, computer vision, representation learning, autoencoders

Procedia PDF Downloads 97
3337 Terrain Classification for Ground Robots Based on Acoustic Features

Authors: Bernd Kiefer, Abraham Gebru Tesfay, Dietrich Klakow

Abstract:

The motivation of our work is to detect different terrain types traversed by a robot based on acoustic data from the robot-terrain interaction. Different acoustic features and classifiers were investigated, such as Mel-frequency cepstral coefficient and Gamma-tone frequency cepstral coefficient for the feature extraction, and Gaussian mixture model and Feed forward neural network for the classification. We analyze the system’s performance by comparing our proposed techniques with some other features surveyed from distinct related works. We achieve precision and recall values between 87% and 100% per class, and an average accuracy at 95.2%. We also study the effect of varying audio chunk size in the application phase of the models and find only a mild impact on performance.

Keywords: acoustic features, autonomous robots, feature extraction, terrain classification

Procedia PDF Downloads 338
3336 Hexagonal Honeycomb Sandwich Plate Optimization Using Gravitational Search Algorithm

Authors: A. Boudjemai, A. Zafrane, R. Hocine

Abstract:

Honeycomb sandwich panels are increasingly used in the construction of space vehicles because of their outstanding strength, stiffness and light weight properties. However, the use of honeycomb sandwich plates comes with difficulties in the design process as a result of the large number of design variables involved, including composite material design, shape and geometry. Hence, this work deals with the presentation of an optimal design of hexagonal honeycomb sandwich structures subjected to space environment. The optimization process is performed using a set of algorithms including the gravitational search algorithm (GSA). Numerical results are obtained and presented for a set of algorithms. The results obtained by the GSA algorithm are much better compared to other algorithms used in this study.

Keywords: optimization, gravitational search algorithm, genetic algorithm, honeycomb plate

Procedia PDF Downloads 354
3335 The Implementation of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications

Authors: Mohamed R. Mhereeg

Abstract:

The paper discusses the implementation of the MultiAgent classification System (MACS) and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies, which are the .NET widows service based agents, the Windows Communication Foundation (WCF) services, the Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). Microsoft's .NET windows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW. The Monitoring Agents (MAs) were configured to execute automatically to monitor excel spreadsheets development activities by content. Data gathered by the Monitoring Agents from various resources over a period of time was collected and filtered by a Database Updater Agent (DUA) residing in the .NET client application of the system. This agent then transfers and stores the data in Oracle server database via Oracle stored procedures for further processing that leads to the classification of the end user developers.

Keywords: MACS, implementation, multi-agent, SOA, autonomous, WCF

Procedia PDF Downloads 256
3334 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques

Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart

Abstract:

Automatic text classification applies mostly natural language processing (NLP) and other AI-guided techniques to automatically classify text in a faster and more accurate manner. This paper discusses the subject of using predictive maintenance to manage incident tickets inside the sociality. It focuses on proposing a tool that treats and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. Additionally, this tool is based on NLP and machine learning techniques to realize the textual analytics of the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and was given a high-quality result.

Keywords: machine learning, text classification, NLP techniques, semantic representation

Procedia PDF Downloads 64
3333 Comparison of Back-Projection with Non-Uniform Fast Fourier Transform for Real-Time Photoacoustic Tomography

Authors: Moung Young Lee, Chul Gyu Song

Abstract:

Photoacoustic imaging is the imaging technology that combines the optical imaging and ultrasound. This provides the high contrast and resolution due to optical imaging and ultrasound imaging, respectively. We developed the real-time photoacoustic tomography (PAT) system using linear-ultrasound transducer and digital acquisition (DAQ) board. There are two types of algorithm for reconstructing the photoacoustic signal. One is back-projection algorithm, the other is FFT algorithm. Especially, we used the non-uniform FFT algorithm. To evaluate the performance of our system and algorithms, we monitored two wires that stands at interval of 2.89 mm and 0.87 mm. Then, we compared the images reconstructed by algorithms. Finally, we monitored the two hairs crossed and compared between these algorithms.

Keywords: back-projection, image comparison, non-uniform FFT, photoacoustic tomography

Procedia PDF Downloads 410
3332 Security of Database Using Chaotic Systems

Authors: Eman W. Boghdady, A. R. Shehata, M. A. Azem

Abstract:

Database (DB) security demands permitting authorized users and prohibiting non-authorized users and intruders actions on the DB and the objects inside it. Organizations that are running successfully demand the confidentiality of their DBs. They do not allow the unauthorized access to their data/information. They also demand the assurance that their data is protected against any malicious or accidental modification. DB protection and confidentiality are the security concerns. There are four types of controls to obtain the DB protection, those include: access control, information flow control, inference control, and cryptographic. The cryptographic control is considered as the backbone for DB security, it secures the DB by encryption during storage and communications. Current cryptographic techniques are classified into two types: traditional classical cryptography using standard algorithms (DES, AES, IDEA, etc.) and chaos cryptography using continuous (Chau, Rossler, Lorenz, etc.) or discreet (Logistics, Henon, etc.) algorithms. The important characteristics of chaos are its extreme sensitivity to initial conditions of the system. In this paper, DB-security systems based on chaotic algorithms are described. The Pseudo Random Numbers Generators (PRNGs) from the different chaotic algorithms are implemented using Matlab and their statistical properties are evaluated using NIST and other statistical test-suits. Then, these algorithms are used to secure conventional DB (plaintext), where the statistical properties of the ciphertext are also tested. To increase the complexity of the PRNGs and to let pass all the NIST statistical tests, we propose two hybrid PRNGs: one based on two chaotic Logistic maps and another based on two chaotic Henon maps, where each chaotic algorithm is running side-by-side and starting from random independent initial conditions and parameters (encryption keys). The resulted hybrid PRNGs passed the NIST statistical test suit.

Keywords: algorithms and data structure, DB security, encryption, chaotic algorithms, Matlab, NIST

Procedia PDF Downloads 244
3331 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan

Abstract:

Subsection identification, in the context of Electronic Health Records (EHRs), is identifying the important sections for down-stream tasks like auto-coding. In this work, we classify the text present in EHRs according to their information, using machine learning and deep learning techniques. We initially describe briefly about the problem and formulate it as a text classification problem. Then, we discuss upon the methods from the literature. We try two approaches - traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based Machine Learning Models.

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 183
3330 An Ensemble Learning Method for Applying Particle Swarm Optimization Algorithms to Systems Engineering Problems

Authors: Ken Hampshire, Thomas Mazzuchi, Shahram Sarkani

Abstract:

As a subset of metaheuristics, nature-inspired optimization algorithms such as particle swarm optimization (PSO) have shown promise both in solving intractable problems and in their extensibility to novel problem formulations due to their general approach requiring few assumptions. Unfortunately, single instantiations of algorithms require detailed tuning of parameters and cannot be proven to be best suited to a particular illustrative problem on account of the “no free lunch” (NFL) theorem. Using these algorithms in real-world problems requires exquisite knowledge of the many techniques and is not conducive to reconciling the various approaches to given classes of problems. This research aims to present a unified view of PSO-based approaches from the perspective of relevant systems engineering problems, with the express purpose of then eliciting the best solution for any problem formulation in an ensemble learning bucket of models approach. The central hypothesis of the research is that extending the PSO algorithms found in the literature to real-world optimization problems requires a general ensemble-based method for all problem formulations but a specific implementation and solution for any instance. The main results are a problem-based literature survey and a general method to find more globally optimal solutions for any systems engineering optimization problem.

Keywords: particle swarm optimization, nature-inspired optimization, metaheuristics, systems engineering, ensemble learning

Procedia PDF Downloads 62
3329 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents

Authors: Ying Zhao, Xingyan Bin

Abstract:

Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.

Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints

Procedia PDF Downloads 508
3328 Comparative Analysis of Spectral Estimation Methods for Brain-Computer Interfaces

Authors: Rafik Djemili, Hocine Bourouba, M. C. Amara Korba

Abstract:

In this paper, we present a method in order to classify EEG signals for Brain-Computer Interfaces (BCI). EEG signals are first processed by means of spectral estimation methods to derive reliable features before classification step. Spectral estimation methods used are standard periodogram and the periodogram calculated by the Welch method; both methods are compared with Logarithm of Band Power (logBP) features. In the method proposed, we apply Linear Discriminant Analysis (LDA) followed by Support Vector Machine (SVM). Classification accuracy reached could be as high as 85%, which proves the effectiveness of classification of EEG signals based BCI using spectral methods.

Keywords: brain-computer interface, motor imagery, electroencephalogram, linear discriminant analysis, support vector machine

Procedia PDF Downloads 476
3327 Evolutionary Methods in Cryptography

Authors: Wafa Slaibi Alsharafat

Abstract:

Genetic algorithms (GA) are random algorithms as random numbers that are generated during the operation of the algorithm determine what happens. This means that if GA is applied twice to optimize exactly the same problem it might produces two different answers. In this project, we propose an evolutionary algorithm and Genetic Algorithm (GA) to be implemented in symmetric encryption and decryption. Here, user's message and user secret information (key) which represent plain text to be transferred into cipher text.

Keywords: GA, encryption, decryption, crossover

Procedia PDF Downloads 417
3326 Optimizing Perennial Plants Image Classification by Fine-Tuning Deep Neural Networks

Authors: Khairani Binti Supyan, Fatimah Khalid, Mas Rina Mustaffa, Azreen Bin Azman, Amirul Azuani Romle

Abstract:

Perennial plant classification plays a significant role in various agricultural and environmental applications, assisting in plant identification, disease detection, and biodiversity monitoring. Nevertheless, attaining high accuracy in perennial plant image classification remains challenging due to the complex variations in plant appearance, the diverse range of environmental conditions under which images are captured, and the inherent variability in image quality stemming from various factors such as lighting conditions, camera settings, and focus. This paper proposes an adaptation approach to optimize perennial plant image classification by fine-tuning the pre-trained DNNs model. This paper explores the efficacy of fine-tuning prevalent architectures, namely VGG16, ResNet50, and InceptionV3, leveraging transfer learning to tailor the models to the specific characteristics of perennial plant datasets. A subset of the MYLPHerbs dataset consisted of 6 perennial plant species of 13481 images under various environmental conditions that were used in the experiments. Different strategies for fine-tuning, including adjusting learning rates, training set sizes, data augmentation, and architectural modifications, were investigated. The experimental outcomes underscore the effectiveness of fine-tuning deep neural networks for perennial plant image classification, with ResNet50 showcasing the highest accuracy of 99.78%. Despite ResNet50's superior performance, both VGG16 and InceptionV3 achieved commendable accuracy of 99.67% and 99.37%, respectively. The overall outcomes reaffirm the robustness of the fine-tuning approach across different deep neural network architectures, offering insights into strategies for optimizing model performance in the domain of perennial plant image classification.

Keywords: perennial plants, image classification, deep neural networks, fine-tuning, transfer learning, VGG16, ResNet50, InceptionV3

Procedia PDF Downloads 27
3325 Obstacle Classification Method Based on 2D LIDAR Database

Authors: Moohyun Lee, Soojung Hur, Yongwan Park

Abstract:

In this paper is proposed a method uses only LIDAR system to classification an obstacle and determine its type by establishing database for classifying obstacles based on LIDAR. The existing LIDAR system, in determining the recognition of obstruction in an autonomous vehicle, has an advantage in terms of accuracy and shorter recognition time. However, it was difficult to determine the type of obstacle and therefore accurate path planning based on the type of obstacle was not possible. In order to overcome this problem, a method of classifying obstacle type based on existing LIDAR and using the width of obstacle materials was proposed. However, width measurement was not sufficient to improve accuracy. In this research, the width data was used to do the first classification; database for LIDAR intensity data by four major obstacle materials on the road were created; comparison is made to the LIDAR intensity data of actual obstacle materials; and determine the obstacle type by finding the one with highest similarity values. An experiment using an actual autonomous vehicle under real environment shows that data declined in quality in comparison to 3D LIDAR and it was possible to classify obstacle materials using 2D LIDAR.

Keywords: obstacle, classification, database, LIDAR, segmentation, intensity

Procedia PDF Downloads 313
3324 Performance Analysis with the Combination of Visualization and Classification Technique for Medical Chatbot

Authors: Shajida M., Sakthiyadharshini N. P., Kamalesh S., Aswitha B.

Abstract:

Natural Language Processing (NLP) continues to play a strategic part in complaint discovery and medicine discovery during the current epidemic. This abstract provides an overview of performance analysis with a combination of visualization and classification techniques of NLP for a medical chatbot. Sentiment analysis is an important aspect of NLP that is used to determine the emotional tone behind a piece of text. This technique has been applied to various domains, including medical chatbots. In this, we have compared the combination of the decision tree with heatmap and Naïve Bayes with Word Cloud. The performance of the chatbot was evaluated using accuracy, and the results indicate that the combination of visualization and classification techniques significantly improves the chatbot's performance.

Keywords: sentimental analysis, NLP, medical chatbot, decision tree, heatmap, naïve bayes, word cloud

Procedia PDF Downloads 50
3323 Metamorphic Computer Virus Classification Using Hidden Markov Model

Authors: Babak Bashari Rad

Abstract:

A metamorphic computer virus uses different code transformation techniques to mutate its body in duplicated instances. Characteristics and function of new instances are mostly similar to their parents, but they cannot be easily detected by the majority of antivirus in market, as they depend on string signature-based detection techniques. The purpose of this research is to propose a Hidden Markov Model for classification of metamorphic viruses in executable files. In the proposed solution, portable executable files are inspected to extract the instructions opcodes needed for the examination of code. A Hidden Markov Model trained on portable executable files is employed to classify the metamorphic viruses of the same family. The proposed model is able to generate and recognize common statistical features of mutated code. The model has been evaluated by examining the model on a test data set. The performance of the model has been practically tested and evaluated based on False Positive Rate, Detection Rate and Overall Accuracy. The result showed an acceptable performance with high average of 99.7% Detection Rate.

Keywords: malware classification, computer virus classification, metamorphic virus, metamorphic malware, Hidden Markov Model

Procedia PDF Downloads 287
3322 Road Vehicle Recognition Using Magnetic Sensing Feature Extraction and Classification

Authors: Xiao Chen, Xiaoying Kong, Min Xu

Abstract:

This paper presents a road vehicle detection approach for the intelligent transportation system. This approach mainly uses low-cost magnetic sensor and associated data collection system to collect magnetic signals. This system can measure the magnetic field changing, and it also can detect and count vehicles. We extend Mel Frequency Cepstral Coefficients to analyze vehicle magnetic signals. Vehicle type features are extracted using representation of cepstrum, frame energy, and gap cepstrum of magnetic signals. We design a 2-dimensional map algorithm using Vector Quantization to classify vehicle magnetic features to four typical types of vehicles in Australian suburbs: sedan, VAN, truck, and bus. Experiments results show that our approach achieves a high level of accuracy for vehicle detection and classification.

Keywords: vehicle classification, signal processing, road traffic model, magnetic sensing

Procedia PDF Downloads 294
3321 Comparative Study of Accuracy of Land Cover/Land Use Mapping Using Medium Resolution Satellite Imagery: A Case Study

Authors: M. C. Paliwal, A. K. Jain, S. K. Katiyar

Abstract:

Classification of satellite imagery is very important for the assessment of its accuracy. In order to determine the accuracy of the classified image, usually the assumed-true data are derived from ground truth data using Global Positioning System. The data collected from satellite imagery and ground truth data is then compared to find out the accuracy of data and error matrices are prepared. Overall and individual accuracies are calculated using different methods. The study illustrates advanced classification and accuracy assessment of land use/land cover mapping using satellite imagery. IRS-1C-LISS IV data were used for classification of satellite imagery. The satellite image was classified using the software in fourteen classes namely water bodies, agricultural fields, forest land, urban settlement, barren land and unclassified area etc. Classification of satellite imagery and calculation of accuracy was done by using ERDAS-Imagine software to find out the best method. This study is based on the data collected for Bhopal city boundaries of Madhya Pradesh State of India.

Keywords: resolution, accuracy assessment, land use mapping, satellite imagery, ground truth data, error matrices

Procedia PDF Downloads 477
3320 Evaluation of Photovoltaic System with Different Research Methods of Maximum Power Point Tracking

Authors: Mehdi Ameur, Ahmed Essadki, Tamou Nasser

Abstract:

The purpose of this paper is the evaluation of photovoltaic system with MPPT techniques. This system is developed by combining the models of established solar module and DC-DC converter with the algorithms of perturbing and observing (P&O), incremental conductance (INC) and fuzzy logic controller (FLC). The system is simulated under different climate conditions and MPPT algorithms to determine the influence of these conditions on characteristic power-voltage of PV system. According to the comparisons of the simulation results, the photovoltaic system can extract the maximum power with precision and rapidity using the MPPT algorithms discussed in this paper.

Keywords: fuzzy logic controller, FLC, hill climbing, HC, incremental conductance (INC), perturb and observe (P&O), maximum power point, MPP, maximum power point tracking, MPPT

Procedia PDF Downloads 488
3319 MSIpred: A Python 2 Package for the Classification of Tumor Microsatellite Instability from Tumor Mutation Annotation Data Using a Support Vector Machine

Authors: Chen Wang, Chun Liang

Abstract:

Microsatellite instability (MSI) is characterized by high degree of polymorphism in microsatellite (MS) length due to a deficiency in mismatch repair (MMR) system. MSI is associated with several tumor types and its status can be considered as an important indicator for tumor prognostic. Conventional clinical diagnosis of MSI examines PCR products of a panel of MS markers using electrophoresis (MSI-PCR) which is laborious, time consuming, and less reliable. MSIpred, a python 2 package for automatic classification of MSI was released by this study. It computes important somatic mutation features from files in mutation annotation format (MAF) generated from paired tumor-normal exome sequencing data, subsequently using these to predict tumor MSI status with a support vector machine (SVM) classifier trained by MAF files of 1074 tumors belonging to four types. Evaluation of MSIpred on an independent 358-tumor test set achieved overall accuracy of over 98% and area under receiver operating characteristic (ROC) curve of 0.967. These results indicated that MSIpred is a robust pan-cancer MSI classification tool and can serve as a complementary diagnostic to MSI-PCR in MSI diagnosis.

Keywords: microsatellite instability, pan-cancer classification, somatic mutation, support vector machine

Procedia PDF Downloads 146