Search results for: eukaryotic clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 445

Search results for: eukaryotic clustering

265 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis

Abstract:

With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: Clustering, load profiling, load modeling, machine learning, energy efficiency and quality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1211
264 Wavelet Based Residual Method of Detecting GSM Signal Strength Fading

Authors: Danladi Ali, Onah Festus Iloabuchi

Abstract:

In this paper, GSM signal strength was measured in order to detect the type of the signal fading phenomenon using onedimensional multilevel wavelet residual method and neural network clustering to determine the average GSM signal strength received in the study area. The wavelet residual method predicted that the GSM signal experienced slow fading and attenuated with MSE of 3.875dB. The neural network clustering revealed that mostly -75dB, -85dB and -95dB were received. This means that the signal strength received in the study is a weak signal.

Keywords: One-dimensional multilevel wavelets, path loss, GSM signal strength, propagation and urban environment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1958
263 Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Authors: Kasthurirangan Gopalakrishnan, Siddhartha Khaitan, Anshu Manik

Abstract:

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, its potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the “wine recognition data" located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

Keywords: Artificial neural networks, cluster analysis, Kohonen maps, wine recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2123
262 Geographic Profiling Based on Multi-point Centrography with K-means Clustering

Authors: Jiaji Zhou, Le Liang, Long Chen

Abstract:

Geographic Profiling has successfully assisted investigations for serial crimes. Considering the multi-cluster feature of serial criminal spots, we propose a Multi-point Centrography model as a natural extension of Single-point Centrography for geographic profiling. K-means clustering is first performed on the data samples and then Single-point Centrography is adopted to derive a probability distribution on each cluster. Finally, a weighted combinations of each distribution is formed to make next-crime spot prediction. Experimental study on real cases demonstrates the effectiveness of our proposed model.

Keywords: Geographic profiling, Centrography model, K-means algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2085
261 Comparative Study of Evolutionary Model and Clustering Methods in Circuit Partitioning Pertaining to VLSI Design

Authors: K. A. Sumitra Devi, N. P. Banashree, Annamma Abraham

Abstract:

Partitioning is a critical area of VLSI CAD. In order to build complex digital logic circuits its often essential to sub-divide multi -million transistor design into manageable Pieces. This paper looks at the various partitioning techniques aspects of VLSI CAD, targeted at various applications. We proposed an evolutionary time-series model and a statistical glitch prediction system using a neural network with selection of global feature by making use of clustering method model, for partitioning a circuit. For evolutionary time-series model, we made use of genetic, memetic & neuro-memetic techniques. Our work focused in use of clustering methods - K-means & EM methodology. A comparative study is provided for all techniques to solve the problem of circuit partitioning pertaining to VLSI design. The performance of all approaches is compared using benchmark data provided by MCNC standard cell placement benchmark net lists. Analysis of the investigational results proved that the Neuro-memetic model achieves greater performance then other model in recognizing sub-circuits with minimum amount of interconnections between them.

Keywords: VLSI, circuit partitioning, memetic algorithm, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1639
260 Hierarchical Checkpoint Protocol in Data Grids

Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed

Abstract:

Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.

Keywords: Data grids, fault tolerance, chandy-lamport, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 951
259 Issue Reorganization Using the Measure of Relevance

Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim

Abstract:

The need to extract R&D keywords from issues and use them to retrieve R&D information is increasing rapidly. However, it is difficult to identify related issues or distinguish them. Although the similarity between issues cannot be identified, with an R&D lexicon, issues that always share the same R&D keywords can be determined. In detail, the R&D keywords that are associated with a particular issue imply the key technology elements that are needed to solve a particular issue. Furthermore, the relationship among issues that share the same R&D keywords can be shown in a more systematic way by clustering them according to keywords. Thus, sharing R&D results and reusing R&D technology can be facilitated. Indirectly, redundant investment in R&D can be reduced as the relevant R&D information can be shared among corresponding issues and the reusability of related R&D can be improved. Therefore, a methodology to cluster issues from the perspective of common R&D keywords is proposed to satisfy these demands.

Keywords: Clustering, Social Network Analysis, Text Mining, Topic Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2038
258 Generating Normally Distributed Clusters by Means of a Self-organizing Growing Neural Network– An Application to Market Segmentation –

Authors: Reinhold Decker, Christian Holsing, Sascha Lerke

Abstract:

This paper presents a new growing neural network for cluster analysis and market segmentation, which optimizes the size and structure of clusters by iteratively checking them for multivariate normality. We combine the recently published SGNN approach [8] with the basic principle underlying the Gaussian-means algorithm [13] and the Mardia test for multivariate normality [18, 19]. The new approach distinguishes from existing ones by its holistic design and its great autonomy regarding the clustering process as a whole. Its performance is demonstrated by means of synthetic 2D data and by real lifestyle survey data usable for market segmentation.

Keywords: Artificial neural network, clustering, multivariatenormality, market segmentation, self-organization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1200
257 Anomaly Detection and Characterization to Classify Traffic Anomalies Case Study: TOT Public Company Limited Network

Authors: O. Siriporn, S. Benjawan

Abstract:

This paper represents four unsupervised clustering algorithms namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer that previously works have not been used for network traffic classification. The methodology, the result, the products of the cluster and evaluation of these algorithms with efficiency of each algorithm from accuracy are shown. Otherwise, the efficiency of these algorithms considering form the time that it use to generate the cluster quickly and correctly. Our work study and test the best algorithm by using classify traffic anomaly in network traffic with different attribute that have not been used before. We analyses the algorithm that have the best efficiency or the best learning and compare it to the previously used (K-Means). Our research will be use to develop anomaly detection system to more efficiency and more require in the future.

Keywords: Unsupervised, clustering, anomaly, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2113
256 Performance Evaluation of Clustered Routing Protocols for Heterogeneous Wireless Sensor Networks

Authors: Awatef Chniguir, Tarek Farah, Zouhair Ben Jemaa, Safya Belguith

Abstract:

Optimal routing allows minimizing energy consumption in wireless sensor networks (WSN). Clustering has proven its effectiveness in organizing WSN by reducing channel contention and packet collision and enhancing network throughput under heavy load. Therefore, nowadays, with the emergence of the Internet of Things, heterogeneity is essential. Stable election protocol (SEP) that has increased the network stability period and lifetime is the first clustering protocol for heterogeneous WSN. SEP and its descendants, namely SEP, Threshold Sensitive SEP (TSEP), Enhanced TSEP (ETSSEP) and Current Energy Allotted TSEP (CEATSEP), were studied. These algorithms’ performance was evaluated based on different metrics, especially first node death (FND), to compare their stability. Simulations were conducted on the MATLAB tool considering two scenarios: The first one demonstrates the fraction variation of advanced nodes by setting the number of total nodes. The second considers the interpretation of the number of nodes while keeping the number of advanced nodes permanent. CEATSEP outperforms its antecedents by increasing stability and, at the same time, keeping a low throughput. It also operates very well in a large-scale network. Consequently, CEATSEP has a useful lifespan and energy efficiency compared to the other routing protocol for heterogeneous WSN.

Keywords: Clustering, heterogeneous, stability, scalability, throughput, IoT, WSN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 423
255 An Advanced Nelder Mead Simplex Method for Clustering of Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

The DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across the related samples. The better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. It is handled by clustering which reveals natural structures and identify interesting patterns in the underlying data. In the proposed work clustering gene expression data is done through an Advanced Nelder Mead (ANM) algorithm. Nelder Mead (NM) method is a method designed for optimization process. In Nelder Mead method, the vertices of a triangle are considered as the solutions. Many operations are performed on this triangle to obtain a better result. In the proposed work, the operations like reflection and expansion is eliminated and a new operation called spread-out is introduced. The spread-out operation will increase the global search area and thus provides a better result on optimization. The spread-out operation will give three points and the best among these three points will be used to replace the worst point. The experiment results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM in both benchmarks.

Keywords: Spread out, simplex, multi-minima, fitness function, optimization, search area, monocyte, solution, genomes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2558
254 Effect of Clustering on Energy Efficiency and Network Lifetime in Wireless Sensor Networks

Authors: Prakash G L, Chaitra K Meti, Poojitha K, Divya R.K.

Abstract:

Wireless Sensor Network is Multi hop Self-configuring Wireless Network consisting of sensor nodes. The deployment of wireless sensor networks in many application areas, e.g., aggregation services, requires self-organization of the network nodes into clusters. Efficient way to enhance the lifetime of the system is to partition the network into distinct clusters with a high energy node as cluster head. The different methods of node clustering techniques have appeared in the literature, and roughly fall into two families; those based on the construction of a dominating set and those which are based solely on energy considerations. Energy optimized cluster formation for a set of randomly scattered wireless sensors is presented. Sensors within a cluster are expected to be communicating with cluster head only. The energy constraint and limited computing resources of the sensor nodes present the major challenges in gathering the data. In this paper we propose a framework to study how partially correlated data affect the performance of clustering algorithms. The total energy consumption and network lifetime can be analyzed by combining random geometry techniques and rate distortion theory. We also present the relation between compression distortion and data correlation.

Keywords: Clusters, multi hop, random geometry, rate distortion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1637
253 TOSOM: A Topic-Oriented Self-Organizing Map for Text Organization

Authors: Hsin-Chang Yang, Chung-Hong Lee, Kuo-Lung Ke

Abstract:

The self-organizing map (SOM) model is a well-known neural network model with wide spread of applications. The main characteristics of SOM are two-fold, namely dimension reduction and topology preservation. Using SOM, a high-dimensional data space will be mapped to some low-dimensional space. Meanwhile, the topological relations among data will be preserved. With such characteristics, the SOM was usually applied on data clustering and visualization tasks. However, the SOM has main disadvantage of the need to know the number and structure of neurons prior to training, which are difficult to be determined. Several schemes have been proposed to tackle such deficiency. Examples are growing/expandable SOM, hierarchical SOM, and growing hierarchical SOM. These schemes could dynamically expand the map, even generate hierarchical maps, during training. Encouraging results were reported. Basically, these schemes adapt the size and structure of the map according to the distribution of training data. That is, they are data-driven or dataoriented SOM schemes. In this work, a topic-oriented SOM scheme which is suitable for document clustering and organization will be developed. The proposed SOM will automatically adapt the number as well as the structure of the map according to identified topics. Unlike other data-oriented SOMs, our approach expands the map and generates the hierarchies both according to the topics and their characteristics of the neurons. The preliminary experiments give promising result and demonstrate the plausibility of the method.

Keywords: Self-organizing map, topic identification, learning algorithm, text clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2026
252 Enhancing Privacy-Preserving Cloud Database Querying by Preventing Brute Force Attacks

Authors: Ambika Vishal Pawar, Ajay Dani

Abstract:

Considering the complexities involved in Cloud computing, there are still plenty of issues that affect the privacy of data in cloud environment. Unless these problems get solved, we think that the problem of preserving privacy in cloud databases is still open. In tokenization and homomorphic cryptography based solutions for privacy preserving cloud database querying, there is possibility that by colluding with service provider adversary may run brute force attacks that will reveal the attribute values.

In this paper we propose a solution by defining the variant of K –means clustering algorithm that effectively detects such brute force attacks and enhances privacy of cloud database querying by preventing this attacks.

Keywords: Privacy, Database, Cloud Computing, Clustering, K-means, Cryptography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2556
251 Energy Efficient Data Aggregation in Sensor Networks with Optimized Cluster Head Selection

Authors: D. Naga Ravi Kiran, C. G. Dethe

Abstract:

Wireless Sensor Network (WSN) routing is complex due to its dynamic nature, computational overhead, limited battery life, non-conventional addressing scheme, self-organization, and sensor nodes limited transmission range. An energy efficient routing protocol is a major concern in WSN. LEACH is a hierarchical WSN routing protocol to increase network life. It performs self-organizing and re-clustering functions for each round. This study proposes a better sensor networks cluster head selection for efficient data aggregation. The algorithm is based on Tabu search.

Keywords: Wireless Sensor Network (WSN), LEACH, Clustering, Tabu Search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2028
250 Laser Data Based Automatic Generation of Lane-Level Road Map for Intelligent Vehicles

Authors: Zehai Yu, Hui Zhu, Linglong Lin, Huawei Liang, Biao Yu, Weixin Huang

Abstract:

With the development of intelligent vehicle systems, a high-precision road map is increasingly needed in many aspects. The automatic lane lines extraction and modeling are the most essential steps for the generation of a precise lane-level road map. In this paper, an automatic lane-level road map generation system is proposed. To extract the road markings on the ground, the multi-region Otsu thresholding method is applied, which calculates the intensity value of laser data that maximizes the variance between background and road markings. The extracted road marking points are then projected to the raster image and clustered using a two-stage clustering algorithm. Lane lines are subsequently recognized from these clusters by the shape features of their minimum bounding rectangle. To ensure the storage efficiency of the map, the lane lines are approximated to cubic polynomial curves using a Bayesian estimation approach. The proposed lane-level road map generation system has been tested on urban and expressway conditions in Hefei, China. The experimental results on the datasets show that our method can achieve excellent extraction and clustering effect, and the fitted lines can reach a high position accuracy with an error of less than 10 cm.

Keywords: Curve fitting, lane-level road map, line recognition, multi-thresholding, two-stage clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 512
249 A Symbol by Symbol Clustering Based Blind Equalizer

Authors: Kristina Georgoulakis

Abstract:

A new blind symbol by symbol equalizer is proposed. The operation of the proposed equalizer is based on the geometric properties of the two dimensional data constellation. An unsupervised clustering technique is used to locate the clusters formed by the received data. The symmetric properties of the clusters labels are subsequently utilized in order to label the clusters. Following this step, the received data are compared to clusters and decisions are made on a symbol by symbol basis, by assigning to each data the label of the nearest cluster. The operation of the equalizer is investigated both in linear and nonlinear channels. The performance of the proposed equalizer is compared to the performance of a CMAbased blind equalizer.

Keywords: Blind equalization, channel equalization, cluster based equalisers

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1435
248 An Educational Data Mining System for Advising Higher Education Students

Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy

Abstract:

Educational  data mining  is  a  specific  data   mining field applied to data originating from educational environments, it relies on different  approaches to discover hidden knowledge  from  the  available   data. Among these approaches are   machine   learning techniques which are used to build a system that acquires learning from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems.

In  our  research, we propose  a “Student  Advisory  Framework” that  utilizes  classification  and  clustering  to  build  an  intelligent system. This system can be used to provide pieces of consultations to a first year  university  student to  pursue a  certain   education   track   where  he/she  will  likely  succeed  in, aiming  to  decrease   the  high  rate   of  academic  failure   among these  students.  A real case study  in Cairo  Higher  Institute  for Engineering, Computer  Science  and  Management  is  presented using  real  dataset   collected  from  2000−2012.The dataset has two main components: pre-higher education dataset and first year courses results dataset. Results have proved the efficiency of the suggested framework.

Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5213
247 Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco

Authors: Rachid Ait daoud, Abdellah Amine, Belaid Bouikhalene, Rachid Lbibb

Abstract:

Given the increase in the number of e-commerce sites, the number of competitors has become very important. This means that companies have to take appropriate decisions in order to meet the expectations of their customers and satisfy their needs. In this paper, we present a case study of applying LRFM (length, recency, frequency and monetary) model and clustering techniques in the sector of electronic commerce with a view to evaluating customers’ values of the Moroccan e-commerce websites and then developing effective marketing strategies. To achieve these objectives, we adopt LRFM model by applying a two-stage clustering method. In the first stage, the self-organizing maps method is used to determine the best number of clusters and the initial centroid. In the second stage, kmeans method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that the cluster 6 is the most important cluster because the average values of L, R, F and M are higher than the overall average value. In addition, this study has considered another variable that describes the mode of payment used by customers to improve and strengthen clusters’ analysis. The clusters’ analysis demonstrates that the payment method is one of the key indicators of a new index which allows to assess the level of customers’ confidence in the company's Website.

Keywords: Customer value, LRFM model, Cluster analysis, Self-Organizing Maps method (SOM), K-means algorithm, loyalty.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6253
246 Designing Social Care Policies in the Long Term: A Study Using Regression, Clustering and Backpropagation Neural Nets

Authors: Sotirios Raptis

Abstract:

Linking social needs to social classes using different criteria may lead to social services misuse. The paper discusses using ML and Neural Networks (NNs) in linking public services in Scotland in the long term and advocates, this can result in a reduction of the services cost connecting resources needed in groups for similar services. The paper combines typical regression models with clustering and cross-correlation as complementary constituents to predict the demand. Insurance companies and public policymakers can pack linked services such as those offered to the elderly or to low-income people in the longer term. The work is based on public data from 22 services offered by Public Health Services (PHS) Scotland and from the Scottish Government (SG) from 1981 to 2019 that are broken into 110 years series called factors and uses Linear Regression (LR), Autoregression (ARMA) and 3 types of back-propagation (BP) Neural Networks (BPNN) to link them under specific conditions. Relationships found were between smoking related healthcare provision, mental health-related health services, and epidemiological weight in Primary 1(Education) Body Mass Index (BMI) in children. Primary component analysis (PCA) found 11 significant factors while C-Means (CM) clustering gave 5 major factors clusters.

Keywords: Probability, cohorts, data frames, services, prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 462
245 Color Image Segmentation using Adaptive Spatial Gaussian Mixture Model

Authors: M.Sujaritha, S. Annadurai

Abstract:

An adaptive spatial Gaussian mixture model is proposed for clustering based color image segmentation. A new clustering objective function which incorporates the spatial information is introduced in the Bayesian framework. The weighting parameter for controlling the importance of spatial information is made adaptive to the image content to augment the smoothness towards piecewisehomogeneous region and diminish the edge-blurring effect and hence the name adaptive spatial finite mixture model. The proposed approach is compared with the spatially variant finite mixture model for pixel labeling. The experimental results with synthetic and Berkeley dataset demonstrate that the proposed method is effective in improving the segmentation and it can be employed in different practical image content understanding applications.

Keywords: Adaptive; Spatial, Mixture model, Segmentation, Color.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2498
244 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining idea is mounting rapidly in admiration and also in their popularity. The foremost aspire of data mining method is to extract data from a huge data set into several forms that could be comprehended for additional use. The data mining is a technology that contains with rich potential resources which could be supportive for industries and businesses that pay attention to collect the necessary information of the data to discover their customer’s performances. For extracting data there are several methods are available such as Classification, Clustering, Association, Discovering, and Visualization… etc., which has its individual and diverse algorithms towards the effort to fit an appropriate model to the data. STATISTICA mostly deals with excessive groups of data that imposes vast rigorous computational constraints. These results trials challenge cause the emergence of powerful STATISTICA Data Mining technologies. In this survey an overview of the STATISTICA software is illustrated along with their significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2607
243 A Software Framework for Predicting Oil-Palm Yield from Climate Data

Authors: Mohd. Noor Md. Sap, A. Majid Awan

Abstract:

Intelligent systems based on machine learning techniques, such as classification, clustering, are gaining wide spread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods which offer strength to deal with complex data. This work gets inspiration from the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.

Keywords: Pattern analysis, clustering, kernel methods, spatial data, crop yield

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1979
242 Object-Based Image Indexing and Retrieval in DCT Domain using Clustering Techniques

Authors: Hossein Nezamabadi-pour, Saeid Saryazdi

Abstract:

In this paper, we present a new and effective image indexing technique that extracts features directly from DCT domain. Our proposed approach is an object-based image indexing. For each block of size 8*8 in DCT domain a feature vector is extracted. Then, feature vectors of all blocks of image using a k-means algorithm is clustered into groups. Each cluster represents a special object of the image. Then we select some clusters that have largest members after clustering. The centroids of the selected clusters are taken as image feature vectors and indexed into the database. Also, we propose an approach for using of proposed image indexing method in automatic image classification. Experimental results on a database of 800 images from 8 semantic groups in automatic image classification are reported.

Keywords: Object-based image retrieval, DCT domain, Image indexing, Image classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2025
241 Hybrid Hierarchical Routing Protocol for WSN Lifetime Maximization

Authors: H. Aoudia, Y. Touati, E. H. Teguig, A. Ali Cherif

Abstract:

Conceiving and developing routing protocols for wireless sensor networks requires considerations on constraints such as network lifetime and energy consumption. In this paper, we propose a hybrid hierarchical routing protocol named HHRP combining both clustering mechanism and multipath optimization taking into account residual energy and RSSI measures. HHRP consists of classifying dynamically nodes into clusters where coordinators nodes with extra privileges are able to manipulate messages, aggregate data and ensure transmission between nodes according to TDMA and CDMA schedules. The reconfiguration of the network is carried out dynamically based on a threshold value which is associated with the number of nodes belonging to the smallest cluster. To show the effectiveness of the proposed approach HHRP, a comparative study with LEACH protocol is illustrated in simulations.

Keywords: Routing protocols, energy optimization, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 902
240 Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition

Authors: M. Kalamani, S. Valarmathy, M. Krishnamoorthi

Abstract:

In this paper, Fuzzy C-Means clustering with Expectation Maximization-Gaussian Mixture Model based hybrid modeling algorithm is proposed for Continuous Tamil Speech Recognition. The speech sentences from various speakers are used for training and testing phase and objective measures are between the proposed and existing Continuous Speech Recognition algorithms. From the simulated results, it is observed that the proposed algorithm improves the recognition accuracy and F-measure up to 3% as compared to that of the existing algorithms for the speech signal from various speakers. In addition, it reduces the Word Error Rate, Error Rate and Error up to 4% as compared to that of the existing algorithms. In all aspects, the proposed hybrid modeling for Tamil speech recognition provides the significant improvements for speechto- text conversion in various applications.

Keywords: Speech Segmentation, Feature Extraction, Clustering, HMM, EM-GMM, CSR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2139
239 Altered Network Organization in Mild Alzheimer's Disease Compared to Mild Cognitive Impairment Using Resting-State EEG

Authors: Chia-Feng Lu, Yuh-Jen Wang, Shin Teng, Yu-Te Wu, Sui-Hing Yan

Abstract:

Brain functional networks based on resting-state EEG data were compared between patients with mild Alzheimer’s disease (mAD) and matched patients with amnestic subtype of mild cognitive impairment (aMCI). We integrated the time–frequency cross mutual information (TFCMI) method to estimate the EEG functional connectivity between cortical regions and the network analysis based on graph theory to further investigate the alterations of functional networks in mAD compared with aMCI group. We aimed at investigating the changes of network integrity, local clustering, information processing efficiency, and fault tolerance in mAD brain networks for different frequency bands based on several topological properties, including degree, strength, clustering coefficient, shortest path length, and efficiency. Results showed that the disruptions of network integrity and reductions of network efficiency in mAD characterized by lower degree, decreased clustering coefficient, higher shortest path length, and reduced global and local efficiencies in the delta, theta, beta2, and gamma bands were evident. The significant changes in network organization can be used in assisting discrimination of mAD from aMCI in clinical.

Keywords: EEG, functional connectivity, graph theory, TFCMI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2505
238 On the Noise Distance in Robust Fuzzy C-Means

Authors: M. G. C. A. Cimino, G. Frosini, B. Lazzerini, F. Marcelloni

Abstract:

In the last decades, a number of robust fuzzy clustering algorithms have been proposed to partition data sets affected by noise and outliers. Robust fuzzy C-means (robust-FCM) is certainly one of the most known among these algorithms. In robust-FCM, noise is modeled as a separate cluster and is characterized by a prototype that has a constant distance δ from all data points. Distance δ determines the boundary of the noise cluster and therefore is a critical parameter of the algorithm. Though some approaches have been proposed to automatically determine the most suitable δ for the specific application, up to today an efficient and fully satisfactory solution does not exist. The aim of this paper is to propose a novel method to compute the optimal δ based on the analysis of the distribution of the percentage of objects assigned to the noise cluster in repeated executions of the robust-FCM with decreasing values of δ . The extremely encouraging results obtained on some data sets found in the literature are shown and discussed.

Keywords: noise prototype, robust fuzzy clustering, robustfuzzy C-means

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1822
237 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin has emerged as a compelling research area, capturing the attention of scholars over the past decade. It finds applications across diverse fields, including smart manufacturing and healthcare, offering significant time and cost savings. Notably, it often intersects with other cutting-edge technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the concept of a Human Digital Twin (HDT) is still in its infancy and requires further demonstration of its practicality. HDT takes the notion of Digital Twin a step further by extending it to living entities, notably humans, who are vastly different from inanimate physical objects. The primary objective of this research was to create an HDT capable of automating real-time human responses by simulating human behavior. To achieve this, the study delved into various areas, including clustering, supervised classification, topic extraction, and sentiment analysis. The paper successfully demonstrated the feasibility of HDT for generating personalized responses in social messaging applications. Notably, the proposed approach achieved an overall accuracy of 63%, a highly promising result that could pave the way for further exploration of the HDT concept. The methodology employed Random Forest for clustering the question database and matching new questions, while K-nearest neighbor was utilized for sentiment analysis.

Keywords: Human Digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification and clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 188
236 Clustering of Variables Based On a Probabilistic Approach Defined on the Hypersphere

Authors: Paulo Gomes, Adelaide Figueiredo

Abstract:

We consider n individuals described by p standardized variables, represented by points of the surface of the unit hypersphere Sn-1. For a previous choice of n individuals we suppose that the set of observables variables comes from a mixture of bipolar Watson distribution defined on the hypersphere. EM and Dynamic Clusters algorithms are used for identification of such mixture. We obtain estimates of parameters for each Watson component and then a partition of the set of variables into homogeneous groups of variables. Additionally we will present a factor analysis model where unobservable factors are just the maximum likelihood estimators of Watson directional parameters, exactly the first principal component of data matrix associated to each group previously identified. Such alternative model it will yield us to directly interpretable solutions (simple structure), avoiding factors rotations.

Keywords: Dynamic Clusters algorithm, EM algorithm, Factor analysis model, Hierarchical Clustering, Watson distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1624