Search results for: Clustering technique
7047 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting
Authors: Kemal Polat
Abstract:
In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) is used for classification of the Pima Indians diabetes dataset. The aims of this study are to reduce the variance within the attributes of the diabetes dataset and to improve the accuracy of the classifier algorithms by transforming the data from a non-linearly separable form to a linearly separable one. The Pima Indians diabetes dataset has two classes: normal subjects (500 instances) and diabetic subjects (268 instances). Fuzzy C-means clustering is an extension of the K-means clustering method that allows soft cluster memberships, and it is one of the most widely used clustering methods in data mining and machine learning applications. In the first stage, fuzzy C-means clustering is used to find the centers of the attributes in the Pima Indians diabetes dataset, and the dataset is then weighted according to the ratio of each attribute's mean to its center. In the second stage, support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers are used to classify the weighted dataset. Experimental results show that the proposed attribute weighting method (FCMAW) obtains very promising results in the classification of the Pima Indians diabetes dataset.
Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM
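The weighting step described in this abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the author's implementation: the one-dimensional fuzzy C-means routine, the choice of two clusters, the fuzzifier m = 2, the use of the mean of the cluster centers as the attribute's "center", and the function names `fcm_1d` and `fcmaw_weight` are all assumptions made for illustration.

```python
def fcm_1d(values, c=2, m=2.0, iters=100):
    """Fuzzy C-means on a single attribute; returns the c cluster centers."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the attribute's range.
    centers = [lo + (i + 0.5) * (hi - lo) / c for i in range(c)]
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in values:
            # Distances to every center (a tiny floor avoids division by zero).
            d = [abs(x - v) or 1e-12 for v in centers]
            memberships.append(
                [1.0 / sum((d[i] / d[j]) ** power for j in range(c))
                 for i in range(c)]
            )
        # Update each center as the membership-weighted mean of the values.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(c)
        ]
    return centers


def fcmaw_weight(values, c=2):
    """Scale one attribute by the ratio of its mean to its cluster center."""
    centers = fcm_1d(values, c=c)
    center = sum(centers) / len(centers)  # assumed definition of "the center"
    mean = sum(values) / len(values)
    ratio = mean / center if center else 1.0
    return [x * ratio for x in values]


# Hypothetical attribute column (not taken from the Pima dataset).
glucose = [85.0, 89.0, 90.0, 150.0, 160.0, 166.0]
print(fcm_1d(glucose))
print(fcmaw_weight(glucose))
```

In a full pipeline, each attribute column would be weighted this way before the data are passed to a classifier such as SVM or k-NN.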
Procedia PDF Downloads 416
7046 Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design
Authors: Shen Jian, Hu Jie, Ma Jin, Peng Ying Hong, Fang Yi, Liu Wen Hai
Abstract:
Biologically inspired design inspires inventions and new technologies in engineering by mimicking functions, principles, and structures from the biological domain. To deal with the obstacles to cross-domain knowledge acquisition in the existing biologically inspired design process, two techniques are proposed: functional semantic clustering based on the semantic correlation of functional features, and environmental constraint clustering based on the adaptability of environmental characteristic constraints. A knowledge cell clustering algorithm and the corresponding prototype system are developed. Finally, the effectiveness of the method is verified through the design of a visual prosthetic device.
Keywords: knowledge clustering, knowledge acquisition, knowledge based engineering, knowledge cell, biologically inspired design
Procedia PDF Downloads 427
7045 Progressive Multimedia Collection Structuring via Scene Linking
Authors: Aman Berhe, Camille Guinaudeau, Claude Barras
Abstract:
To facilitate information seeking in large collections of multimedia documents with long and progressive content (such as broadcast news or TV series), one can extract the semantic links that exist between semantically coherent parts of documents, i.e., scenes. The links can then be used to build a coherent collection of scenes on which it is easier to perform content analysis, topic extraction, or information retrieval. In this paper, we focus on TV series structuring and propose two approaches for scene linking at different levels of granularity (episode and season): a fuzzy online clustering technique and a graph-based community detection algorithm. When evaluated on the first two seasons of the TV series Game of Thrones, the fuzzy online clustering approach performed better than graph-based community detection at the episode level, while graph-based approaches performed better at the season level.
Keywords: multimedia collection structuring, progressive content, scene linking, fuzzy clustering, community detection
Procedia PDF Downloads 101
7044 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek
Abstract:
As big data analysis becomes more important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of failure are closely related. The purpose of this study is to analyze the patterns that affect the final test results using die-map based clustering. Much research has been conducted using die data from the semiconductor test process. However, such analysis is limited because the test data are less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. The study consists of three phases. In the first phase, a die map is created from the fail bit data in each sub-area of the die. In the second phase, clustering is performed on the map data. In the third phase, patterns that affect the final test result are identified. Finally, the three phases are applied to actual industrial data, and the experimental results show the potential for field application.
Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process
Procedia PDF Downloads 406
7043 A Weighted K-Medoids Clustering Algorithm for Effective Stability in Vehicular Ad Hoc Networks
Authors: Rejab Hajlaoui, Tarek Moulahi, Hervé Guyennet
Abstract:
In a highway scenario, vehicle speed can exceed 120 km/h, so any vehicle can enter or leave the network within a very short time. This mobility adversely affects network connectivity and decreases the lifetime of all established links. To ensure effective stability in vehicular ad hoc networks with a minimal broadcast storm, we have developed a weighted algorithm based on the k-medoids clustering algorithm (WKCA). The number of clusters and the initial cluster heads are not selected randomly, as is usual, but according to the available transmission range and the environment size. Then, to ensure optimal assignment of nodes to clusters in both k-medoids phases, the combined weight of each node is computed from additional metrics including direction, relative speed, and proximity. Empirical results show that, in addition to the convergence speed that characterizes the k-medoids algorithm, our proposed model outperforms both the AODV-Clustering and OLSR-Clustering protocols under different densities and velocities in terms of end-to-end delay, packet delivery ratio, and throughput.
Keywords: communication, clustering algorithm, k-medoids, sensor, vehicular ad hoc network
Procedia PDF Downloads 240
7042 Hierarchical Cluster Analysis of Raw Milk Samples Obtained from Organic and Conventional Dairy Farming in Autonomous Province of Vojvodina, Serbia
Authors: Lidija Jevrić, Denis Kučević, Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Milica Karadžić
Abstract:
In the present study, Hierarchical Cluster Analysis (HCA) was applied to determine the differences between milk samples originating from a conventional dairy farm (CF) and an organic dairy farm (OF) in AP Vojvodina, Republic of Serbia. The clustering was based on the average values of saturated fatty acid (SFA) content and unsaturated fatty acid (UFA) content obtained for every season, so the HCA covered the SFA and UFA content values for the whole year. The clustering procedure was carried out using Euclidean distances and the single linkage algorithm. The resulting dendrograms indicated that the clustering of UFA in OF was much more uniform than the clustering of UFA in CF. In OF, spring stands out from the rest of the year. The same can be seen for CF, where winter is separated from the rest of the year. These results are to be expected, because the fatty acid composition is greatly influenced by the season and by the nutrition of dairy cows during the year.
Keywords: chemometrics, clustering, food engineering, milk quality
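The procedure named in this abstract (Euclidean distances with single linkage) can be illustrated with a short sketch. The code below is a plain-Python assumption of how such a clustering might be run; the function name `single_linkage` and the seasonal values are hypothetical and are not data from the study.

```python
import math


def single_linkage(points):
    """Agglomerative clustering with Euclidean distance and single linkage.

    `points` maps a label to a coordinate tuple. Returns the merges as
    (cluster_a, cluster_b, distance) tuples, in the order they occur.
    """
    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical seasonal averages (SFA %, UFA %); not data from the study.
seasons = {
    "spring": (62.0, 38.0),
    "summer": (66.0, 34.0),
    "autumn": (66.5, 33.5),
    "winter": (67.0, 33.0),
}
for a, b, d in single_linkage(seasons):
    print(a, "+", b, f"at distance {d:.2f}")
```

The order and height of the merges are what a dendrogram draws; a season that joins last, at a large distance, is the one that "stands out" in the sense used above.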
Procedia PDF Downloads 281
7041 Sales Patterns Clustering Analysis on Seasonal Product Sales Data
Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho
Abstract:
As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.
Keywords: clustering, distribution, sales pattern, seasonal product
Procedia PDF Downloads 598
7040 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy
Authors: Kemal Polat
Abstract:
In this paper, three feature weighting methods are used to improve the classification performance for diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific, and anatomical components, are fed into the classifier algorithms. The dataset used in this study was taken from the University of California, Irvine (UCI) machine learning repository. The three feature weighting methods, namely fuzzy c-means clustering based, subtractive clustering based, and Gaussian mixture clustering based feature weighting, are compared with each other in the classification of DR. After feature weighting, five classifier algorithms are applied: multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes. The hybrid method combining subtractive clustering based feature weighting with the decision tree classifier obtained a classification accuracy of 100% in the screening of DR. These results demonstrate that the proposed hybrid scheme is very promising for medical dataset classification.
Keywords: machine learning, data weighting, classification, data mining
Procedia PDF Downloads 327
7039 GCM Based Fuzzy Clustering to Identify Homogeneous Climatic Regions of North-East India
Authors: Arup K. Sarma, Jayshree Hazarika
Abstract:
The north-eastern part of India, which receives heavier rainfall than other parts of the subcontinent, is of great concern nowadays with regard to climate change. High-intensity rainfall of short duration and longer dry spells, occurring due to the impact of climate change, also affect river morphology. In the present study, an attempt is made to delineate the north-eastern region of India into homogeneous clusters based on the fuzzy clustering concept and to compare the clusters obtained by conventional and non-conventional methods of clustering. Clustering is adopted because the impact of climate change can be studied in a homogeneous region without much variation, which can be helpful in studies related to water resources planning and management. Ten IMD (Indian Meteorological Department) stations, situated in various regions of the North-east, were selected for making the clusters. The results of the Fuzzy C-Means (FCM) analysis show different clustering patterns for different conditions. From the analysis and comparison, it can be concluded that the non-conventional method of using GCM data gives somewhat better results than the others. However, further analysis can be done using daily data instead of monthly means to reduce the effect of standardization.
Keywords: climate change, conventional and nonconventional methods of clustering, FCM analysis, homogeneous regions
Procedia PDF Downloads 388
7038 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network
Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar
Abstract:
The wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all sensor nodes communicate with each other via radio signals. The sensor nodes are capable of sensing, data storage, and processing. The sensor nodes relay the collected information through neighboring nodes to a particular node. Data collection and processing are done by data aggregation techniques. For data aggregation, a clustering technique is implemented in the sensor network using a self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. Information is aggregated at the cluster head nodes from the non-cluster-head nodes and is then transferred to the base station (or sink nodes). The aim of this paper is to manage the large amount of data with the help of the SOFM neural network. Only clustered data, rather than all the information aggregated at the cluster head nodes, is selected for transfer to the base station. This reduces the battery consumption incurred by managing large volumes of data, and the network lifetime is extended to a great extent.
Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network
Procedia PDF Downloads 518
7037 A Polynomial Time Clustering Algorithm for Solving the Assignment Problem in the Vehicle Routing Problem
Authors: Lydia Wahid, Mona F. Ahmed, Nevin Darwish
Abstract:
The vehicle routing problem (VRP) consists of a group of customers that needs to be served. Each customer has a certain demand for goods. A central depot with a fleet of vehicles is responsible for supplying the customers with their demands. The problem is composed of two subproblems. The first subproblem is an assignment problem, in which the number of vehicles to be used and the customers assigned to each vehicle are determined. The second subproblem is the routing problem, in which the order of customer visits is determined for each vehicle and its assigned customers. The optimal number of vehicles, as well as the optimal total distance, should be achieved. In this paper, an approach for solving the first subproblem (the assignment problem) is presented. In the approach, a clustering algorithm is proposed for finding the optimal number of vehicles by grouping the customers into clusters, where each cluster is visited by one vehicle. Finding the optimal number of clusters is NP-hard. This work presents a polynomial time clustering algorithm for finding the optimal number of clusters and solving the assignment problem.
Keywords: vehicle routing problems, clustering algorithms, Clarke and Wright Saving Method, agglomerative hierarchical clustering
Procedia PDF Downloads 394
7036 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications
Authors: K. P. Sandesh, M. H. Suman
Abstract:
Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of the measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.
Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms
Procedia PDF Downloads 518
7035 Design and Implementation of Machine Learning Model for Short-Term Energy Forecasting in Smart Home Management System
Authors: R. Ramesh, K. K. Shivaraman
Abstract:
The main aim of this paper is to handle energy requirements efficiently by merging advanced digital communication and control technologies for smart grid applications. To reduce home load during peak hours, utilities offer residential customers several incentives through smart meters, such as real-time pricing, time-of-use pricing, and demand response. However, this approach is inconvenient in that users need to respond manually to prices that vary in real time. To overcome this inconvenience, this paper proposes a machine learning model combining a convolutional neural network (CNN) with k-means clustering, which can forecast short-term energy requirements, i.e., by hour of the day or day of the week. Integrating the proposed technique with a home energy management system based on Bluetooth Low Energy provides predicted values to the user for scheduling appliances in advance. This paper describes the CNN configuration and the k-means clustering algorithm for short-term energy forecasting in detail.
Keywords: convolutional neural network, fuzzy logic, k-means clustering approach, smart home energy management
Procedia PDF Downloads 305
7034 Multi-Cluster Overlapping K-Means Extension Algorithm (MCOKE)
Authors: Said Baadel, Fadi Thabtah, Joan Lu
Abstract:
Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques, where each object is assigned to exactly one cluster. In this paper, we propose an overlapping algorithm, MCOKE, which allows objects to belong to one or more clusters. The algorithm differs from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) rather than a fuzzy membership degree. The algorithm also differs from other overlapping algorithms that require a similarity threshold to be defined a priori, which can be difficult for novice users to determine.
Keywords: data mining, k-means, MCOKE, overlapping
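To make the idea of binary (non-fuzzy) overlapping membership concrete, here is a minimal sketch. The abstract does not spell out how MCOKE decides overlap, so the rule used below, a single global threshold equal to the largest distance between any point and its nearest centroid, is an assumption for illustration only, as are the function name `overlapping_memberships` and the sample data. The centroids are taken as given; in practice they would come from a k-means run.

```python
import math


def overlapping_memberships(points, centroids):
    """Return a binary membership matrix (rows: points, columns: clusters)."""
    # Hard step: distance from each point to its nearest centroid.
    nearest = [min(math.dist(p, c) for c in centroids) for p in points]
    # Assumed global threshold: the largest point-to-own-centroid distance.
    max_dist = max(nearest)
    # Overlap step: membership is 1 for every centroid within the threshold.
    return [
        [1 if math.dist(p, c) <= max_dist else 0 for c in centroids]
        for p in points
    ]


# Hypothetical 2-D points and centroids.
points = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
centroids = [(0, 0), (10, 0)]
for p, row in zip(points, overlapping_memberships(points, centroids)):
    print(p, row)
```

Each row holds only zeros and ones, so a point lying between two clusters is a full member of both rather than a partial member of each, which is the contrast with fuzzy clustering drawn in the abstract.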
Procedia PDF Downloads 576
7033 A Clustering-Sequencing Approach to the Facility Layout Problem
Authors: Saeideh Salimpour, Sophie-Charlotte Viaux, Ahmed Azab, Mohammed Fazle Baki
Abstract:
The Facility Layout Problem (FLP) is key to the efficient and cost-effective operation of a system. This paper presents a hybrid approach, based on heuristics and mathematical programming, that divides the problem conceptually into clustering and sequencing. First, clusters of vertically aligned facilities are formed, which are subsequently sequenced horizontally. The developed methodology provides promising results in comparison with its counterparts in the literature: it minimizes the distances between facilities that interact more with each other and aims to place the facilities with more interactions at the centroid of the shop.
Keywords: clustering-sequencing approach, mathematical modeling, optimization, unequal facility layout problem
Procedia PDF Downloads 333
7032 Ensuring Uniform Energy Consumption in Non-Deterministic Wireless Sensor Network to Protract Networks Lifetime
Authors: Vrince Vimal, Madhav J. Nigam
Abstract:
Wireless sensor networks have attracted much attention from researchers around the world, owing to their extensive applicability in agricultural, industrial, and military fields. Energy-conserving node deployment strategies play a notable role in the practical implementation of wireless sensor networks. Clustering is an approach that improves energy efficiency in the network. A clustering algorithm needs to produce clusters of optimum size and number, because clustering that is not implemented properly cannot effectively increase the life of the network. In this paper, an algorithm is proposed to address connectivity issues with the aim of ensuring uniform energy consumption of nodes in every part of the network. Simulation results show that the proposed algorithm has an edge over existing algorithms in terms of throughput and network lifetime.
Keywords: Wireless Sensor network (WSN), Random Deployment, Clustering, Isolated Nodes, Networks Lifetime
Procedia PDF Downloads 337
7031 An Intelligent Traffic Management System Based on the WiFi and Bluetooth Sensing
Authors: Hamed Hossein Afshari, Shahrzad Jalali, Amir Hossein Ghods, Bijan Raahemi
Abstract:
This paper introduces an automated clustering solution that is applied to WiFi/Bluetooth sensing data and then used for traffic management applications. The paper first summarizes a number of clustering approaches and then shows their performance for noise removal. In this context, clustering is used to recognize WiFi and Bluetooth MAC addresses that belong to passengers traveling on a public urban transit bus. The main objective is to build an intelligent system that automatically filters out MAC addresses belonging to persons located outside the bus, for different routes in the city of Ottawa. The proposed intelligent system removes the need to define restrictive thresholds, which would otherwise reduce the accuracy and the range of applicability of the solution across different routes. The paper also discusses the performance benefits of the presented clustering approaches in terms of accuracy, time and space complexity, and ease of use. The clustering results can further be used for origin-destination estimation of individual passengers, traffic load prediction, and intelligent management of urban bus schedules.
Keywords: WiFi-Bluetooth sensing, cluster analysis, artificial intelligence, traffic management
Procedia PDF Downloads 242
7030 Advances in Machine Learning and Deep Learning Techniques for Image Classification and Clustering
Authors: R. Nandhini, Gaurab Mudbhari
Abstract:
From health care to self-driving cars, machine learning and deep learning algorithms have transformed fields that depend on images and visually oriented data. Segmentation, regression, classification, clustering, and dimensionality reduction are some of the machine learning tasks that have helped machine learning and deep learning models become state of the art in fields where images are the key datasets. Among these tasks, classification and clustering are essential but difficult because of the intricate and high-dimensional characteristics of image data. This study examines and assesses advanced techniques in supervised classification and unsupervised clustering for image datasets, emphasizing the relative efficiency of Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), Deep Embedded Clustering (DEC), and self-supervised learning approaches. Because of the distinctive structural attributes present in images, conventional methods often fail to capture spatial patterns effectively, which has led to the development of models that use more advanced architectures and attention mechanisms. In image classification, we investigated both CNNs and ViTs. The CNN, one of the most promising models and well known for its ability to detect spatial hierarchies, serves as a core model in our study. The ViT also serves as a core model; it reflects a modern classification method whose self-attention mechanism makes it more robust, because self-attention allows it to learn global dependencies in images without relying on convolutional layers. This paper evaluates the performance of these two architectures in terms of accuracy, precision, recall, and F1-score across different image datasets, analyzing their appropriateness for various categories of images.
In the domain of clustering, we assess DEC, Variational Autoencoders (VAEs), and conventional clustering techniques such as k-means, applied to embeddings derived from CNN models. DEC, a prominent clustering model, has gained the attention of many ML engineers because it combines feature learning and clustering in a single framework, with the main goal of improving clustering quality through better feature representation. VAEs, on the other hand, are well known for using latent embeddings to group similar images probabilistically, without requiring prior labels.
Keywords: machine learning, deep learning, image classification, image clustering
Procedia PDF Downloads 17
7029 Identifying Autism Spectrum Disorder Using Optimization-Based Clustering
Authors: Sharifah Mousli, Sona Taheri, Jiayuan He
Abstract:
Autism spectrum disorder (ASD) is a complex developmental condition involving persistent difficulties with social communication, restricted interests, and repetitive behavior. The challenges associated with ASD can interfere with an affected individual’s ability to function in social, academic, and employment settings. Although, to the best of our knowledge, there is no effective medication known to treat ASD, early intervention can significantly improve an affected individual’s overall development. Hence, an accurate diagnosis of ASD at an early phase is essential. The use of machine learning approaches improves and speeds up the diagnosis of ASD. In this paper, we focus on the application of unsupervised clustering methods to ASD, because the large volume of ASD data generated through hospitals, therapy centers, and mobile applications has no pre-existing labels. We conduct a comparative analysis of seven clustering approaches (K-means, agglomerative hierarchical, model-based, fuzzy C-means, affinity propagation, self-organizing maps, and linear vector quantisation) as well as the recently developed optimization-based clustering (COMSEP-Clust) approach. We evaluate the performance of the clustering methods extensively on real-world ASD datasets encompassing different age groups: toddlers, children, adolescents, and adults. Our experimental results suggest that the COMSEP-Clust approach outperforms the other seven methods in recognizing ASD with well-separated clusters.
Keywords: autism spectrum disorder, clustering, optimization, unsupervised machine learning
Procedia PDF Downloads 118
7028 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators
Authors: Nur Aziza Luxfiati
Abstract:
Clustering is the partitioning of data sets into subsets or groups in such a way that elements sharing certain properties have a high level of similarity within a group and a low level of similarity between groups. The K-Means algorithm is one of the clustering algorithms most widely used as a grouping tool in scientific and industrial applications, because the basic idea of the k-means algorithm is very simple. This research applies clustering with the k-means algorithm to the problem of national development imbalances between regions in Central Java Province, based on macroeconomic indicators. The data sample is secondary data on macroeconomic indicators obtained from the Central Java Provincial Statistics Agency, which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. Outliers are detected using the z-score, and the number of clusters (k) is determined using the elbow method. After the clustering process is carried out, the result is validated using the Between-Class Variation (BCV) and Within-Class Variation (WCV) methods. The results showed that outlier detection using z-score normalization found no outliers. In addition, the clustering test yielded a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province that have economic similarities based on the variables used: the first cluster, with a high economic level, consists of 13 districts/cities, and the second cluster, with a low economic level, consists of 22 districts/cities. Within the second (low-economy) cluster, the authors further grouped districts/cities by similarity on individual macroeconomic indicators: 20 districts on Gross Regional Domestic Product, 19 on the Poverty Depth Index, 5 on human development, and 10 on the Open Unemployment Rate.
Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development
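The preprocessing and model-selection steps mentioned in this abstract (z-score outlier detection and the elbow method) might look roughly like the sketch below. The |z| > 3 cutoff, the helper names, and the sample values are assumptions for illustration; the abstract does not state the cutoff or the exact validation formula the author used.

```python
import statistics


def z_scores(values):
    """Standardize values to zero mean and unit (population) std deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant columns
    return [(v - mean) / std for v in values]


def outlier_indices(values, threshold=3.0):
    """Indices whose |z-score| exceeds the (assumed) threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


def wcss(values, labels):
    """Within-cluster sum of squares for 1-D data and given cluster labels."""
    total = 0.0
    for label in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == label]
        center = statistics.fmean(members)
        total += sum((v - center) ** 2 for v in members)
    return total


# Hypothetical indicator values for eight districts (not Susenas data).
indicator = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_scores(indicator))
print(outlier_indices(indicator))
# Elbow method: compare WCSS for candidate clusterings with growing k.
print(wcss(indicator, [0] * 8))
print(wcss(indicator, [0, 0, 0, 0, 0, 0, 1, 1]))
```

In the elbow method, the within-cluster sum of squares is computed for k = 1, 2, 3, ... and k is chosen where the decrease levels off.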
Procedia PDF Downloads 158
7027 Image Segmentation Techniques: Review
Authors: Lindani Mbatha, Suvendi Rimer, Mpho Gololo
Abstract:
Image segmentation is the process of dividing an image into several sections, such as the background and the foreground of an object. It is a critical technique in both image processing and computer vision. Most image segmentation algorithms have been developed for gray-scale images, and comparatively little research has addressed color images. Most image segmentation algorithms or techniques vary based on the input data and the application. Nearly all of the techniques are unsuitable for noisy environments. Much of the work that has been done uses the Markov Random Field (MRF), which involves additional computation and is said to be robust to noise. In recent years, image segmentation has been applied to problems such as simplifying the processing of an image, interpreting its contents, and making it easier to analyse. This article reviews and summarizes some of the image segmentation techniques and algorithms that have been developed in recent years. The techniques include convolutional neural networks (CNN), edge-based techniques, region growing, clustering, and thresholding, among others. The advantages and disadvantages of medical ultrasound image segmentation techniques are also discussed. The article also addresses the applications of image segmentation and potential future developments. The review concludes that no single technique is perfectly suited to segmenting all types of images, but that hybrid techniques yield more accurate and efficient results.
Keywords: clustering-based, convolution-network, edge-based, region-growing
Procedia PDF Downloads 98
7026 Spectral Clustering from the Discrepancy View and Generalized Quasirandomness
Authors: Marianna Bolla
Abstract:
The aim of this paper is to compare spectral, discrepancy, and degree properties of expanding graph sequences. As we can prove equivalences and implications between them and the definition of the generalized (multiclass) quasirandomness of Lovasz–Sos (2008), they can be regarded as generalized quasirandom properties akin to the equivalent quasirandom properties of the seminal Chung-Graham-Wilson paper (1989) in the one-class scenario. Since these properties are valid for deterministic graph sequences, irrespective of stochastic models, the partial implications also justify low-dimensional embedding of large-scale graphs and discrepancy-minimizing spectral clustering.
Keywords: generalized random graphs, multiway discrepancy, normalized modularity spectra, spectral clustering
Procedia PDF Downloads 198
7025 A Clustering Algorithm for Massive Texts
Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen
Abstract:
Internet users face a massive amount of textual data every day. Organizing texts into categories can help users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vectors generated from texts. To cluster large-scale text collections effectively and efficiently, this paper proposes a clustering algorithm based on vector reconstruction. Only the features that can represent a cluster are preserved in the cluster’s representative vector. The algorithm alternates between two sub-processes until it converges. The first is the partial tuning sub-process, in which each feature’s weight is fine-tuned iteratively. To accelerate clustering, an intersection-based similarity measure and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The second is the overall tuning sub-process, in which features are reallocated among the clusters and features that are not useful for representing a cluster are removed from its representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.
Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process
Procedia PDF Downloads 436
7024 Effect of Bi-Dispersity on Particle Clustering in Sedimentation
Authors: Ali Abbas Zaidi
Abstract:
In free settling or sedimentation, particles form clusters at high Reynolds numbers in dilute suspensions. This is due to the entrapment of particles in the wakes of upstream particles. In this paper, the effect of bi-dispersity of settling particles on particle clustering is investigated using particle-resolved direct numerical simulation. The immersed boundary method is used for particle-fluid interactions and the discrete element method for particle-particle interactions. The solid volume fraction used in the simulation is 1% and the Reynolds number based on the Sauter mean diameter is 350. Both the solid volume fraction and the Reynolds number lie in the clustering regime of sedimentation. In the simulations, the particle diameter ratio (i.e., the diameter of the larger particle to that of the smaller particle, d₁/d₂) takes the values 2:1, 3:1 and 4:1. For each diameter ratio, the ratio of the solid volume fractions of the two particle sizes (φ₁/φ₂) takes the values 1:1, 1:2 and 2:1. For comparison, simulations are also performed for monodisperse particles. To study particle clustering, the radial distribution function and the instantaneous locations of particles in the computational domain are examined. It is observed that the degree of particle clustering decreases as the bi-dispersity of the settling particles increases. The smallest degree of particle clustering, that is, the greatest dispersion of particles, is observed for particles with d₁/d₂ equal to 4:1 and φ₁/φ₂ equal to 1:2. The simulations show that the reduction in particle clustering with increasing bi-dispersity is due to the difference in the settling velocities of the particles. Larger particles settle faster and knock the smaller particles out of clustered regions in the computational domain.
Keywords: dispersion in bi-disperse settling particles, particle microstructures in bi-disperse suspensions, particle resolved direct numerical simulations, settling of bi-disperse particles
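The radial distribution function mentioned in this abstract can be illustrated with a minimal sketch. It assumes a periodic cubic box and uses uniformly random points as stand-in data; the function name `radial_distribution`, the box size, and the bin settings are hypothetical choices, not taken from the paper. Values of g(r) well above 1 at small r would indicate clustering, while g(r) close to 1 indicates a dispersed, uncorrelated arrangement.

```python
import math
import random


def radial_distribution(points, box, r_max, n_bins):
    """Return (bin_centres, g) for points in a periodic cubic box of side `box`.

    r_max should not exceed box / 2 so that the minimum-image convention is valid.
    """
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(points[i], points[j]):
                delta = abs(a - b)
                delta = min(delta, box - delta)  # minimum-image convention
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                # Each unordered pair is a neighbour of both particles.
                counts[min(int(r / dr), n_bins - 1)] += 2
    density = n / box ** 3
    centres, g = [], []
    for k, count in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        centres.append((k + 0.5) * dr)
        # Normalise by the neighbour count expected for an ideal (uniform) arrangement.
        g.append(count / (n * density * shell))
    return centres, g


# Stand-in data: 300 uniformly random points (NOT simulation output).
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
centres, g = radial_distribution(points, box=1.0, r_max=0.3, n_bins=15)
```

For uniformly random points the volume-weighted average of `g` should stay close to 1; replacing `points` with particle centres from a settling simulation would show how strongly the particles cluster.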
Procedia PDF Downloads 208
7023 A Learning-Based EM Mixture Regression Algorithm
Authors: Yi-Cheng Tian, Miin-Shen Yang
Abstract:
The mixture likelihood approach is a popular clustering method, in which the expectation-maximization (EM) algorithm is the most widely used technique. In the literature, the EM algorithm has been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values and require the number of clusters to be given a priori. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initializations and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.
Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model
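For orientation, the conventional EM mixture regression that this abstract takes as its starting point can be sketched as follows. This is the standard baseline, not the learning-based schema the authors propose: the number of components is fixed in advance and the starting lines are supplied by hand, which are exactly the two drawbacks the abstract names. The function names, the toy data, and the initial guesses are all assumptions made for illustration.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def em_mixture_regression(xs, ys, init_lines, sigma=2.0, iters=200, tol=1e-8):
    """Standard EM for a mixture of simple linear regressions with Gaussian noise."""
    lines = list(init_lines)
    k = len(lines)
    sigmas = [sigma] * k
    mix = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for x, y in zip(xs, ys):
            logs = []
            for j, (a, b) in enumerate(lines):
                r = y - (a + b * x)
                logs.append(math.log(mix[j]) - math.log(sigmas[j])
                            - 0.5 * (r / sigmas[j]) ** 2)
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: weighted line fit, noise level and mixing proportion per component.
        new_lines = []
        for j in range(k):
            ws = [row[j] for row in resp]
            a, b = weighted_line_fit(xs, ys, ws)
            var = sum(w * (y - (a + b * x)) ** 2
                      for w, x, y in zip(ws, xs, ys)) / sum(ws)
            sigmas[j] = max(math.sqrt(var), 0.05)  # floor avoids a collapsing variance
            mix[j] = sum(ws) / len(xs)
            new_lines.append((a, b))
        shift = max(abs(p - q) for old, new in zip(lines, new_lines)
                    for p, q in zip(old, new))
        lines = new_lines
        if shift < tol:
            break
    return lines, sigmas, mix


# Toy data (hypothetical): two lines, y = 1 + 2x and y = 10 - x, with small alternating noise.
xs, ys = [], []
for i in range(10):
    noise = 0.2 if i % 2 == 0 else -0.2
    xs.append(float(i)); ys.append(1.0 + 2.0 * i + noise)
    xs.append(float(i)); ys.append(10.0 - 1.0 * i - noise)

# Hand-picked starting lines as (intercept, slope); a poor choice can lead to a different fit.
lines, sigmas, mix = em_mixture_regression(xs, ys, [(0.0, 2.5), (9.0, -0.5)])
```

With these starting values the two fitted lines should land near the generating ones; changing the initial guesses or the number of components is an easy way to see the sensitivity the abstract refers to.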
Procedia PDF Downloads 510
7022 Communication of Sensors in Clustering for Wireless Sensor Networks
Authors: Kashish Sareen, Jatinder Singh Bal
Abstract:
The use of wireless sensor networks (WSNs) has grown vastly in recent years, pointing to the crucial need for scalable and energy-efficient routing, data gathering and aggregation protocols in the corresponding large-scale environments. WSNs have recently emerged as an important computing platform and continue to grow in diverse areas, providing new opportunities for networking and services. However, the energy constraints and limited computing resources of the sensor nodes present major challenges in gathering data. The sensors collect data about their surroundings and forward it to a command centre through a base station. The past few years have witnessed increased interest in the potential use of WSNs, as they are very useful in target detection and other applications. Hierarchical clustering protocols have been the most widely used to improve overall system lifetime, scalability and energy efficiency. In this paper, the state of the art in hierarchical clustering approaches for large-scale WSN environments is presented.
Keywords: clustering, DLCC, MLCC, wireless sensor networks
Procedia PDF Downloads 483
7021 Clustering-Based Threshold Model for Condition Rating of Concrete Bridge Decks
Authors: M. Alsharqawi, T. Zayed, S. Abu Dabous
Abstract:
To ensure the safety and serviceability of bridge infrastructure, accurate condition assessment and rating methods are needed to provide a basis for bridge Maintenance, Repair and Replacement (MRR) decisions. In North America, the common practice for assessing the condition of bridges is visual inspection. This practice is limited to detecting surface defects and external flaws. Further, the thresholds that define the severity of bridge deterioration are selected arbitrarily. The current research discusses the main deteriorations and defects identified during visual inspection and Non-Destructive Evaluation (NDE). NDE techniques are becoming popular for augmenting the visual examination during inspection to detect subsurface defects. Quality inspection data and accurate condition assessment and rating are the basis for determining appropriate MRR decisions. Thus, in this paper, a novel method for bridge condition assessment using the Quality Function Deployment (QFD) theory is presented. The QFD model is designed to provide an integrated condition by evaluating both the surface and subsurface defects of concrete bridges. Moreover, an integrated condition rating index with four thresholds is developed based on the QFD condition assessment model and using the K-means clustering technique. Twenty case studies are analyzed by applying the QFD model and implementing the developed rating index. The results from the analyzed case studies show that the proposed threshold model produces robust MRR recommendations consistent with the decisions and recommendations made by bridge managers on these projects. The proposed method is expected to advance the state of the art of bridge condition assessment and rating.
Keywords: concrete bridge decks, condition assessment and rating, quality function deployment, k-means clustering technique
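One plausible way to turn K-means output into rating thresholds is sketched below. It is an illustration only: the abstract does not say how the four thresholds are derived, so the sketch assumes five clusters of a scalar condition index and places each threshold midway between adjacent cluster centroids. The scores, function names, and the deterministic initialization are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar data; returns the sorted cluster centroids."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: centroids at evenly spaced positions in the sorted data.
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def thresholds_from_centroids(centroids):
    """Midpoints between adjacent centroids: k clusters give k - 1 thresholds."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


# Hypothetical condition index values on a 0-100 scale (not data from the paper).
scores = [5, 8, 10, 22, 25, 27, 41, 44, 46, 63, 66, 68, 85, 88, 90]
centroids = kmeans_1d(scores, k=5)
thresholds = thresholds_from_centroids(centroids)
```

A deck's index would then be rated by locating it among the four `thresholds`, which replaces arbitrarily chosen cut-offs with ones driven by the data.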
Procedia PDF Downloads 225
7020 Interpretation and Clustering Framework for Analyzing ECG Survey Data
Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif
Abstract:
The Indo-Pak region has suffered from heart disease for many decades. Many surveys show that the percentage of cardiac patients in Pakistan is increasing day by day, and special attention needs to be paid to this issue. A framework is proposed for performing detailed analysis of ECG survey data collected to measure the prevalence of heart disease in Pakistan. The ECG survey data is evaluated and filtered using automated Minnesota codes, and only those ECGs that fulfil the standardized conditions of the Minnesota codes are used for further analysis. Feature selection is then performed by applying the proposed algorithm, based on the discernibility matrix, to select relevant features from the database. Clustering is performed to expose natural clusters in the ECG survey data by applying a spectral clustering algorithm using the fuzzy c-means algorithm. The hidden patterns and interesting relationships exposed by this analysis are useful for further detailed analysis and for many other purposes.
Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix
Procedia PDF Downloads 472
7019 A Comparison of South East Asian Face Emotion Classification based on Optimized Ellipse Data Using Clustering Technique
Authors: M. Karthigayan, M. Rizon, Sazali Yaacob, R. Nagarajan, M. Muthukumaran, Thinaharan Ramachandran, Sargunam Thirugnanam
Abstract:
In this paper, a set of irregular and regular ellipse fitting equations, optimized using a genetic algorithm (GA), is applied to the lip and eye features to classify human emotions. Two South East Asian (SEA) faces are considered in this work for emotion classification. Six emotions and one neutral state are considered as the output. Each subject shows unique characteristics of the lip and eye features for the various emotions. GA is adopted to optimize the irregular ellipse characteristics of the lip and eye features for each emotion. That is, the top portion of the lip configuration is part of one ellipse and the bottom portion is part of a different ellipse. Two ellipse-based fitness equations are proposed for the lip configuration, and the relevant parameters that define the emotions are listed. The GA method achieves reasonably successful emotion classification. For some emotions, however, the optimized data values of one emotion overlap the ranges of another. To overcome this overlap and at the same time improve the classification, a fuzzy clustering method (FCM) has been implemented. The GA-FCM approach offers reasonably good classification within the ranges of the clusters; this was shown by applying it to two SEA subjects, for which it improved the classification rate.
Keywords: ellipse fitness function, genetic algorithm, emotion recognition, fuzzy clustering
Procedia PDF Downloads 551
7018 An Extraction of Cancer Region from MR Images Using Fuzzy Clustering Means and Morphological Operations
Authors: Ramandeep Kaur, Gurjit Singh Bhathal
Abstract:
Cancer diagnosis is a very difficult task. A magnetic resonance imaging (MRI) scan can produce an image of any part of the body and provides an efficient way to diagnose cancer or tumors. In the existing method, fuzzy clustering means (FCM) is used for the diagnosis of tumors. In the proposed method, FCM is used to diagnose cancer of the foot. FCM finds the centroids of the clusters of the foot cancer obtained from MRI images. The FCM thresholding result shows the extracted region of the cancer. Morphological operations are then applied to obtain the extracted cancer region.
Keywords: magnetic resonance imaging (MRI), fuzzy C mean clustering, segmentation, morphological operations
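The pipeline outlined in this abstract (FCM on intensities, thresholding, then morphological clean-up) can be sketched on a toy image. Everything concrete here is an assumption: the 6x6 intensity grid is invented rather than MRI data, two clusters are used, the membership cut-off is 0.5, and the morphological step is a 3x3 binary opening; the abstract does not specify which operations the authors apply.

```python
def fcm_1d(values, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means on scalar data; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in values:
            dists = [abs(x - cj) for cj in centers]
            if min(dists) == 0.0:
                # A point exactly on a centre belongs fully to that cluster.
                hit = dists.index(0.0)
                row = [1.0 if j == hit else 0.0 for j in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [w / total for w in inv]
            memberships.append(row)
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def neighbourhood(mask, r, c):
    """Values in the 3x3 window around (r, c); cells outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(mask):
    return [[int(all(neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    return [[int(any(neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


# Toy intensities (hypothetical): a bright 3x3 region plus one isolated bright pixel.
image = [
    [10, 12, 11, 10, 13, 200],
    [11, 10, 12, 11, 10, 12],
    [12, 210, 205, 215, 11, 10],
    [10, 208, 220, 212, 12, 11],
    [11, 214, 209, 207, 10, 12],
    [10, 11, 12, 10, 11, 13],
]
pixels = [v for row in image for v in row]
centers, memberships = fcm_1d(pixels, c=2)
bright = centers.index(max(centers))  # cluster with the higher intensity centre
width = len(image[0])
# Threshold the fuzzy memberships to obtain a binary mask of the bright cluster.
raw_mask = [[int(memberships[r * width + c][bright] > 0.5) for c in range(width)]
            for r in range(len(image))]
# Opening (erosion followed by dilation) removes specks smaller than the 3x3 window.
opened = dilate(erode(raw_mask))
```

In this toy case the thresholded mask still contains the isolated bright pixel, and the opening step removes it while keeping the 3x3 region, which is the kind of clean-up the morphological stage is meant to provide.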
Procedia PDF Downloads 401