Search results for: documents clustering
1488 A Non-parametric Clustering Approach for Multivariate Geostatistical Data
Authors: Francky Fouedjio
Abstract:
Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.Keywords: clustering, geostatistics, multivariate data, non-parametric
Procedia PDF Downloads 4761487 Power Iteration Clustering Based on Deflation Technique on Large Scale Graphs
Authors: Taysir Soliman
Abstract:
One of the current popular clustering techniques is Spectral Clustering (SC) because of its advantages over conventional approaches such as hierarchical clustering, k-means, etc. and other techniques as well. However, one of the disadvantages of SC is the time consuming process because it requires computing the eigenvectors. In the past to overcome this disadvantage, a number of attempts have been proposed such as the Power Iteration Clustering (PIC) technique, which is one of versions from SC; some of PIC advantages are: 1) its scalability and efficiency, 2) finding one pseudo-eigenvectors instead of computing eigenvectors, and 3) linear combination of the eigenvectors in linear time. However, its worst disadvantage is an inter-class collision problem because it used only one pseudo-eigenvectors which is not enough. Previous researchers developed Deflation-based Power Iteration Clustering (DPIC) to overcome problems of PIC technique on inter-class collision with the same efficiency of PIC. In this paper, we developed Parallel DPIC (PDPIC) to improve the time and memory complexity which is run on apache spark framework using sparse matrix. To test the performance of PDPIC, we compared it to SC, ESCG, ESCALG algorithms on four small graph benchmark datasets and nine large graph benchmark datasets, where PDPIC proved higher accuracy and better time consuming than other compared algorithms.Keywords: spectral clustering, power iteration clustering, deflation-based power iteration clustering, Apache spark, large graph
Procedia PDF Downloads 1881486 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures
Authors: Salima Kouici, Abdelkader Khelladi
Abstract:
In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering
Procedia PDF Downloads 4791485 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering
Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining
Abstract:
DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)
Procedia PDF Downloads 2741484 K-Means Clustering-Based Infinite Feature Selection Method
Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi
Abstract:
Infinite Feature Selection (IFS) algorithm is an efficient feature selection algorithm that selects a subset of features of all sizes (including infinity). In this paper, we present an improved version of it, called clustering IFS (CIFS), by clustering the dataset in advance. To do so, first, we apply the K-means algorithm to cluster the dataset, then we apply IFS. In the CIFS method, the spatial and temporal complexities are reduced compared to the IFS method. Experimental results on 6 datasets show the superiority of CIFS compared to IFS in terms of accuracy, running time, and memory consumption.Keywords: feature selection, infinite feature selection, clustering, graph
Procedia PDF Downloads 1251483 Finding Related Scientific Documents Using Formal Concept Analysis
Authors: Nadeem Akhtar, Hira Javed
Abstract:
An important aspect of research is literature survey. Availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords gives out only those documents which precisely contain them. This approach open doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To signify the effectiveness of the approach, we have carried out the experiment and evaluation on Semeval-2010 Dataset. Results depict that the presented method is considerably successful in indexing of scientific papers.Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice
Procedia PDF Downloads 3301482 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition
Authors: L. Hamsaveni, Navya Prakash, Suresha
Abstract:
Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images to obtain an original document with complete information. In case, degraded document image captured is skewed, it has to be straightened (deskew) to perform further process. A special format of image storing known as YCbCr is used as a tool to convert the Grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents such as printed documents, handwritten documents, old script documents and handwritten image sketches in documents. The purpose of this research is to obtain an original document for a given set of degraded documents of the same source.Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format
Procedia PDF Downloads 3751481 An Improved K-Means Algorithm for Gene Expression Data Clustering
Authors: Billel Kenidra, Mohamed Benmohammed
Abstract:
Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization
Procedia PDF Downloads 1891480 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency
Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami
Abstract:
Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means
Procedia PDF Downloads 2581479 Generalization of Clustering Coefficient on Lattice Networks Applied to Criminal Networks
Authors: Christian H. Sanabria-Montaña, Rodrigo Huerta-Quintanilla
Abstract:
A lattice network is a special type of network in which all nodes have the same number of links, and its boundary conditions are periodic. The most basic lattice network is the ring, a one-dimensional network with periodic border conditions. In contrast, the Cartesian product of d rings forms a d-dimensional lattice network. An analytical expression currently exists for the clustering coefficient in this type of network, but the theoretical value is valid only up to certain connectivity value; in other words, the analytical expression is incomplete. Here we obtain analytically the clustering coefficient expression in d-dimensional lattice networks for any link density. Our analytical results show that the clustering coefficient for a lattice network with density of links that tend to 1, leads to the value of the clustering coefficient of a fully connected network. We developed a model on criminology in which the generalized clustering coefficient expression is applied. The model states that delinquents learn the know-how of crime business by sharing knowledge, directly or indirectly, with their friends of the gang. This generalization shed light on the network properties, which is important to develop new models in different fields where network structure plays an important role in the system dynamic, such as criminology, evolutionary game theory, econophysics, among others.Keywords: clustering coefficient, criminology, generalized, regular network d-dimensional
Procedia PDF Downloads 4111478 A Relative Entropy Regularization Approach for Fuzzy C-Means Clustering Problem
Authors: Ouafa Amira, Jiangshe Zhang
Abstract:
Clustering is an unsupervised machine learning technique; its aim is to extract the data structures, in which similar data objects are grouped in the same cluster, whereas dissimilar objects are grouped in different clusters. Clustering methods are widely utilized in different fields, such as: image processing, computer vision , and pattern recognition, etc. Fuzzy c-means clustering (fcm) is one of the most well known fuzzy clustering methods. It is based on solving an optimization problem, in which a minimization of a given cost function has been studied. This minimization aims to decrease the dissimilarity inside clusters, where the dissimilarity here is measured by the distances between data objects and cluster centers. The degree of belonging of a data point in a cluster is measured by a membership function which is included in the interval [0, 1]. In fcm clustering, the membership degree is constrained with the condition that the sum of a data object’s memberships in all clusters must be equal to one. This constraint can cause several problems, specially when our data objects are included in a noisy space. Regularization approach took a part in fuzzy c-means clustering technique. This process introduces an additional information in order to solve an ill-posed optimization problem. In this study, we focus on regularization by relative entropy approach, where in our optimization problem we aim to minimize the dissimilarity inside clusters. Finding an appropriate membership degree to each data object is our objective, because an appropriate membership degree leads to an accurate clustering result. Our clustering results in synthetic data sets, gaussian based data sets, and real world data sets show that our proposed model achieves a good accuracy.Keywords: clustering, fuzzy c-means, regularization, relative entropy
Procedia PDF Downloads 2581477 Visualization and Performance Measure to Determine Number of Topics in Twitter Data Clustering Using Hybrid Topic Modeling
Authors: Moulana Mohammed
Abstract:
Topic models are widely used in building clusters of documents for more than a decade, yet problems occurring in choosing optimal number of topics. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of topic models. The authors analyzed from previous works, most of the models used in determining the number of topics are non-parametric and quality of topics determined by using perplexity and coherence measures and concluded that they are not applicable in solving this problem. In this paper, we used the parametric method, which is an extension of the traditional topic model with visual access tendency for visualization of the number of topics (clusters) to complement clustering and to choose optimal number of topics based on results of cluster validity indices. Developed hybrid topic models are demonstrated with different Twitter datasets on various topics in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in performance measure of the quality of clusters with validity indices.Keywords: interactive visualization, visual mon-negative matrix factorization model, optimal number of topics, cluster validity indices, Twitter data clustering
Procedia PDF Downloads 1331476 Max-Entropy Feed-Forward Clustering Neural Network
Authors: Xiaohan Bookman, Xiaoyan Zhu
Abstract:
The outputs of non-linear feed-forward neural network are positive, which could be treated as probability when they are normalized to one. If we take Entropy-Based Principle into consideration, the outputs for each sample could be represented as the distribution of this sample for different clusters. Entropy-Based Principle is the principle with which we could estimate the unknown distribution under some limited conditions. As this paper defines two processes in Feed-Forward Neural Network, our limited condition is the abstracted features of samples which are worked out in the abstraction process. And the final outputs are the probability distribution for different clusters in the clustering process. As Entropy-Based Principle is considered into the feed-forward neural network, a clustering method is born. We have conducted some experiments on six open UCI data sets, comparing with a few baselines and applied purity as the measurement. The results illustrate that our method outperforms all the other baselines that are most popular clustering methods.Keywords: feed-forward neural network, clustering, max-entropy principle, probabilistic models
Procedia PDF Downloads 4331475 Clustering of Extremes in Financial Returns: A Comparison between Developed and Emerging Markets
Authors: Sara Ali Alokley, Mansour Saleh Albarrak
Abstract:
This paper investigates the dependency or clustering of extremes in the financial returns data by estimating the extremal index value θ∈[0,1]. The smaller the value of θ the more clustering we have. Here we apply the method of Ferro and Segers (2003) to estimate the extremal index for a range of threshold values. We compare the dependency structure of extremes in the developed and emerging markets. We use the financial returns of the stock market index in the developed markets of US, UK, France, Germany and Japan and the emerging markets of Brazil, Russia, India, China and Saudi Arabia. We expect that more clustering occurs in the emerging markets. This study will help to understand the dependency structure of the financial returns data.Keywords: clustring, extremes, returns, dependency, extermal index
Procedia PDF Downloads 4041474 An Energy Efficient Clustering Approach for Underwater Wireless Sensor Networks
Authors: Mohammad Reza Taherkhani
Abstract:
Wireless sensor networks that are used to monitor a special environment, are formed from a large number of sensor nodes. The role of these sensors is to sense special parameters from ambient and to make a connection. In these networks, the most important challenge is the management of energy usage. Clustering is one of the methods that are broadly used to face this challenge. In this paper, a distributed clustering protocol based on learning automata is proposed for underwater wireless sensor networks. The proposed algorithm that is called LA-Clustering forms clusters in the same energy level, based on the energy level of nodes and the connection radius regardless of size and the structure of sensor network. The proposed approach is simulated and is compared with some other protocols with considering some metrics such as network lifetime, number of alive nodes, and number of transmitted data. The simulation results demonstrate the efficiency of the proposed approach.Keywords: underwater sensor networks, clustering, learning automata, energy consumption
Procedia PDF Downloads 3611473 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.Keywords: clustering, data analysis, data mining, predictive models
Procedia PDF Downloads 4641472 The Platform for Digitization of Georgian Documents
Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili
Abstract:
Since the beginning of active publishing activity in Georgia, voluminous printed material has been accumulated, the digitization of which is an important task. Digitized materials will be available to the audience, and it will be possible to find text in them and conduct various factual research. Digitizing scanned documents means scanning documents, extracting text from the scanned documents, and processing the text into a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform, where the digital service developed for each stage will perform the task assigned to it; at the same time, it will be possible to develop these services dynamically so that there is no interruption in the work of the platform.Keywords: NLP, OCR, BERT, Kubernetes, transformers
Procedia PDF Downloads 1401471 A Learning Automata Based Clustering Approach for Underwater Sensor Networks to Reduce Energy Consumption
Authors: Motahareh Fadaei
Abstract:
Wireless sensor networks that are used to monitor a special environment, are formed from a large number of sensor nodes. The role of these sensors is to sense special parameters from ambient and to make connection. In these networks, the most important challenge is the management of energy usage. Clustering is one of the methods that are broadly used to face this challenge. In this paper, a distributed clustering protocol based on learning automata is proposed for underwater wireless sensor networks. The proposed algorithm that is called LA-Clustering forms clusters in the same energy level, based on the energy level of nodes and the connection radius regardless of size and the structure of sensor network. The proposed approach is simulated and is compared with some other protocols with considering some metrics such as network lifetime, number of alive nodes, and number of transmitted data. The simulation results demonstrate the efficiency of the proposed approach.Keywords: clustering, energy consumption, learning automata, underwater sensor networks
Procedia PDF Downloads 3141470 Knowledge Representation Based on Interval Type-2 CFCM Clustering
Authors: Lee Myung-Won, Kwak Keun-Chang
Abstract:
This paper is concerned with knowledge representation and extraction of fuzzy if-then rules using Interval Type-2 Context-based Fuzzy C-Means clustering (IT2-CFCM) with the aid of fuzzy granulation. This proposed clustering algorithm is based on information granulation in the form of IT2 based Fuzzy C-Means (IT2-FCM) clustering and estimates the cluster centers by preserving the homogeneity between the clustered patterns from the IT2 contexts produced in the output space. Furthermore, we can obtain the automatic knowledge representation in the design of Radial Basis Function Networks (RBFN), Linguistic Model (LM), and Adaptive Neuro-Fuzzy Networks (ANFN) from the numerical input-output data pairs. We shall focus on a design of ANFN in this paper. The experimental results on an estimation problem of energy performance reveal that the proposed method showed a good knowledge representation and performance in comparison with the previous works.Keywords: IT2-FCM, IT2-CFCM, context-based fuzzy clustering, adaptive neuro-fuzzy network, knowledge representation
Procedia PDF Downloads 3211469 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting
Authors: Kemal Polat
Abstract:
In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM
Procedia PDF Downloads 4121468 System of Quality Automation for Documents (SQAD)
Authors: R. Babi Saraswathi, K. Divya, A. Habeebur Rahman, D. B. Hari Prakash, S. Jayanth, T. Kumar, N. Vijayarangan
Abstract:
Document automation is the design of systems and workflows, assembling repetitive documents to meet the specific business needs. In any organization or institution, documenting employee’s information is very important for both employees as well as management. It shows an individual’s progress to the management. Many documents of the employee are in the form of papers, so it is very difficult to arrange and for future reference we need to spend more time in getting the exact document. Also, it is very tedious to generate reports according to our needs. The process gets even more difficult on getting approvals and hence lacks its security aspects. This project overcomes the above-stated issues. By storing the details in the database and maintaining the e-documents, the automation system reduces the manual work to a large extent. Then the approval process of some important documents can be done in a much-secured manner by using Digital Signature and encryption techniques. Details are maintained in the database and e-documents are stored in specific folders and generation of various kinds of reports is possible. Moreover, an efficient search method is implemented is used in the database. Automation supporting document maintenance in many aspects is useful for minimize data entry, reduce the time spent on proof-reading, avoids duplication, and reduce the risks associated with the manual error, etc.Keywords: e-documents, automation, digital signature, encryption
Procedia PDF Downloads 3891467 Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design
Authors: Shen Jian, Hu Jie, Ma Jin, Peng Ying Hong, Fang Yi, Liu Wen Hai
Abstract:
Biologically inspired design inspires inventions and new technologies in the field of engineering by mimicking functions, principles, and structures in the biological domain. To deal with the obstacles of cross-domain knowledge acquisition in the existing biologically inspired design process, functional semantic clustering based on functional feature semantic correlation and environmental constraint clustering composition based on environmental characteristic constraining adaptability are proposed. A knowledge cell clustering algorithm and the corresponding prototype system is developed. Finally, the effectiveness of the method is verified by the visual prosthetic device design.Keywords: knowledge clustering, knowledge acquisition, knowledge based engineering, knowledge cell, biologically inspired design
Procedia PDF Downloads 4241466 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek
Abstract:
Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process
Procedia PDF Downloads 4011465 A Weighted K-Medoids Clustering Algorithm for Effective Stability in Vehicular Ad Hoc Networks
Authors: Rejab Hajlaoui, Tarek Moulahi, Hervé Guyennet
Abstract:
In a highway scenario, the vehicle speed can exceed 120 kmph. Therefore, any vehicle can enter or leave the network within a very short time. This mobility adversely affects the network connectivity and decreases the life time of all established links. To ensure an effective stability in vehicular ad hoc networks with minimum broadcasting storm, we have developed a weighted algorithm based on the k-medoids clustering algorithm (WKCA). Indeed, the number of clusters and the initial cluster heads will not be selected randomly as usual, but considering the available transmission range and the environment size. Then, to ensure optimal assignment of nodes to clusters in both k-medoids phases, the combined weight of any node will be computed according to additional metrics including direction, relative speed and proximity. Empirical results prove that in addition to the convergence speed that characterizes the k-medoids algorithm, our proposed model performs well both AODV-Clustering and OLSR-Clustering protocols under different densities and velocities in term of end-to-end delay, packet delivery ratio, and throughput.Keywords: communication, clustering algorithm, k-medoids, sensor, vehicular ad hoc network
Procedia PDF Downloads 2381464 Hierarchical Cluster Analysis of Raw Milk Samples Obtained from Organic and Conventional Dairy Farming in Autonomous Province of Vojvodina, Serbia
Authors: Lidija Jevrić, Denis Kučević, Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Milica Karadžić
Abstract:
In the present study, the Hierarchical Cluster Analysis (HCA) was applied in order to determine the differences between the milk samples originating from a conventional dairy farm (CF) and an organic dairy farm (OF) in AP Vojvodina, Republic of Serbia. The clustering was based on the basis of the average values of saturated fatty acids (SFA) content and unsaturated fatty acids (UFA) content obtained for every season. Therefore, the HCA included the annual SFA and UFA content values. The clustering procedure was carried out on the basis of Euclidean distances and Single linkage algorithm. The obtained dendrograms indicated that the clustering of UFA in OF was much more uniform compared to clustering of UFA in CF. In OF, spring stands out from the other months of the year. The same case can be noticed for CF, where winter is separated from the other months. The results could be expected because the composition of fatty acids content is greatly influenced by the season and nutrition of dairy cows during the year.Keywords: chemometrics, clustering, food engineering, milk quality
Procedia PDF Downloads 2781463 Sales Patterns Clustering Analysis on Seasonal Product Sales Data
Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho
Abstract:
As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.Keywords: clustering, distribution, sales pattern, seasonal product
Procedia PDF Downloads 5951462 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy
Authors: Kemal Polat
Abstract:
In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.Keywords: machine learning, data weighting, classification, data mining
Procedia PDF Downloads 3241461 Enhancement of Indexing Model for Heterogeneous Multimedia Documents: User Profile Based Approach
Authors: Aicha Aggoune, Abdelkrim Bouramoul, Mohamed Khiereddine Kholladi
Abstract:
Recent research shows that user profile as important element can improve heterogeneous information retrieval with its content. In this context, we present our indexing model for heterogeneous multimedia documents. This model is based on the combination of user profile to the indexing process. The general idea of our proposal is to operate the common concepts between the representation of a document and the definition of a user through his profile. These two elements will be added as additional indexing entities to enrich the heterogeneous corpus documents indexes. We have developed IRONTO domain ontology allowing annotation of documents. We will present also the developed tool validating the proposed model.Keywords: indexing model, user profile, multimedia document, heterogeneous of sources, ontology
Procedia PDF Downloads 3471460 Clustering the Wheat Seeds Using SOM Artificial Neural Networks
Authors: Salah Ghamari
Abstract:
In this study, the ability of self organizing map artificial (SOM) neural networks in clustering the wheat seeds varieties according to morphological properties of them was considered. The SOM is one type of unsupervised competitive learning. Experimentally, five morphological features of 300 seeds (including three varieties: gaskozhen, Md and sardari) were obtained using image processing technique. The results show that the artificial neural network has a good performance (90.33% accuracy) in classification of the wheat varieties despite of high similarity in them. The highest classification accuracy (100%) was achieved for sardari.Keywords: artificial neural networks, clustering, self organizing map, wheat variety
Procedia PDF Downloads 6551459 GCM Based Fuzzy Clustering to Identify Homogeneous Climatic Regions of North-East India
Authors: Arup K. Sarma, Jayshree Hazarika
Abstract:
The North-eastern part of India, which receives heavier rainfall than other parts of the subcontinent, is of great concern now-a-days with regard to climate change. High intensity rainfall for short duration and longer dry spell, occurring due to impact of climate change, affects river morphology too. In the present study, an attempt is made to delineate the North-Eastern region of India into some homogeneous clusters based on the Fuzzy Clustering concept and to compare the resulting clusters obtained by using conventional methods and non conventional methods of clustering. The concept of clustering is adapted in view of the fact that, impact of climate change can be studied in a homogeneous region without much variation, which can be helpful in studies related to water resources planning and management. 10 IMD (Indian Meteorological Department) stations, situated in various regions of the North-east, have been selected for making the clusters. The results of the Fuzzy C-Means (FCM) analysis show different clustering patterns for different conditions. From the analysis and comparison it can be concluded that non conventional method of using GCM data is somehow giving better results than the others. However, further analysis can be done by taking daily data instead of monthly means to reduce the effect of standardization.Keywords: climate change, conventional and nonconventional methods of clustering, FCM analysis, homogeneous regions
Procedia PDF Downloads 385