Search results for: hybrid hierarchical clustering
2689 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI
Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer
Abstract:
In this study, a decision support system comprising three stages has been proposed to detect the disc abnormalities of the lumbar region. In the first stage named the feature extraction, T2-weighted sagittal and axial Magnetic Resonance Images (MRI) were taken from 55 people and then 27 appearance and shape features were acquired from both sagittal and transverse images. In the second stage named the feature weighting process, k-means clustering based feature weighting (KMCBFW) proposed by Gunes et al. Finally, in the third stage named the classification process, the classifier algorithms including multi-layer perceptron (MLP- neural network), support vector machine (SVM), Naïve Bayes, and decision tree have been used to classify whether the subject has lumbar disc or not. In order to test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times have been used. The best hybrid model is the combination of k-means clustering based feature weighting and decision tree in the detecting of lumbar disc disease based on both sagittal and axial MR images.Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting
Procedia PDF Downloads 4992688 A Comparative Analysis of Clustering Approaches for Understanding Patterns in Health Insurance Uptake: Evidence from Sociodemographic Kenyan Data
Authors: Nelson Kimeli Kemboi Yego, Juma Kasozi, Joseph Nkruzinza, Francis Kipkogei
Abstract:
The study investigated the low uptake of health insurance in Kenya despite efforts to achieve universal health coverage through various health insurance schemes. Unsupervised machine learning techniques were employed to identify patterns in health insurance uptake based on sociodemographic factors among Kenyan households. The aim was to identify key demographic groups that are underinsured and to provide insights for the development of effective policies and outreach programs. Using the 2021 FinAccess Survey, the study clustered Kenyan households based on their health insurance uptake and sociodemographic features to reveal patterns in health insurance uptake across the country. The effectiveness of k-prototypes clustering, hierarchical clustering, and agglomerative hierarchical clustering in clustering based on sociodemographic factors was compared. The k-prototypes approach was found to be the most effective at uncovering distinct and well-separated clusters in the Kenyan sociodemographic data related to health insurance uptake based on silhouette, Calinski-Harabasz, Davies-Bouldin, and Rand indices. Hence, it was utilized in uncovering the patterns in uptake. The results of the analysis indicate that inclusivity in health insurance is greatly related to affordability. The findings suggest that targeted policy interventions and outreach programs are necessary to increase health insurance uptake in Kenya, with the ultimate goal of achieving universal health coverage. The study provides important insights for policymakers and stakeholders in the health insurance sector to address the low uptake of health insurance and to ensure that healthcare services are accessible and affordable to all Kenyans, regardless of their socio-demographic status. The study highlights the potential of unsupervised machine learning techniques to provide insights into complex health policy issues and improve decision-making in the health sector.Keywords: health insurance, unsupervised learning, clustering algorithms, machine learning
Procedia PDF Downloads 942687 Fuzzy Time Series Forecasting Based on Fuzzy Logical Relationships, PSO Technique, and Automatic Clustering Algorithm
Authors: A. K. M. Kamrul Islam, Abdelhamid Bouchachia, Suang Cang, Hongnian Yu
Abstract:
Forecasting model has a great impact in terms of prediction and continues to do so into the future. Although many forecasting models have been studied in recent years, most researchers focus on different forecasting methods based on fuzzy time series to solve forecasting problems. The forecasted models accuracy fully depends on the two terms that are the length of the interval in the universe of discourse and the content of the forecast rules. Moreover, a hybrid forecasting method can be an effective and efficient way to improve forecasts rather than an individual forecasting model. There are different hybrids forecasting models which combined fuzzy time series with evolutionary algorithms, but the performances are not quite satisfactory. In this paper, we proposed a hybrid forecasting model which deals with the first order as well as high order fuzzy time series and particle swarm optimization to improve the forecasted accuracy. The proposed method used the historical enrollments of the University of Alabama as dataset in the forecasting process. Firstly, we considered an automatic clustering algorithm to calculate the appropriate interval for the historical enrollments. Then particle swarm optimization and fuzzy time series are combined that shows better forecasting accuracy than other existing forecasting models.Keywords: fuzzy time series (fts), particle swarm optimization, clustering algorithm, hybrid forecasting model
Procedia PDF Downloads 2222686 Identifying Autism Spectrum Disorder Using Optimization-Based Clustering
Authors: Sharifah Mousli, Sona Taheri, Jiayuan He
Abstract:
Autism spectrum disorder (ASD) is a complex developmental condition involving persistent difficulties with social communication, restricted interests, and repetitive behavior. The challenges associated with ASD can interfere with an affected individual’s ability to function in social, academic, and employment settings. Although there is no effective medication known to treat ASD, to our best knowledge, early intervention can significantly improve an affected individual’s overall development. Hence, an accurate diagnosis of ASD at an early phase is essential. The use of machine learning approaches improves and speeds up the diagnosis of ASD. In this paper, we focus on the application of unsupervised clustering methods in ASD as a large volume of ASD data generated through hospitals, therapy centers, and mobile applications has no pre-existing labels. We conduct a comparative analysis using seven clustering approaches such as K-means, agglomerative hierarchical, model-based, fuzzy-C-means, affinity propagation, self organizing maps, linear vector quantisation – as well as the recently developed optimization-based clustering (COMSEP-Clust) approach. We evaluate the performances of the clustering methods extensively on real-world ASD datasets encompassing different age groups: toddlers, children, adolescents, and adults. Our experimental results suggest that the COMSEP-Clust approach outperforms the other seven methods in recognizing ASD with well-separated clusters.Keywords: autism spectrum disorder, clustering, optimization, unsupervised machine learning
Procedia PDF Downloads 842685 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics
Authors: Fabio Fabris, Alex A. Freitas
Abstract:
Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification
Procedia PDF Downloads 2862684 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm
Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan
Abstract:
This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data
Procedia PDF Downloads 1932683 Analysis of Expression Data Using Unsupervised Techniques
Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe
Abstract:
his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation
Procedia PDF Downloads 1212682 A Hybrid P2P Storage Scheme Based on Erasure Coding and Replication
Authors: Usman Mahmood, Khawaja M. U. Suleman
Abstract:
A peer-to-peer storage system has challenges like; peer availability, data protection, churn rate. To address these challenges different redundancy, replacement and repair schemes are used. This paper presents a hybrid scheme of redundancy using replication and erasure coding. We calculate and compare the storage, access, and maintenance costs of our proposed scheme with existing redundancy schemes. For realistic behaviour of peers a trace of live peer-to-peer system is used. The effect of different replication, and repair schemes are also shown. The proposed hybrid scheme performs better than existing double coding hybrid scheme in all metrics and have an improved maintenance cost than hierarchical codes.Keywords: erasure coding, P2P, redundancy, replication
Procedia PDF Downloads 3672681 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning
Authors: Karthik Mittal
Abstract:
This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA
Procedia PDF Downloads 1202680 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 1502679 ACOPIN: An ACO Algorithm with TSP Approach for Clustering Proteins in Protein Interaction Networks
Authors: Jamaludin Sallim, Rozlina Mohamed, Roslina Abdul Hamid
Abstract:
In this paper, we proposed an Ant Colony Optimization (ACO) algorithm together with Traveling Salesman Problem (TSP) approach to investigate the clustering problem in Protein Interaction Networks (PIN). We named this combination as ACOPIN. The purpose of this work is two-fold. First, to test the efficacy of ACO in clustering PIN and second, to propose the simple generalization of the ACO algorithm that might allow its application in clustering proteins in PIN. We split this paper to three main sections. First, we describe the PIN and clustering proteins in PIN. Second, we discuss the steps involved in each phase of ACO algorithm. Finally, we present some results of the investigation with the clustering patterns.Keywords: ant colony optimization algorithm, searching algorithm, protein functional module, protein interaction network
Procedia PDF Downloads 5762678 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures
Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat
Abstract:
In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.Keywords: association rules, clustering, similarity measure, statistical approaches
Procedia PDF Downloads 2952677 Spectral Clustering for Manufacturing Cell Formation
Authors: Yessica Nataliani, Miin-Shen Yang
Abstract:
Cell formation (CF) is an important step in group technology. It is used in designing cellular manufacturing systems using similarities between parts in relation to machines so that it can identify part families and machine groups. There are many CF methods in the literature, but there is less spectral clustering used in CF. In this paper, we propose a spectral clustering algorithm for machine-part CF. Some experimental examples are used to illustrate its efficiency. Overall, the spectral clustering algorithm can be used in CF with a wide variety of machine/part matrices.Keywords: group technology, cell formation, spectral clustering, grouping efficiency
Procedia PDF Downloads 3792676 Clustering Based Level Set Evaluation for Low Contrast Images
Authors: Bikshalu Kalagadda, Srikanth Rangu
Abstract:
The important object of images segmentation is to extract objects with respect to some input features. One of the important methods for image segmentation is Level set method. Generally medical images and synthetic images with low contrast of pixel profile, for such images difficult to locate interested features in images. In conventional level set function, develops irregularity during its process of evaluation of contour of objects, this destroy the stability of evolution process. For this problem a remedy is proposed, a new hybrid algorithm is Clustering Level Set Evolution. Kernel fuzzy particles swarm optimization clustering with the Distance Regularized Level Set (DRLS) and Selective Binary, and Gaussian Filtering Regularized Level Set (SBGFRLS) methods are used. The ability of identifying different regions becomes easy with improved speed. Efficiency of the modified method can be evaluated by comparing with the previous method for similar specifications. Comparison can be carried out by considering medical and synthetic images.Keywords: segmentation, clustering, level set function, re-initialization, Kernel fuzzy, swarm optimization
Procedia PDF Downloads 3292675 Hierarchical Checkpoint Protocol in Data Grids
Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed
Abstract:
Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.Keywords: data grids, fault tolerance, clustering, chandy-lamport
Procedia PDF Downloads 3082674 Design of a Graphical User Interface for Data Preprocessing and Image Segmentation Process in 2D MRI Images
Authors: Enver Kucukkulahli, Pakize Erdogmus, Kemal Polat
Abstract:
The 2D image segmentation is a significant process in finding a suitable region in medical images such as MRI, PET, CT etc. In this study, we have focused on 2D MRI images for image segmentation process. We have designed a GUI (graphical user interface) written in MATLABTM for 2D MRI images. In this program, there are two different interfaces including data pre-processing and image clustering or segmentation. In the data pre-processing section, there are median filter, average filter, unsharp mask filter, Wiener filter, and custom filter (a filter that is designed by user in MATLAB). As for the image clustering, there are seven different image segmentations for 2D MR images. These image segmentation algorithms are as follows: PSO (particle swarm optimization), GA (genetic algorithm), Lloyds algorithm, k-means, the combination of Lloyds and k-means, mean shift clustering, and finally BBO (Biogeography Based Optimization). To find the suitable cluster number in 2D MRI, we have designed the histogram based cluster estimation method and then applied to these numbers to image segmentation algorithms to cluster an image automatically. Also, we have selected the best hybrid method for each 2D MR images thanks to this GUI software.Keywords: image segmentation, clustering, GUI, 2D MRI
Procedia PDF Downloads 3502673 Visualization and Performance Measure to Determine Number of Topics in Twitter Data Clustering Using Hybrid Topic Modeling
Authors: Moulana Mohammed
Abstract:
Topic models are widely used in building clusters of documents for more than a decade, yet problems occurring in choosing optimal number of topics. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of topic models. The authors analyzed from previous works, most of the models used in determining the number of topics are non-parametric and quality of topics determined by using perplexity and coherence measures and concluded that they are not applicable in solving this problem. In this paper, we used the parametric method, which is an extension of the traditional topic model with visual access tendency for visualization of the number of topics (clusters) to complement clustering and to choose optimal number of topics based on results of cluster validity indices. Developed hybrid topic models are demonstrated with different Twitter datasets on various topics in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in performance measure of the quality of clusters with validity indices.Keywords: interactive visualization, visual mon-negative matrix factorization model, optimal number of topics, cluster validity indices, Twitter data clustering
Procedia PDF Downloads 1132672 Investigation of Clustering Algorithms Used in Wireless Sensor Networks
Authors: Naim Karasekreter, Ugur Fidan, Fatih Basciftci
Abstract:
Wireless sensor networks are networks in which more than one sensor node is organized among themselves. The working principle is based on the transfer of the sensed data over the other nodes in the network to the central station. Wireless sensor networks concentrate on routing algorithms, energy efficiency and clustering algorithms. In the clustering method, the nodes in the network are divided into clusters using different parameters and the most suitable cluster head is selected from among them. The data to be sent to the center is sent per cluster, and the cluster head is transmitted to the center. With this method, the network traffic is reduced and the energy efficiency of the nodes is increased. In this study, clustering algorithms were examined in terms of clustering performances and cluster head selection characteristics to try to identify weak and strong sides. This work is supported by the Project 17.Kariyer.123 of Afyon Kocatepe University BAP Commission.Keywords: wireless sensor networks (WSN), clustering algorithm, cluster head, clustering
Procedia PDF Downloads 4762671 A Comparative Study of Multi-SOM Algorithms for Determining the Optimal Number of Clusters
Authors: Imèn Khanchouch, Malika Charrad, Mohamed Limam
Abstract:
The interpretation of the quality of clusters and the determination of the optimal number of clusters is still a crucial problem in clustering. We focus in this paper on multi-SOM clustering method which overcomes the problem of extracting the number of clusters from the SOM map through the use of a clustering validity index. We then tested multi-SOM using real and artificial data sets with different evaluation criteria not used previously such as Davies Bouldin index, Dunn index and silhouette index. The developed multi-SOM algorithm is compared to k-means and Birch methods. Results show that it is more efficient than classical clustering methods.Keywords: clustering, SOM, multi-SOM, DB index, Dunn index, silhouette index
Procedia PDF Downloads 5732670 Knowledge Discovery from Production Databases for Hierarchical Process Control
Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata
Abstract:
The paper gives the results of the project that was oriented on the usage of knowledge discoveries from production systems for needs of the hierarchical process control. One of the main project goals was the proposal of knowledge discovery model for process control. Specifics data mining methods and techniques was used for defined problems of the process control. The gained knowledge was used on the real production system, thus, the proposed solution has been verified. The paper documents how it is possible to apply new discovery knowledge to be used in the real hierarchical process control. There are specified the opportunities for application of the proposed knowledge discovery model for hierarchical process control.Keywords: hierarchical process control, knowledge discovery from databases, neural network, process control
Procedia PDF Downloads 4542669 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects
Authors: Behnam Tavakkol
Abstract:
Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data
Procedia PDF Downloads 1812668 Why Do We Need Hierachical Linear Models?
Authors: Mustafa Aydın, Ali Murat Sunbul
Abstract:
Hierarchical or nested data structures usually are seen in many research areas. Especially, in the field of education, if we examine most of the studies, we can see the nested structures. Students in classes, classes in schools, schools in cities and cities in regions are similar nested structures. In a hierarchical structure, students being in the same class, sharing the same physical conditions and similar experiences and learning from the same teachers, they demonstrate similar behaviors between them rather than the students in other classes.Keywords: hierarchical linear modeling, nested data, hierarchical structure, data structure
Procedia PDF Downloads 6322667 Hydrothermally Fabricated 3-D Nanostructure Metal Oxide Sensors
Authors: Mohammad Alenezi
Abstract:
Hierarchical nanostructures with higher dimensionality, consisting of nanostructure building blocks such as nanowires, nanotubes, or nanosheets are very attractive. They hold great properties like the high surface-to-volume ratio and well-ordered porous structures, which can be very challenging to attain for other mono-morphological nanostructures. Well-ordered hierarchical nanostructures with high surface-to-volume ratios facilitate gas diffusion into their surfaces as well as scattering of light. Therefore, hierarchical nanostructures are expected to perform highly as gas sensors. A multistage controlled hydrothermal synthesis method to fabricate high-performance single ZnO brushlike hierarchical nanostructure gas sensor from initial nanowires is reported. The performance of the sensor based on brush-like hierarchical nanostructure is analyzed and compared to that of a nanowire gas sensor. The hierarchical gas sensor demonstrated high sensitivity toward low concentration of acetone at high speed of response. The enhancement in the hierarchical sensor performance is attributed to the increased surface to volume ratio, reduction in dimensionality of the nanowire building blocks, formation of junctions between the initial nanowire and the secondary nanowires, and enhanced gas diffusion into the surfaces of the hierarchical nanostructures.Keywords: metal oxide, nanostructure, hydrothermal, sensor
Procedia PDF Downloads 2472666 Data-Driven Market Segmentation in Hospitality Using Unsupervised Machine Learning
Authors: Rik van Leeuwen, Ger Koole
Abstract:
Within hospitality, marketing departments use segmentation to create tailored strategies to ensure personalized marketing. This study provides a data-driven approach by segmenting guest profiles via hierarchical clustering based on an extensive set of features. The industry requires understandable outcomes that contribute to adaptability for marketing departments to make data-driven decisions and ultimately driving profit. A marketing department specified a business question that guides the unsupervised machine learning algorithm. Features of guests change over time; therefore, there is a probability that guests transition from one segment to another. The purpose of the study is to provide steps in the process from raw data to actionable insights, which serve as a guideline for how hospitality companies can adopt an algorithmic approach.Keywords: hierarchical cluster analysis, hospitality, market segmentation
Procedia PDF Downloads 802665 Improved K-Means Clustering Algorithm Using RHadoop with Combiner
Authors: Ji Eun Shin, Dong Hoon Lim
Abstract:
Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.Keywords: big data, combiner, K-means clustering, RHadoop
Procedia PDF Downloads 4042664 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization
Authors: K. Umbleja, M. Ichino, H. Yaguchi
Abstract:
In this paper, we propose a coloring method for multivariate data visualization by using parallel coordinates based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension for proximity-based coloring that suffers from a few undesired side effects if hierarchical tree structure is not balanced tree. We describe the algorithm by assigning colors based on dissimilarity information, show the application of proposed method on three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization where many individual objects have already been aggregated into a single symbolic object.Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data
Procedia PDF Downloads 1412663 Developing NAND Flash-Memory SSD-Based File System Design
Authors: Jaechun No
Abstract:
This paper focuses on I/O optimizations of N-hybrid (New-Form of hybrid), which provides a hybrid file system space constructed on SSD and HDD. Although the promising potentials of SSD, such as the absence of mechanical moving overhead and high random I/O throughput, have drawn a lot of attentions from IT enterprises, its high ratio of cost/capacity makes it less desirable to build a large-scale data storage subsystem composed of only SSDs. In this paper, we present N-hybrid that attempts to integrate the strengths of SSD and HDD, to offer a single, large hybrid file system space. Several experiments were conducted to verify the performance of N-hybrid.Keywords: SSD, data section, I/O optimizations, hybrid system
Procedia PDF Downloads 3952662 An E-Assessment Website to Implement Hierarchical Aggregate Assessment
Authors: M. Lesage, G. Raîche, M. Riopel, F. Fortin, D. Sebkhi
Abstract:
This paper describes a Web server implementation of the hierarchical aggregate assessment process in the field of education. This process describes itself as a field of teamwork assessment where teams can have multiple levels of hierarchy and supervision. This process is applied everywhere and is part of the management, education, assessment and computer science fields. The E-Assessment website named “Cluster” records in its database the students, the course material, the teams and the hierarchical relationships between the students. For the present research, the hierarchical relationships are team member, team leader and group administrator appointments. The group administrators have the responsibility to supervise team leaders. The experimentation of the application has been performed by high school students in geology courses and Canadian army cadets for navigation patrols in teams. This research extends the work of Nance that uses a hierarchical aggregation process similar as the one implemented in the “Cluster” application.Keywords: e-learning, e-assessment, teamwork assessment, hierarchical aggregate assessment
Procedia PDF Downloads 3482661 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting
Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi
Abstract:
An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm, which allocates some weight to nodes based on factors, i.e. link stability and power reduction rate. According to the allocated weight in the previous phase, the cellular learning automaton picks out in the second phase nodes which are candidates for being cluster head. In the third phase, learning automaton selects cluster head nodes, member nodes and forms the cluster. Thus, this automaton does the learning from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used omnet++4.2.2. Simulation results indicate that newly formed clusters have a longer lifetime than previous algorithms and decrease strongly network overload by reducing update rate.Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power
Procedia PDF Downloads 3812660 Application of Data Mining for Aquifer Environmental Assessment
Authors: Saman Javadi, Mehdi Hashemy, Mohahammad Mahmoodi
Abstract:
Vulnerability maps are employed as an important solution in order to handle entrance of pollution into the aquifers. The common way to provide vulnerability map is DRASTIC. Meanwhile, application of the method is not easy to apply for any aquifer due to choosing appropriate constant values of weights and ranks. In this study, a new approach using k-means clustering is applied to make vulnerability maps. Four features of depth to groundwater, hydraulic conductivity, recharge value and vadose zone were considered at the same time as features of clustering. Five regions are recognized out of the case study represent zones with different level of vulnerability. The finding results show that clustering provides a realistic vulnerability map so that, Pearson’s correlation coefficients between nitrate concentrations and clustering vulnerability is obtained 61%.Keywords: clustering, data mining, groundwater, vulnerability assessment
Procedia PDF Downloads 572