Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 28139

Search results for: hierarchical cluster analysis

28139 An E-Assessment Website to Implement Hierarchical Aggregate Assessment

Authors: M. Lesage, G. Raîche, M. Riopel, F. Fortin, D. Sebkhi

Abstract:

This paper describes a Web server implementation of the hierarchical aggregate assessment process in the field of education. This process belongs to the field of teamwork assessment, in which teams can have multiple levels of hierarchy and supervision. It is widely applicable and spans the management, education, assessment and computer science fields. The E-Assessment website named “Cluster” records in its database the students, the course material, the teams and the hierarchical relationships between the students. For the present research, the hierarchical relationships are team member, team leader and group administrator appointments. The group administrators are responsible for supervising the team leaders. The application was tested with high school students in geology courses and with Canadian army cadets working in teams on navigation patrols. This research extends the work of Nance, which uses a hierarchical aggregation process similar to the one implemented in the “Cluster” application.

Keywords: e-learning, e-assessment, teamwork assessment, hierarchical aggregate assessment

28138 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is often referred to as "clean energy", and a common argument in support of renewable energy (RE) is that it provides electricity with zero carbon dioxide emissions. This study provides insight into RE in the European Union (EU), especially into electricity generation from renewables and the associated targets. The objective of this study is to identify groups of European countries using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The hierarchical cluster analysis is based on Ward’s clustering method and squared Euclidean distances, and it identified eight distinct clusters of European countries. The non-hierarchical k-means method was then applied. Discriminant analysis, with data normalized by Z-score transformation, was used to check the validity of the results. To explore the relationships between the selected indicators, correlation coefficients were computed. The results reveal the current situation of RE in the European Union Member States.
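The Ward/squared-Euclidean combination can be illustrated with a minimal pure-Python sketch. It assumes a small, invented matrix of indicator values (countries × indicators), z-scores each column, and then merges clusters greedily by Ward's criterion: the merge cost n_a·n_b/(n_a+n_b) times the squared Euclidean distance between centroids is exactly the increase in within-cluster sum of squares. The function names are hypothetical and this is not the study's own code; statistical packages (for example SciPy's `linkage(..., method="ward")`) do the same far more efficiently.

```python
from statistics import mean, pstdev


def zscore(rows):
    """Standardize each column (indicator) to mean 0 and population SD 1.
    A constant column is left centred (SD replaced by 1 to avoid division by zero)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def ward(rows, k):
    """Agglomerate singletons until k clusters remain. At each step, merge the
    pair whose union increases the within-cluster sum of squares the least:
    n_a * n_b / (n_a + n_b) * squared distance between their centroids."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                ca = centroid([rows[p] for p in a])
                cb = centroid([rows[p] for p in b])
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical indicators for five countries (e.g. RE share, CO2 per capita)
data = [[1.0, 100], [1.2, 110], [0.9, 95], [5.0, 400], [5.1, 410]]
print(ward(zscore(data), 2))  # [[0, 1, 2], [3, 4]]
```

Cutting the same hierarchy at different k is how the number of clusters is chosen before a k-means refinement.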

Keywords: share of electricity generation, k-means clustering, discriminant, CO2 emission

28137 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is the process of grouping objects into clusters so that data objects in the same cluster are similar to each other. Clustering is one of the areas of data mining, and clustering algorithms can be classified into partitioning, hierarchical, density-based, and grid-based methods. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. This state-of-the-art review should help in addressing current problems and in deriving more robust and scalable clustering algorithms.
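Of the four algorithms, BIRCH has the most compact core idea: each subcluster is summarized by a clustering feature CF = (N, LS, SS), that is, the point count, the linear sum of the points, and the sum of their squared norms, and two CFs merge by simple addition. The sketch below uses a hypothetical `CF` class, not any library's API, to show that additivity and how the centroid and radius (root-mean-square distance to the centroid) follow from the triple alone; the CF-tree that BIRCH builds on top of it is omitted.

```python
import math


class CF:
    """BIRCH clustering feature: (N, LS, SS) summarizing a set of points."""

    def __init__(self, n, ls, ss):
        self.n, self.ls, self.ss = n, ls, ss

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def __add__(self, other):
        # Additivity: the CF of a union is the component-wise sum of the CFs
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # RMS distance to the centroid: sqrt(SS/N - ||LS/N||^2)
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


cf = CF.from_point((0, 0)) + CF.from_point((2, 0))
print(cf.n, cf.centroid(), cf.radius())  # 2 [1.0, 0.0] 1.0
```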

Keywords: clustering, unsupervised learning, algorithms, hierarchical

28136 Spatio-temporal Distribution of Surface Water Quality in the Kebir Rhumel Basin, Algeria

Authors: Lazhar Belkhiri, Ammar Tiri, Lotfi Mouni, Fatma Elhadj Lakouas

Abstract:

This research aims to present a surface water quality assessment of hydrochemical parameters in the Kebir Rhumel Basin, Algeria. The water quality index (WQI), the Mann–Kendall (MK) test, and hierarchical cluster analysis (HCA) were used to understand the spatio-temporal distribution of surface water quality in the study area. Eleven hydrochemical parameters were measured monthly at eight stations from January 2016 to December 2020. The dominant cation in the surface water was calcium, followed by sodium, and the dominant anion was sulfate, followed by chloride. In terms of WQI, a significant percentage of surface water samples at stations Ain Smara (AS), Beni Haroune (BH), Grarem (GR), and Sidi Khlifa (SK) exhibited poor water quality: approximately 89.5%, 90.6%, 78.2%, and 62.7%, respectively, fell into this category. Mann–Kendall trend analysis revealed a significantly increasing trend in WQI values at stations Oued Boumerzoug (ON) and SK, indicating significant temporal variation of WQI at these stations. Hierarchical clustering analysis classified the data into three clusters. The first cluster contained approximately 22% of the total number of months, the second about 30%, and the third, the largest, approximately 48%. Within these clusters, certain stations exhibited higher WQI values. In the first cluster, stations GR and ON had the highest WQI values. In the second cluster, stations Oued Boumerzoug (OB) and SK showed the highest WQI values, while in the last cluster, stations AS, BH, El Milia (EM), and Hammam Grouz (HG) had the highest mean WQI values. Also, approximately 38%, 41%, and 38% of the total water samples in the first, second, and third clusters, respectively, were classified as having poor water quality.
The findings of this study can serve as a scientific basis for decision-makers to formulate strategies for surface water quality restoration and management in the region.
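The Mann–Kendall statistic is simple enough to compute directly. The following is a minimal sketch, assuming no tied values (the tie correction to the variance is omitted) and using the usual normal approximation with continuity correction; |Z| > 1.96 indicates a significant trend at the 5% level. The function name and the WQI series are hypothetical.

```python
import math


def mann_kendall(x):
    """Return (S, Z) for the Mann-Kendall trend test (no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)  # sign of the later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


wqi = [88, 91, 90, 95, 97, 96, 101, 104]  # hypothetical monthly WQI at one station
s, z = mann_kendall(wqi)
print(s, round(z, 2), "significant" if abs(z) > 1.96 else "not significant")
```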

Keywords: surface water, water quality index (WQI), Mann Kendall (MK) test, hierarchical cluster analysis (HCA), spatial-temporal distribution, Kebir Rhumel Basin

28135 Clustering Locations of Textile and Garment Industries to Compare with the Future Industrial Cluster in Thailand

Authors: Kanogkan Leerojanaprapa

Abstract:

The textile and garment industry used to be a major export industry of Thailand. Because the country has lost price competitiveness following the withdrawal of the EU's GSP (Generalised Scheme of Preferences) and the introduction of the ‘Nationwide Minimum Wage Policy’, under which Thai employers must pay all employees at least 300 baht (about $10) a day, the supply chains of the Thai textile and garment industry have been affected and need to be reformed. Whether the Thai textile and garment industry will survive is therefore a concern. It is also a challenge for the government to decide which industries should be promoted as Thailand's future industries. Recently, the Thai government launched the Cluster-based Special Economic Development Zones Policy to promote business clusters (effective September 16, 2015). The policy defines a cluster as the concentration of interconnected businesses and related institutions that operate within the same geographic area; textiles and garments form one of the target industrial clusters, and nine provinces are targeted (Bangkok, Kanchanaburi, Nakhon Pathom, Ratchaburi, Samut Sakhon, Chonburi, Chachoengsao, Prachinburi, and Sa Kaeo). The cluster zone is defined along a west-east corridor linking manufacturing sources in Cambodia and Myanmar to Bangkok, which is promoted as a design, sourcing, and trading hub. The Thai government will provide tax and non-tax incentives for targeted industries within the clusters and expects these businesses to relocate to wherever they can obtain the most benefit, which will determine the future industrial cluster. This research compares the current cluster with the future cluster defined by the target provinces for textiles and garments. The current cluster is analysed from secondary data.
Four characteristics, the numbers of plants in (1) spinning, weaving and finishing of textiles, (2) manufacture of made-up textile articles, except apparel, (3) manufacture of knitted and crocheted fabrics, and (4) manufacture of other textiles not elsewhere classified, across all 77 provinces, are clustered by K-means cluster analysis and hierarchical cluster analysis. In addition, an ANOVA test confirms the cluster solution and shows which variables contribute most to it. The analysis groups the 22 provinces in which textile or garment plants are located into three clusters. Cluster 1, with a large number of plants, contains only Bangkok; cluster 2, with moderate numbers of plants, contains Samut Prakan, Samut Sakhon and Nakhon Pathom; and cluster 3, with few plants, contains the other 18 provinces. The same methodology can be applied to other industries in future studies.
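The ANOVA step that ranks variables by their contribution to the cluster solution amounts to computing a one-way F statistic per variable, with cluster membership as the grouping factor. Below is a minimal sketch with hypothetical names and toy plant counts. Because the clusters were built to separate the data, these F values are best read descriptively (larger F, stronger separation) rather than as formal significance tests.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of values
    (assumes at least two groups and non-zero within-group variation)."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_by_variable(rows, labels, names):
    """F statistic of each variable (column) across the clusters in `labels`."""
    result = {}
    for j, name in enumerate(names):
        groups = {}
        for row, lab in zip(rows, labels):
            groups.setdefault(lab, []).append(row[j])
        result[name] = anova_f(list(groups.values()))
    return result


# Hypothetical plant counts per province for two variables, two clusters
rows = [[1, 0], [2, 5], [3, 1], [4, 4], [5, 2], [6, 3]]
labels = [0, 0, 0, 1, 1, 1]
print(anova_by_variable(rows, labels, ["spinning", "knitting"]))
# {'spinning': 13.5, 'knitting': 0.375}
```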

Keywords: ANOVA, hierarchical cluster analysis, industrial clusters, K-means cluster analysis, textile and garment industry

28134 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying cancer subtypes helps improve the efficacy and reduce the toxicity of treatments by providing clues for targeted therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important because genomic data are high-dimensional, with a large number of features compared to samples. Hierarchical clustering and K-means are often used in the analysis of gene expression data. Several techniques are used to validate the clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.
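One common way to perform the feature selection mentioned above is a simple variance filter that keeps the genes varying most across samples. This sketch assumes that filter (the study reviews techniques and does not prescribe this one); the function name and expression values are illustrative.

```python
from statistics import pvariance


def top_variance_features(matrix, k):
    """matrix: rows = tumour samples, columns = genes.
    Return the indices of the k genes with the highest variance across samples."""
    variances = [pvariance(col) for col in zip(*matrix)]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return ranked[:k]


expr = [[1, 5, 0], [1, 9, 2], [1, 1, 4]]  # 3 samples x 3 genes (toy values)
print(top_variance_features(expr, 2))  # [1, 2]
```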

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

28133 Determination of Genotypic Relationship among 12 Sugarcane (Saccharum officinarum) Varieties

Authors: Faith Eweluegim Enahoro-Ofagbe, Alika Eke Joseph

Abstract:

Information on genetic variation within a population is crucial for exploiting heterozygosity in breeding programs that aim to improve crop species. The study was conducted to ascertain the genotypic similarities among twelve sugarcane (Saccharum officinarum) varieties in order to group them for hybridization aimed at cane yield improvement. The experiment was conducted at the Teaching and Research Farm of the Faculty of Agriculture, University of Benin, Benin City. Twelve sugarcane varieties obtained from the National Cereals Research Institute, Badeggi, Niger State, Nigeria, were planted in three replications in a randomized complete block design. Each variety was planted on a five-row plot 5.0 m in length. Data were collected on 12 agronomic traits, including the number of millable canes, cane girth, internode length, number of male and female flowers (fuss), days to flag leaf, days to flowering, brix%, and cane yield. The findings showed significant differences among the twelve genotypes in the number of days to flag leaf, the number of male and female flowers (fuss), and cane yield. The relationship between the twelve sugarcane varieties was expressed using hierarchical cluster analysis. The twelve genotypes were grouped into three major clusters based on hierarchical classification: cluster I had five genotypes, cluster II had four, and cluster III had three. Cluster III was dominated by varieties characterized by higher cane yield, number of leaves, internode length, brix%, number of millable stalks, stalk/stool, cane girth, and cane length. Cluster II contained genotypes with early maturity characteristics, such as early flowering, early flag leaf development, growth rate, and the number of female and male flowers (fuss). The maximum inter-cluster distance, between clusters III and I, indicated higher genetic diversity between these two groups.
Hybridization between the two groups could result in transgressive recombinants for agronomically important traits.
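Inter-cluster distance can be read as the distance between cluster centroids in trait space. The sketch below assumes Euclidean distance between centroids of already-standardized trait vectors; breeding studies often use other measures (for example Mahalanobis D²), and the abstract does not say which was used. The cluster names mirror the abstract, but the values and the function name are invented.

```python
import math
from itertools import combinations


def farthest_clusters(clusters):
    """clusters: {name: list of (standardized) trait vectors}.
    Return ((name_a, name_b), distance) for the pair of clusters whose
    centroids are farthest apart in Euclidean distance."""
    cents = {name: [sum(col) / len(rows) for col in zip(*rows)]
             for name, rows in clusters.items()}
    pair = max(combinations(cents, 2),
               key=lambda p: math.dist(cents[p[0]], cents[p[1]]))
    return pair, math.dist(cents[pair[0]], cents[pair[1]])


toy = {"I": [[0, 0], [0, 2]], "II": [[3, 1]], "III": [[10, 1], [10, 1]]}
print(farthest_clusters(toy))  # (('I', 'III'), 10.0)
```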

Keywords: sugarcane, Saccharum officinarum, genotype, cluster analysis, principal components analysis

28132 A Bayesian Hierarchical Poisson Model with an Underlying Cluster Structure for the Analysis of Measles in Colombia

Authors: Ana Corberan-Vallet, Karen C. Florez, Ingrid C. Marino, Jose D. Bermudez

Abstract:

In 2016, the Region of the Americas was declared free of measles, a viral disease that can cause severe health problems. However, since 2017, measles has reemerged in Venezuela and has subsequently reached neighboring countries. In 2018, twelve American countries reported confirmed cases of measles. Governmental and health authorities in Colombia, the country that shares the longest land boundary with Venezuela, are aware of the need for a strong response to restrict the spread of the epidemic. In this work, we apply a Bayesian hierarchical Poisson model with an underlying cluster structure to describe disease incidence in Colombia. Concretely, the proposed methodology provides relative risk estimates at the department level and identifies clusters of disease, which facilitates the implementation of targeted public health interventions. Socio-demographic factors, such as the percentage of migrants, gross domestic product, and entry routes, are included in the model to better describe the incidence of disease. Since the model does not impose any spatial correlation at any level of the model hierarchy, it avoids the spatial confounding problem and provides a suitable framework for estimating the fixed-effect coefficients associated with spatially structured covariates.

Keywords: Bayesian analysis, cluster identification, disease mapping, risk estimation

28131 Hierarchical Cluster Analysis of Raw Milk Samples Obtained from Organic and Conventional Dairy Farming in Autonomous Province of Vojvodina, Serbia

Authors: Lidija Jevrić, Denis Kučević, Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Milica Karadžić

Abstract:

In the present study, Hierarchical Cluster Analysis (HCA) was applied to determine the differences between milk samples originating from a conventional dairy farm (CF) and an organic dairy farm (OF) in AP Vojvodina, Republic of Serbia. The clustering was based on the average saturated fatty acid (SFA) and unsaturated fatty acid (UFA) contents obtained for every season, so the HCA covered SFA and UFA content values over the whole year. The clustering procedure used Euclidean distances and the single-linkage algorithm. The obtained dendrograms indicated that the clustering of UFA in OF was much more uniform than the clustering of UFA in CF. In OF, spring stands out from the other months of the year. The same can be noticed for CF, where winter is separated from the other months. These results were expected, because fatty acid composition is greatly influenced by the season and by the nutrition of dairy cows during the year.
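Single linkage defines the distance between two clusters as the smallest distance between any pair of their members. The pure-Python sketch below records the merge sequence and heights that a dendrogram plots. The function name is hypothetical, and the seasonal (SFA, UFA) values are invented so that one season joins last, much as a single season stands out in the study's dendrograms.

```python
import math


def single_linkage(points):
    """Agglomerative clustering with single linkage on Euclidean distances.
    Returns the merge sequence as (cluster_a, cluster_b, height) tuples,
    i.e. the information a dendrogram draws."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Hypothetical seasonal (SFA %, UFA %) averages: winter, spring, summer, autumn
seasons = [(70.0, 30.0), (62.0, 38.0), (69.0, 31.0), (69.6, 30.4)]
for a, b, h in single_linkage(seasons):
    print(a, b, round(h, 3))
```

Here spring (index 1) joins last, at a much greater height than the other merges.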

Keywords: chemometrics, clustering, food engineering, milk quality

28130 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excess of information on the Internet has led to the problem of information overload. To address this problem and support effective information retrieval, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering, which requires special handling of high dimensionality and high volume. FIHC (Frequent Itemset-based Hierarchical Clustering) is a hierarchical clustering method developed for document clustering, based on the intuition that each cluster has some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of plain frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
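A closed frequent itemset is a frequent itemset none of whose proper supersets has the same support, so the closed sets carry the same support information as all frequent sets in a smaller collection. The brute-force sketch below, which treats each document as a set of words, illustrates the definition only; it is not FIHC itself, the function name is hypothetical, and it would not scale to real corpora.

```python
from itertools import combinations


def closed_frequent_itemsets(docs, min_support):
    """docs: list of sets of words. Return {itemset: support} for itemsets that
    are frequent (support >= min_support) and closed (no proper superset has
    the same support)."""
    vocab = sorted(set().union(*docs))
    frequent = {}
    for size in range(1, len(vocab) + 1):
        found = False
        for combo in combinations(vocab, size):
            s = frozenset(combo)
            support = sum(1 for d in docs if s <= d)
            if support >= min_support:
                frequent[s] = support
                found = True
        if not found:
            break  # Apriori property: no larger itemset can be frequent
    return {s: sup for s, sup in frequent.items()
            if not any(s < t and sup == frequent[t] for t in frequent)}


docs = [{"cluster", "hierarchical", "data"},
        {"cluster", "hierarchical"},
        {"cluster", "energy"}]
for itemset, support in closed_frequent_itemsets(docs, 2).items():
    print(sorted(itemset), support)
```

Here {hierarchical} is frequent but not closed, because {cluster, hierarchical} has the same support of 2.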

Keywords: FIHC, document clustering, ontology, closed frequent itemset

28129 Statistical Analysis to Select Evacuation Route

Authors: Zaky Musyarof, Dwi Yono Sutarto, Dwima Rindy Atika, R. B. Fajriya Hakim

Abstract:

Each country is responsible for the safety of its people, especially those living in disaster-prone areas. One such service is providing evacuation routes for them. So far, however, the selection of evacuation routes has not been well organized: when a disaster happens, many people accumulate at points along the evacuation route. This is dangerous because it hampers the evacuation process. Using several statistical analysis methods, the authors suggest how to prepare evacuation routes that are organized and based on people's habits. These methods are association rules, sequential pattern mining, hierarchical cluster analysis and fuzzy logic.
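For the association-rule step, the two basic measures are support (the share of records containing both sides of a rule) and confidence (the share of records containing the antecedent that also contain the consequent). This is a minimal sketch, assuming each record is the set of places one resident visits; the function name, place names and data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the association rule antecedent -> consequent.
    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)"""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical places visited by residents on a typical day
trips = [{"market", "mosque"}, {"market", "school"},
         {"market", "mosque", "school"}, {"school"}]
print(rule_metrics(trips, {"market"}, {"mosque"}))  # (0.5, 0.6666666666666666)
```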

Keywords: association rules, sequential pattern mining, cluster analysis, fuzzy logic, evacuation route

28128 Optimized Cluster Head Selection Algorithm Based on LEACH Protocol for Wireless Sensor Networks

Authors: Wided Abidi, Tahar Ezzedine

Abstract:

Low-Energy Adaptive Clustering Hierarchy (LEACH) is considered one of the effective hierarchical routing algorithms for optimizing energy use and prolonging network lifetime. Since Cluster Head (CH) selection in LEACH is carried out randomly, in this paper we propose a CH election approach based on the LEACH protocol. Specifically, we present a formula for calculating the threshold responsible for CH election, based on three principal criteria: the remaining energy of the node, the number of neighbors within cluster range, and the distance between the node and the CH. Simulation results show that our proposed approach outperforms the LEACH protocol in prolonging network lifetime and saving residual energy.
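For reference, the threshold used by the original LEACH protocol is T(n) = P / (1 − P · (r mod 1/P)) for nodes in G (the nodes that have not been cluster heads in the last 1/P rounds) and 0 otherwise; a node becomes CH when a uniform random draw falls below T(n). The sketch below implements only that baseline rule. The paper's modified formula, which adds residual energy, neighbour count and distance, is not given in the abstract and is not reproduced here; the function names are hypothetical.

```python
import random


def leach_threshold(p, r, in_g):
    """Original LEACH threshold T(n) for round r.
    p: desired fraction of cluster heads; in_g: True if the node has not been
    a CH in the last 1/p rounds (the set G). Nodes outside G get T(n) = 0."""
    if not in_g:
        return 0.0
    return p / (1 - p * (r % round(1 / p)))


def elect_cluster_heads(in_g_by_node, p, r, rng):
    """Each node draws a uniform number in [0, 1) and becomes a CH if it is below T(n)."""
    return [node for node, in_g in in_g_by_node.items()
            if rng.random() < leach_threshold(p, r, in_g)]


rng = random.Random(1)
nodes = {f"n{i}": True for i in range(20)}
print(elect_cluster_heads(nodes, p=0.1, r=0, rng=rng))
```

In the last round of each 1/P-round epoch, T(n) reaches 1, so every node still in G is elected.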

Keywords: wireless sensor networks, LEACH protocol, cluster head election, energy efficiency

28127 Hybrid Hierarchical Clustering Approach for Community Detection in Social Network

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

Social networks generally present a hierarchy of communities. To determine these communities and the relationships between them, detection algorithms should be applied. Most existing algorithms for hierarchical community identification are based on either agglomerative or divisive clustering. In this paper, we present a hybrid hierarchical clustering approach for community detection based on both bottom-up and top-down clustering. Our approach provides a more relevant community structure than hierarchical methods that consider only divisive or agglomerative clustering. Moreover, we performed comparative experiments to assess the quality of the clustering results and to show the effectiveness of our algorithm.

Keywords: agglomerative hierarchical clustering, community structure, divisive hierarchical clustering, hybrid hierarchical clustering, opinion mining, social network, social network analysis

28126 Cluster Analysis of Retailers’ Benefits from Their Cooperation with Manufacturers: Business Models Perspective

Authors: M. K. Witek-Hajduk, T. M. Napiórkowski

Abstract:

A number of studies have discussed the benefits of retailer-manufacturer cooperation and coopetition. However, only a few publications focus on the benefits of cooperation and coopetition between retailers and their suppliers of durable consumer goods, especially in the context of the business models of the cooperating partners. This paper aims to provide a clustering approach to segment retailers selling consumer durables according to the benefits they obtain from their cooperation with key manufacturers, and to differentiate these retailers in terms of the business models of the cooperating partners. For the purpose of the study, a CATI (computer-assisted telephone interviewing) survey collected data on 603 consumer durables retailers present on the Polish market. Retailers are clustered with both hierarchical and non-hierarchical methods. Using this two-stage clustering approach, five distinct groups of consumer durables retailers are identified based on the studied benefits. The clusters are then characterized with a set of exogenous variables, the key ones being the business models employed by the retailer and its partnering key manufacturer. The paper finds that a combination of a medium-sized retailer classified as an Integrator with chiefly domestic capital and a manufacturer categorized as a Market Player yields the highest benefits. At the other end of the spectrum is a medium-sized Distributor retailer with solely domestic capital; in this case, the business model of the cooperating manufacturer appears to be irrelevant. This paper is one of the first empirical studies using cluster analysis on primary data to define the types of cooperation between consumer durables retailers and manufacturers, their key suppliers. The analysis integrates the perspectives of both retailers’ and manufacturers’ business models and matches them with individual and joint benefits.
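In a typical two-stage procedure, the hierarchical stage suggests the number of clusters and their initial centres, and k-means then refines the partition. This is a minimal sketch of the second stage (Lloyd's algorithm), assuming the initial centroids are passed in, for instance as cluster means from a Ward dendrogram; the function name and toy data are hypothetical.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means starting from given centroids (e.g. cluster means from a
    hierarchical first stage). Returns (labels, centroids)."""
    cents = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each point
        new_labels = [min(range(len(cents)), key=lambda k: math.dist(p, cents[k]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(len(cents)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                cents[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, cents


points = [(0, 0), (0, 1), (9, 9), (10, 10)]
print(kmeans(points, [(0, 0), (10, 10)]))  # ([0, 0, 1, 1], [[0.0, 0.5], [9.5, 9.5]])
```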

Keywords: benefits of cooperation, business model, cluster analysis, retailer-manufacturer cooperation

28125 O-LEACH: The Problem of Orphan Nodes in the LEACH of Routing Protocol for Wireless Sensor Networks

Authors: Wassim Jerbi, Abderrahmen Guermazi, Hafedh Trabelsi

Abstract:

The optimum use of coverage in wireless sensor networks (WSNs) is very important. The LEACH (Low Energy Adaptive Clustering Hierarchy) protocol is a hierarchical clustering algorithm for wireless sensor networks that allows the formation of distributed clusters. In each cluster, LEACH randomly selects some sensor nodes called cluster heads (CHs). The selection of CHs is made with a probabilistic calculation. Each non-CH node is supposed to join a cluster and become a cluster member. Nevertheless, some CHs can be concentrated in a specific part of the network, so several sensor nodes cannot reach any CH. To solve this problem, we created O-LEACH, an orphan-node protocol whose role is to reduce the number of sensor nodes that do not belong to any cluster. A cluster member called the gateway receives messages from neighboring orphan nodes and informs its CH that there are neighboring nodes that do not belong to any group. The gateway, denoted CH', then attaches the orphan nodes to the cluster and collects their data. O-LEACH thus introduces a new way of forming clusters that leads to a longer network lifetime and minimal energy consumption. Orphan nodes still possess enough energy and seek to be covered by the network. The principal novel contribution of the proposed work is the O-LEACH protocol, which provides coverage of the whole network with a minimum number of orphan nodes and has very high connectivity rates. As a result, the WSN application receives data from the entire network, including orphan nodes. Proper functioning of the application therefore requires intelligent management of the resources present in each network sensor. The simulation results show that O-LEACH performs better than LEACH in terms of coverage, connectivity rate, energy and scalability.
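The orphan-node idea can be sketched geometrically: a non-CH node is an orphan when no CH lies within radio range, and a cluster member within range of the orphan can act as its gateway (CH'). The sketch assumes a unit-disk radio model with one fixed range, a single hop from orphan to gateway, and nearest-candidate selection; these are illustrative assumptions, not O-LEACH's specified rules, and the function name is hypothetical.

```python
import math


def attach_orphans(positions, heads, radio_range):
    """positions: {node: (x, y)}; heads: set of CH ids.
    Returns (membership, gateways):
      membership maps each covered non-CH node to its nearest in-range CH;
      gateways maps each orphan to the nearest cluster member in range (its CH').
    Orphans with no member in range stay uncovered."""
    membership = {}
    for node, pos in positions.items():
        if node in heads:
            continue
        in_range = [h for h in heads if math.dist(pos, positions[h]) <= radio_range]
        if in_range:
            membership[node] = min(in_range, key=lambda h: math.dist(pos, positions[h]))
    orphans = [n for n in positions if n not in heads and n not in membership]
    gateways = {}
    for o in orphans:
        candidates = [m for m in membership
                      if math.dist(positions[o], positions[m]) <= radio_range]
        if candidates:
            gateways[o] = min(candidates,
                              key=lambda m: math.dist(positions[o], positions[m]))
    return membership, gateways


nodes = {"ch": (0, 0), "m": (8, 0), "o": (15, 0), "far": (50, 0)}
print(attach_orphans(nodes, {"ch"}, radio_range=10))  # ({'m': 'ch'}, {'o': 'm'})
```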

Keywords: WSNs; routing; LEACH; O-LEACH; Orphan nodes; sub-cluster; gateway; CH’

28124 Digital Geography and Geographic Information System in Schools: Towards a Hierarchical Geospatial Approach

Authors: Mary Fargher

Abstract:

This paper examines the opportunities offered by a more hierarchical approach to geospatial enquiry when using GIS in school geography. A case is made that it is not only teachers' lack of technological knowledge that stops some of them from using GIS in the classroom, but also a gap in their understanding of how to link GIS use more specifically to the pedagogy of teaching geography with GIS. Using a hierarchical approach to geospatial enquiry as a theoretical framework, the analysis shows how teachers can use the concepts of spatial distribution, interaction, relation, comparison, and temporal relationships more explicitly to capitalise on the analytical power of GIS and to construct what can be interpreted as powerful geographical knowledge. An exemplar illustrating this approach on the topic of geo-hazards is then presented for critical analysis and discussion. Finally, recommendations are made for a model of progression in geography teacher education with GIS through hierarchical geospatial enquiry that takes into account beginner, intermediate, and more advanced users.

Keywords: digital geography, GIS, education, hierarchical geospatial enquiry, powerful geographical knowledge

28123 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Authors: Fabio Fabris, Alex A. Freitas

Abstract:

Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. The proposed approach recommends the best hierarchical classification algorithm for a given hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.

Keywords: algorithm recommendation, meta-learning, bioinformatics, hierarchical classification

28122 The Use of Multivariate Statistical and GIS for Characterization Groundwater Quality in Laghouat Region, Algeria

Authors: Rouighi Mustapha, Bouzid Laghaa Souad, Rouighi Tahar

Abstract:

Due to rain shortage and population growth in recent years, well excavation and groundwater use for different purposes have increased without any planning. This is a great challenge for our country. Moreover, the scarcity of water resources in this region is unfortunately combined with a rapid deterioration in freshwater quality due to salinity and contamination processes. It is therefore necessary to study groundwater quality in Algeria. This work identifies the factors that influence water quality parameters in the Laghouat region using Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA) and a geographic information system (GIS), in an attempt to discriminate the sources of water quality variation. The PCA results indicate that the variables responsible for water quality composition are mainly related to soluble salts, natural processes, and the nature of the rock, which significantly modifies water chemistry. Based on the positive correlation between K+ and NO3-, NO3- is believed to be human-induced rather than of natural origin. In this study, multivariate statistical analysis and GIS give the hydrogeologist supplementary tools for characterizing and evaluating aquifers.
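PCA of water-quality variables typically starts from the correlation matrix of the standardized variables; the loadings of the first component show which variables move together. The sketch below finds PC1 by power iteration in pure Python, assuming every variable varies across samples and that the starting vector is not orthogonal to the leading eigenvector. The function name, sample values and variable labels are hypothetical.

```python
from statistics import mean, pstdev


def first_pc(rows, iters=200):
    """Leading eigenvector (PC1 loadings) and eigenvalue of the correlation
    matrix of the columns of `rows`, found by power iteration.
    Assumes every column varies (non-zero SD)."""
    cols = list(zip(*rows))
    n, p = len(rows), len(cols)
    z = []
    for c in cols:
        m, s = mean(c), pstdev(c)
        z.append([(v - m) / s for v in c])  # standardized column
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigval = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigval


# Hypothetical samples of (Na+, Cl-, pH): the two salt ions rise together
samples = [(1, 2, 4), (2, 4, 2), (3, 6, 2), (4, 8, 4)]
loadings, lam = first_pc(samples)
print([round(x, 3) for x in loadings], round(lam, 3))
```

In this toy data, PC1 loads equally on the two ions and ignores pH, which is the kind of pattern read as a common "soluble salts" factor.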

Keywords: cluster, analysis, GIS, groundwater, Laghouat, quality

28121 Cluster Analysis of Students’ Learning Satisfaction

Authors: Purevdolgor Luvsantseren, Ajnai Luvsan-Ish, Oyuntsetseg Sandag, Javzmaa Tsend, Akhit Tileubai, Baasandorj Chilhaasuren, Jargalbat Puntsagdash, Galbadrakh Chuluunbaatar

Abstract:

One of the indicators of the quality of university services is student satisfaction. Aim: We aimed to study the satisfaction of first-year premedical students with the Medical Physics course using cluster analysis. Materials and Methods: To this end, questionnaires were collected from a total of 324 first-year premedical students who took the medical physics course at the Mongolian National University of Medical Sciences. Satisfaction was rated on five levels: "excellent", "good", "medium", "bad" and "very bad". The questionnaire contained 39 questions: 8 for course evaluation, 19 for teacher evaluation, and 12 for student evaluation. From the responses, a database with 39 fields and 324 records was created. Results: Cluster analysis was performed on this database in MATLAB and R using the k-means method. The Hopkins statistics calculated for the database were 0.88, 0.87, and 0.97, showing that cluster analysis methods can be applied. The course evaluation sub-database is divided into three clusters: cluster I has 150 objects (46.2%) with a "good" rating, cluster II has 119 objects (36.7%) with a "medium" rating, and cluster III has 54 objects (16.6%) with a "good" rating. The teacher evaluation sub-database is divided into three clusters: cluster II has 179 objects (55.2%) with a "good" rating, cluster III has 108 objects (33.3%) with an "average" rating, and cluster I has 36 objects (11.1%) with an "excellent" rating. The student evaluation sub-database is divided into two clusters: cluster II has 215 objects (66.3%) with an "excellent" rating, and cluster I has 108 objects (33.3%) with an "excellent" rating.
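The Hopkins statistic compares nearest-neighbour distances from random points with those from real data points. In the convention sketched here, H = Σu / (Σu + Σw), where u are distances from m uniform random points in the data's bounding box to the nearest data point and w are distances from m sampled data points to their nearest other data point; values near 1 indicate clustering tendency and values near 0.5 indicate random data. Some references define H the other way round, so it is worth checking which convention a tool uses. The function name and toy data are hypothetical.

```python
import math
import random


def hopkins(points, m, rng):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)).
    u: distances from m uniform random points in the data's bounding box
       to their nearest data point.
    w: distances from m randomly sampled data points to their nearest
       other data point.
    Requires m <= len(points)."""
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    u = []
    for _ in range(m):
        q = [rng.uniform(a, b) for a, b in zip(lo, hi)]
        u.append(min(math.dist(q, p) for p in points))
    w = []
    for i in rng.sample(range(len(points)), m):
        w.append(min(math.dist(points[i], p)
                     for j, p in enumerate(points) if j != i))
    return sum(u) / (sum(u) + sum(w))


# Two tight, well-separated blobs (toy data)
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (10, 10), (10.1, 10), (10, 10.1), (10.1, 10.1)]
print(round(hopkins(pts, 4, random.Random(42)), 2))
```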
Evaluated with the silhouette coefficient, the resulting clusters scored 0.32 for course evaluation, 0.31 for teacher evaluation, and 0.30 for student evaluation, values interpreted here as statistically significant. Conclusion: The cluster analysis gives “good” 46.2%, “middle” 36.7% and “bad” 16.6% in the medical physics course evaluation model; “good” 55.2%, “middle” 33.3% and “bad” 11.1% in the teacher evaluation model; and “good” 66.3% and “bad” 33.3% in the student evaluation model.
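The silhouette coefficient of a point is s = (b − a) / max(a, b), where a is its mean distance to the other members of its cluster and b is its smallest mean distance to the members of any other cluster; averaging s over all points gives a single score per clustering. A minimal sketch under those definitions, with a hypothetical function name and toy data:

```python
import math
from statistics import mean


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters).
    a(i): mean distance to the other members of i's cluster;
    b(i): smallest mean distance to the members of any other cluster;
    s(i) = (b - a) / max(a, b), and s(i) = 0 for singleton clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == k)
                for k in set(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return mean(scores)


pts2 = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(pts2, [0, 0, 1, 1]), 3))  # 0.9
```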

Keywords: questionnaire, data mining, k-means method, silhouette coefficient

28120 Data-Driven Market Segmentation in Hospitality Using Unsupervised Machine Learning

Authors: Rik van Leeuwen, Ger Koole

Abstract:

Within hospitality, marketing departments use segmentation to create tailored strategies that enable personalized marketing. This study provides a data-driven approach that segments guest profiles via hierarchical clustering based on an extensive set of features. The industry requires understandable outcomes that help marketing departments adapt, make data-driven decisions, and ultimately drive profit. A business question specified by a marketing department guides the unsupervised machine learning algorithm. Because guest features change over time, guests may transition from one segment to another. The purpose of the study is to describe the steps from raw data to actionable insights, serving as a guideline for how hospitality companies can adopt an algorithmic approach.
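Since guests can move between segments over time, transition probabilities can be estimated empirically from segment assignments taken at two points in time. The Python sketch below is an illustrative assumption rather than the authors' pipeline: the guest IDs, segment names, and the helper `transition_matrix` are hypothetical, and the estimate is simply the row-normalized count of observed moves.

```python
from collections import Counter

def transition_matrix(before, after, segments):
    """Estimate P(segment j at time t+1 | segment i at time t).

    before, after: dicts mapping a guest ID to a segment label at two times.
    segments: ordered list of all segment labels.
    Guests missing from either snapshot are ignored.
    """
    counts = Counter((before[g], after[g]) for g in before if g in after)
    matrix = {}
    for i in segments:
        row_total = sum(counts[(i, j)] for j in segments)
        matrix[i] = {j: (counts[(i, j)] / row_total if row_total else 0.0)
                     for j in segments}
    return matrix

# Hypothetical example: four guests observed in two periods.
before = {"g1": "business", "g2": "business", "g3": "leisure", "g4": "leisure"}
after = {"g1": "business", "g2": "leisure", "g3": "leisure", "g4": "leisure"}
P = transition_matrix(before, after, ["business", "leisure"])
print(P["business"])  # {'business': 0.5, 'leisure': 0.5}
```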

Keywords: hierarchical cluster analysis, hospitality, market segmentation

28119 Cluster Analysis of Customer Churn in Telecom Industry

Authors: Abbas Al-Refaie

Abstract:

The research examines the factors that affect customer churn (CC) in the Jordanian telecom industry. A total of 700 surveys were distributed. Cluster analysis revealed three main clusters. Results showed that CC and customer satisfaction (CS) were the key determinants in forming the three clusters. In two clusters, the center values of CC were high, indicating that the customers were loyal and that switching cost (SC) was perceived as expensive and time- and energy-consuming. Nevertheless, the mobile service provider (MSP) should enhance its communication (COM), value-added services (VASs), and customer complaint management systems (CCMS). Finally, in the third cluster, the center of CC indicates a poor level of loyalty, which makes it easier for customers to churn to another MSP. The results of this study provide valuable feedback for MSP decision makers on approaches to improving their performance and reducing CC.

Keywords: cluster analysis, telecom industry, switching cost, customer churn

28118 A Near-Optimal Domain Independent Approach for Detecting Approximate Duplicates

Authors: Abdelaziz Fellah, Allaoua Maamir

Abstract:

We propose a domain-independent merging-cluster filter approach, complemented with a set of algorithms, for identifying approximate duplicate entities efficiently and accurately within a single data source and across multiple data sources. The near-optimal merging-cluster filter (MCF) approach is based on the well-tuned Monge-Elkan algorithm, extended with an affine variant of the Smith-Waterman similarity measure. We then present constant, variable, and function threshold algorithms that work conceptually in a divide-merge filtering fashion to detect near duplicates as hierarchical clusters along with their corresponding representatives. The algorithms take a recursive refinement approach, filtering, merging, and updating cluster representatives to detect approximate duplicates at each level of the cluster tree. Experiments on several real-world benchmarks and generated datasets show that the MCF approach detects approximate duplicates with high effectiveness and accuracy, outperforming the seminal Monge-Elkan algorithm.
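The two similarity building blocks named above can be sketched in Python: a Smith-Waterman local alignment score with affine gaps (Gotoh's recurrences) used as the inner token similarity of the classic Monge-Elkan measure. This is an illustrative sketch, not the MCF filter or its threshold algorithms; the scoring parameters, the normalization by `match * min(len)`, whitespace tokenization, and the helper names are assumptions.

```python
def affine_sw(a, b, match=2.0, mismatch=-1.0, gap_open=2.0, gap_extend=1.0):
    """Smith-Waterman local alignment score with affine gaps (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    The best score is normalized to [0, 1] by match * min(len(a), len(b)).
    """
    if not a or not b:
        return 0.0
    neg = float("-inf")
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best local score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignments ending in a gap in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignments ending in a gap in b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best / (match * min(n, m))

def monge_elkan(s, t, inner=affine_sw):
    """Mean, over tokens of s, of the best inner similarity to any token of t."""
    a, b = s.lower().split(), t.lower().split()
    if not a or not b:
        return 0.0
    return sum(max(inner(x, y) for y in b) for x in a) / len(a)

print(round(monge_elkan("Univ. of Toronto", "University of Toronto"), 3))  # 0.933
```

Note that Monge-Elkan is asymmetric (it averages over the tokens of the first string), so implementations often symmetrize it, for example by taking the mean of both directions.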

Keywords: data mining, data cleaning, approximate duplicates, near-duplicates detection, data mining applications and discovery

28117 Structure Clustering for Milestoning Applications of Complex Conformational Transitions

Authors: Amani Tahat, Serdal Kirmizialtin

Abstract:

Trajectory fragment methods such as Markov State Models (MSM), Milestoning (MS) and Transition Path Sampling are the prime choice for extending the timescale of all-atom Molecular Dynamics (MD) simulations. In these approaches, a set of structures that covers the accessible phase space has to be chosen a priori using cluster analysis. Structural clustering partitions the conformational space into natural subgroups based on similarity; it is an essential statistical methodology for analyzing the large sets of empirical data produced by MD simulations. The local transition kernel among these clusters is later used to connect the metastable states, using a Markovian kinetic model in MSM and a non-Markovian model in MS. The choice of clustering approach in constructing such a kernel is crucial, since the high dimensionality of biomolecular structures can easily confuse the identification of clusters when the traditional hierarchical clustering methodology is used. In particular, in MS, where the milestones are very close to each other, accurately determining the milestone identity of a trajectory becomes challenging. In this work, we present two cluster analysis methods applied to the cis–trans isomerism of the dinucleotide AA. The choice of nucleic acids over the more commonly used proteins for studying cluster analysis is twofold: i) the energy landscape is rugged, so transitions are more complex, providing a more realistic model for studying conformational transitions; and ii) the conformational space of nucleic acids is high dimensional. A diverse set of internal coordinates is necessary to describe the metastable states of nucleic acids, which makes studying their conformational transitions challenging. Hence, improved clustering methods are needed that identify the AA structure in its metastable states accurately and robustly across a wide range of confused data conditions.
The single linkage approach of hierarchical clustering available in the GROMACS MD package is the first clustering methodology applied to our data. The Self-Organizing Map (SOM) neural network, also known as a Kohonen network, is the second. The performance of the neural network and of the hierarchical clustering method is compared by computing mean first passage times to obtain the cis–trans conformational rates. We hope that this study provides insight into the complexities of, and the need for, determining the appropriate clustering algorithm for kinetic analysis. Our results can improve the effectiveness of decisions based on clustering confused empirical data when studying conformational transitions in biomolecules.
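The single-linkage criterion can be sketched in a few lines of Python. As we understand the GROMACS documentation for `gmx cluster -method linkage`, a structure joins a cluster when its distance to any member is below the cutoff, which is equivalent to taking connected components of the "distance < cutoff" graph. The sketch below uses union-find for that; it is not GROMACS code, and the helper name, toy 1-D "structures", and cutoff are hypothetical.

```python
def single_linkage_clusters(dmat, cutoff):
    """Cluster structures whose pairwise distances chain below `cutoff`.

    dmat: symmetric n x n distance matrix (e.g. pairwise RMSD values).
    Returns cluster labels numbered in order of first appearance.
    """
    n = len(dmat)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dmat[i][j] < cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels

# Hypothetical 1-D "structures"; 0 and 2 end up together only through 1.
pos = [0, 1, 2, 10, 11]
dmat = [[abs(a - b) for b in pos] for a in pos]
print(single_linkage_clusters(dmat, cutoff=1.5))  # [0, 0, 0, 1, 1]
```

The example also shows the chaining effect of single linkage: structures 0 and 2 are 2 units apart, above the cutoff, yet share a cluster, which is one reason the choice of clustering method matters for densely spaced milestones.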

Keywords: milestoning, self organizing map, single linkage, structure clustering

28116 Application of Combined Cluster and Discriminant Analysis to Make the Operation of Monitoring Networks More Economical

Authors: Norbert Magyar, Jozsef Kovacs, Peter Tanos, Balazs Trasy, Tamas Garamhegyi, Istvan Gabor Hatvani

Abstract:

Water is one of the most important common resources, and as a result of urbanization, agriculture, and industry it is increasingly exposed to potential pollutants. Preventing the deterioration of water quality is a crucial task for environmental scientists, and achieving this aim requires operating monitoring networks. In general, these networks have to meet many important requirements, such as representativeness and cost efficiency. However, existing monitoring networks often include unnecessary sampling sites; by eliminating these sites, the monitoring network can be optimized and operated more economically. The aim of this study is to illustrate the applicability of CCDA (Combined Cluster and Discriminant Analysis) to water quality monitoring, and to optimize the monitoring networks of a river (the Danube), a wetland-lake system (Kis-Balaton & Lake Balaton), and two surface-subsurface water systems in the watershed of Lake Neusiedl/Lake Fertő and in the Szigetköz area over a period of approximately two decades. CCDA combines two multivariate data analysis methods: hierarchical cluster analysis and linear discriminant analysis. Its goal is to determine homogeneous groups of observations, in our case sampling sites, by comparing the goodness of preconceived classifications obtained from hierarchical cluster analysis with that of random classifications. The main idea behind CCDA is that if the ratio of correctly classified cases for a grouping is higher than at least 95% of the ratios for the random classifications, then, at a significance level of α = 0.05, the given sampling sites do not form a homogeneous group. Because sampling on Lake Neusiedl/Lake Fertő was conducted at the same time at all sampling sites, the differences between sampling sites belonging to the same or different groups could be visualized on scatterplots.
Based on the results, the monitoring network of the Danube yields redundant information over certain sections: of its 12 sampling sites, 3 could be eliminated without loss of information. In the case of the wetland (Kis-Balaton), one pair of sampling sites out of 12 could be discarded, and in the case of Lake Balaton, 5 out of 10. For the groundwater system of the catchment area of Lake Neusiedl/Lake Fertő, all 50 monitoring wells are necessary; there is no redundant information in the system. The number of sampling sites on Lake Neusiedl/Lake Fertő could be reduced to approximately half of the original number. Furthermore, neighbouring sampling sites were compared pairwise using CCDA, and the results were plotted on diagrams or isoline maps showing the locations of the greatest differences. These results can help researchers decide where to place new sampling sites. CCDA proved to be a useful tool for optimizing monitoring networks for different types of water bodies, and based on the results obtained, these networks can be operated more economically.
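The CCDA decision rule described above can be sketched as a small Python function. This covers only the final comparison step: it assumes the correct-classification ratios have already been obtained from linear discriminant analysis for the hierarchical-clustering grouping and for many random groupings. The helper name `ccda_homogeneous` and the toy ratios are hypothetical.

```python
def ccda_homogeneous(observed_ratio, random_ratios, level=0.95):
    """Decision rule described for CCDA (hypothetical helper).

    observed_ratio: share of correctly classified cases (from linear
        discriminant analysis) for the grouping suggested by hierarchical
        cluster analysis.
    random_ratios: the same share for many random groupings of the sites.
    Returns (homogeneous, share_beaten).
    """
    if not random_ratios:
        raise ValueError("need at least one random classification")
    beaten = sum(1 for r in random_ratios if observed_ratio > r)
    share = beaten / len(random_ratios)
    # If the grouping beats at least 95% of the random groupings
    # (alpha = 0.05), the sites do not form a homogeneous group.
    return share < level, share

random_ratios = [0.55 + 0.003 * i for i in range(100)]  # toy values
print(ccda_homogeneous(0.92, random_ratios)[0])  # False: the sites differ
print(ccda_homogeneous(0.70, random_ratios)[0])  # True: treat as one group
```

In the pairwise comparisons mentioned above, a site pair judged homogeneous in this way is a candidate for dropping one of the two sites.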

Keywords: combined cluster and discriminant analysis, cost efficiency, monitoring network optimization, water quality

28115 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting

Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi

Abstract:

An ad hoc network consists of wireless mobile devices that connect to each other through their own communication equipment, without any infrastructure. Clustering is the best way to form a hierarchical structure in such a network. Various clustering methods can form more stable clusters by taking node mobility into account. In this research, we propose an algorithm that, in the first phase, assigns a weight to each node based on factors, namely link stability and power reduction rate. According to the weights allocated in the first phase, a cellular learning automaton selects, in the second phase, the nodes that are candidates for becoming cluster heads. In the third phase, a learning automaton selects the cluster-head nodes and member nodes and forms the clusters. The automaton thus learns from its environment and can form clusters optimized in terms of power consumption and link stability. The proposed algorithm was simulated using OMNeT++ 4.2.2. Simulation results indicate that the newly formed clusters have a longer lifetime than those formed by previous algorithms, and that the algorithm substantially reduces network overload by lowering the update rate.
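The first (weighting) phase can be sketched with simple additive weighting in Python. The abstract does not give the exact formula, so the min-max normalization, the equal default weights, the inversion of the power reduction rate (a lower rate is treated as better), and the top-k candidate selection are all assumptions; the learning-automaton phases are not modeled here.

```python
def additive_weights(nodes, w_stability=0.5, w_power=0.5):
    """Score nodes by simple additive weighting of two min-max normalized factors.

    nodes: dict node_id -> (link_stability, power_reduction_rate).
    Higher link stability is better; a lower power reduction rate is assumed
    to be better, so that factor is inverted after normalization.
    """
    stab = [s for s, _ in nodes.values()]
    power = [p for _, p in nodes.values()]

    def norm(x, lo, hi):
        return (x - lo) / (hi - lo) if hi > lo else 1.0

    scores = {}
    for node, (s, p) in nodes.items():
        s_n = norm(s, min(stab), max(stab))
        p_n = 1.0 - norm(p, min(power), max(power))
        scores[node] = w_stability * s_n + w_power * p_n
    return scores

def cluster_head_candidates(scores, k):
    """Return the k highest-weighted nodes as cluster-head candidates."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical nodes: (link stability, power reduction rate).
nodes = {"A": (0.9, 0.10), "B": (0.4, 0.05), "C": (0.7, 0.30), "D": (0.2, 0.20)}
print(cluster_head_candidates(additive_weights(nodes), k=2))  # ['A', 'B']
```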

Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power

28114 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects of implementing an explicit spatial perspective in econometrics for modelling non-continuous data in general, and count data in particular. It provides an overview of the spatial econometric approaches available for modelling data collected with reference to location in space, from classical spatial econometrics to recent developments in modelling count data in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework necessary for structurally consistent spatial econometric count models incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures under different assumptions, and to the constraints and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature and from hierarchical modelling and analysis of spatial data, in order to identify possible new directions for processing count data in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

28113 Knowledge Discovery from Production Databases for Hierarchical Process Control

Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata

Abstract:

The paper presents the results of a project focused on using knowledge discovered from production systems for the needs of hierarchical process control. One of the main project goals was to propose a knowledge discovery model for process control. Specific data mining methods and techniques were used for the defined process control problems. The knowledge gained was applied to a real production system, thereby verifying the proposed solution. The paper documents how newly discovered knowledge can be applied in real hierarchical process control, and it specifies the opportunities for applying the proposed knowledge discovery model to hierarchical process control.

Keywords: hierarchical process control, knowledge discovery from databases, neural network, process control

28112 Evaluating the Factors Controlling the Hydrochemistry of Gaza Coastal Aquifer Using Hydrochemical and Multivariate Statistical Analysis

Authors: Madhat Abu Al-Naeem, Ismail Yusoff, Ng Tham Fatt, Yatimah Alias

Abstract:

Groundwater in the Gaza Strip is increasingly exposed to anthropogenic and natural factors that have seriously impacted its quality. Physicochemical data of groundwater can offer important information on changes in groundwater quality that can be useful in improving water management tactics. Integrated hydrochemical and statistical techniques (hierarchical cluster analysis (HCA) and factor analysis (FA)) were applied to existing data on ten physicochemical parameters from 84 samples collected in 2000/2001, using STATA, AquaChem, and Surfer software, to: 1) provide valuable insight into the salinization sources and the hydrochemical processes controlling the chemistry of groundwater; and 2) differentiate the influence of natural processes from that of man-made activities. The large diversity of recorded water facies, dominated by the Na-Cl type, reveals a highly saline aquifer impacted by multiple complex hydrochemical processes. Based on WHO standards, only 15.5% of the wells were suitable for drinking. HCA yielded three clusters. Cluster 1 has the highest salinity, mainly due to Eocene saline water invasion mixed with human inputs. Cluster 2 has the lowest salinity, also due to Eocene saline water invasion, but mixed with recent rainfall recharge and limited carbonate dissolution and nitrate pollution. Cluster 3 is similar in salinity to Cluster 2, but has a high diversity of facies due to many sources of salinity, such as seawater invasion, carbonate dissolution, and human inputs. Factor analysis yielded two factors accounting for 88% of the total variance. Factor 1 (59%) is a salinization factor demonstrating the mixing contribution of natural saline water with human inputs. Factor 2, which explained 29% of the total variance, reflects hardness and pollution. The negative relationship between NO3- and pH may reveal a denitrification process in a heavily polluted aquifer recharged by limited oxygenated rainfall.
Multivariate statistical analysis combined with hydrochemical analysis indicates that the main factors controlling groundwater chemistry were Eocene saline invasion, seawater invasion, sewage invasion, and rainfall recharge, and that the main hydrochemical processes were base ion exchange and reverse ion exchange with clay minerals (water-rock interactions), nitrification, carbonate dissolution, and a limited denitrification process.
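Water facies such as "Na-Cl" name the dominant cation and anion once concentrations are expressed in milliequivalents per litre. The Python sketch below applies a simplified dominant-ion rule; it is an illustrative assumption, not the classification implemented in AquaChem (Piper-based facies schemes use percentage thresholds), and the sample concentrations are hypothetical. Equivalent weights are molar mass divided by ionic charge.

```python
# Approximate equivalent weights in g/eq (molar mass / |charge|).
EQ_WEIGHT = {
    "Ca": 40.078 / 2, "Mg": 24.305 / 2, "Na": 22.990, "K": 39.098,
    "Cl": 35.453, "SO4": 96.06 / 2, "HCO3": 61.017,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("Cl", "SO4", "HCO3")

def to_meq(sample_mg_l):
    """Convert mg/L concentrations to meq/L."""
    return {ion: c / EQ_WEIGHT[ion] for ion, c in sample_mg_l.items()}

def water_type(sample_mg_l):
    """Label a sample by its dominant cation and anion in meq/L, e.g. 'Na-Cl'."""
    meq = to_meq(sample_mg_l)
    cation = max(CATIONS, key=lambda ion: meq.get(ion, 0.0))
    anion = max(ANIONS, key=lambda ion: meq.get(ion, 0.0))
    return f"{cation}-{anion}"

# Hypothetical saline well sample, concentrations in mg/L.
sample = {"Ca": 120, "Mg": 60, "Na": 600, "K": 10,
          "Cl": 1100, "SO4": 200, "HCO3": 350}
print(water_type(sample))  # Na-Cl
```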

Keywords: dendrogram and cluster analysis, water facies, Eocene saline invasion and sea water invasion, nitrification and denitrification

28111 The Use of Ward Linkage in Cluster Integration with a Path Analysis Approach

Authors: Adji Achmad Rinaldo Fernandes

Abstract:

Path analysis is an analytical technique for studying causal relationships between independent and dependent variables. In this study, cluster integration using the Ward linkage method was combined with path analysis for various numbers of clusters. The variables used are character (x₁), capacity (x₂), capital (x₃), collateral (x₄), and condition of economy (x₅), acting on on-time payment (y₂) through the variable willingness to pay (y₁). The purpose of this study was to compare Ward linkage cluster integration with path analysis across various numbers of clusters in order to classify willingness to pay (y₁). The data are primary data from questionnaires filled out by customers of Bank X, selected by purposive sampling. The measurement method used is the average score method. Based on a comparison of the coefficients of determination, Ward linkage cluster integration with path analysis on 2 clusters is the best method. The variables character (x₁), capacity (x₂), capital (x₃), collateral (x₄), and condition of economy (x₅) explain 58.3% of the variation in on-time payment (y₂) through willingness to pay (y₁), while the remaining 41.7% is explained by variables outside the model.
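Ward linkage can be illustrated with a naive agglomerative loop in Python: at each step it merges the pair of clusters whose union increases the within-cluster sum of squares the least, an increase equal to |A||B| / (|A| + |B|) times the squared distance between centroids. The sketch and its toy respondent scores are hypothetical and do not include the path-analysis step; for real data, SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` is the usual tool.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(len(points[0]))]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical average scores of six respondents on two variables.
scores = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
print(ward_clusters(scores, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

This naive version recomputes every pairwise cost at each step, which is fine for a handful of respondents but slow for large surveys.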

Keywords: cluster integration, linkage, path analysis, compliant paying behavior

28110 Application of Multivariate Statistics and Hydro-Chemical Approach for Groundwater Quality Assessment: A Study on Birbhum District, West Bengal, India

Authors: N. C. Ghosh, Niladri Das, Prolay Mondal, Ranajit Ghosh

Abstract:

Groundwater quality deterioration due to human activities has become a prime concern of modern life. The major aim of the study is to assess the spatial variation of groundwater quality, to identify the sources of groundwater chemicals, and to assess their impact on human health in the study area. Multivariate statistical techniques (cluster analysis and principal component analysis) and hydrochemical facies were applied to groundwater quality data on 14 parameters from 107 sites distributed randomly throughout Birbhum district. Five factors were extracted using Varimax rotation with Kaiser normalization. The first factor explains 27.61% of the total variance, with high positive loadings on TH, Ca, Mg, Cl, and F (fluoride). In the study region, fluoride contamination is highly concentrated due to the presence of the basaltic Rajmahal Traps, and it has adverse impacts on human health, such as fluorosis. The second factor explains 24.41% of the total variance and includes Na, HCO₃, EC, and SO₄. The fifth and last factor explains 8.85% of the total variance and includes pH, which governs the acidic or alkaline character of the groundwater. Hierarchical cluster analysis (HCA) grouped the 107 sampling stations into two clusters: one with high pollution and one with less pollution. Moreover, hydrochemical facies diagrams, viz. the Wilcox diagram, Doneen's chart, and the USSL diagram, reveal the suitability of the groundwater for irrigation and drinking, including its permeability index. The Gibbs diagram shows that most of the groundwater in this region is of rock-dominance origin, since the western part of the region, on the fringe of the Jharkhand plateau, comprises basalt, gneiss, and granite.
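The "percent of total variance" figures can be reproduced from a loading matrix. For p standardized variables the total variance is p, and after an orthogonal rotation such as varimax each factor explains the sum of its squared loadings divided by p. For example, 27.61% of 14 parameters corresponds to a sum of squared loadings of about 0.2761 × 14 ≈ 3.87. The Python sketch below uses a hypothetical 4 × 2 loading matrix, not the study's loadings.

```python
def variance_explained(loadings):
    """Percent of total variance carried by each factor.

    loadings: p x m matrix (p standardized variables, m factors), e.g. after
    a varimax rotation. Factor j explains sum_i loadings[i][j]**2 / p.
    """
    p = len(loadings)
    m = len(loadings[0])
    return [100.0 * sum(row[j] ** 2 for row in loadings) / p for j in range(m)]

# Hypothetical loadings of four variables on two factors.
loadings = [[0.8, 0.1], [0.6, 0.0], [0.0, 0.9], [0.2, 0.3]]
print([round(v, 2) for v in variance_explained(loadings)])  # [26.0, 22.75]
```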

Keywords: correlation, factor analysis, hydrological facies, hydrochemistry