Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 30646

Search results for: K -means cluster analysis

30646 Cluster Analysis of Students’ Learning Satisfaction

Authors: Purevdolgor Luvsantseren, Ajnai Luvsan-Ish, Oyuntsetseg Sandag, Javzmaa Tsend, Akhit Tileubai, Baasandorj Chilhaasuren, Jargalbat Puntsagdash, Galbadrakh Chuluunbaatar

Abstract:

One of the indicators of the quality of university services is student satisfaction. Aim: We aimed to study the level of satisfaction of students in the first year of premedical courses in the course of Medical Physics using the cluster method. Materials and Methods: In the framework of this goal, a questionnaire was collected from a total of 324 students who studied the medical physics course of the 1st course of the premedical course at the Mongolian National University of Medical Sciences. When determining the level of satisfaction, the answers were obtained on five levels of satisfaction: "excellent", "good", "medium", "bad" and "very bad". A total of 39 questionnaires were collected from students: 8 for course evaluation, 19 for teacher evaluation, and 12 for student evaluation. From the research, a database with 39 fields and 324 records was created. Results: In this database, cluster analysis was performed in MATLAB and R programs using the k-means method of data mining. Calculated the Hopkins statistic in the created database, the values are 0.88, 0.87, and 0.97. This shows that cluster analysis methods can be used. The course evaluation sub-fund is divided into three clusters. Among them, cluster I has 150 objects with a "good" rating of 46.2%, cluster II has 119 objects with a "medium" rating of 36.7%, and Cluster III has 54 objects with a "good" rating of 16.6%. The teacher evaluation sub-base into three clusters, there are 179 objects with a "good" rating of 55.2% in cluster II, 108 objects with an "average" rating of 33.3% in cluster III, and 36 objects with an "excellent" rating in cluster I of 11.1%. The sub-base of student evaluations is divided into two clusters: cluster II has 215 objects with an "excellent" rating of 66.3%, and cluster I has 108 objects with an "excellent" rating of 33.3%. Evaluating the resulting clusters with the Silhouette coefficient, 0.32 for the course evaluation cluster, 0.31 for the teacher evaluation cluster, and 0.30 for student evaluation show statistical significance. Conclusion: Finally, to conclude, cluster analysis in the model of the medical physics lesson “good” - 46.2%, “middle” - 36.7%, “bad” - 16.6%; 55.2% - “good”, 33.3% - “middle”, 11.1% - “bad” in the teacher evaluation model; 66.3% - “good” and 33.3% of “bad” in the student evaluation model.

Keywords: questionnaire, data mining, k-means method, silhouette coefficient

Procedia PDF Downloads 41

30645 Hybridized Approach for Distance Estimation Using K-Means Clustering

Authors: Ritu Vashistha, Jitender Kumar

Abstract:

Clustering using the K-means algorithm is a very common way to understand and analyze the obtained output data. When a similar object is grouped, this is called the basis of Clustering. There is K number of objects and C number of cluster in to single cluster in which k is always supposed to be less than C having each cluster to be its own centroid but the major problem is how is identify the cluster is correct based on the data. Formulation of the cluster is not a regular task for every tuple of row record or entity but it is done by an iterative process. Each and every record, tuple, entity is checked and examined and similarity dissimilarity is examined. So this iterative process seems to be very lengthy and unable to give optimal output for the cluster and time taken to find the cluster. To overcome the drawback challenge, we are proposing a formula to find the clusters at the run time, so this approach can give us optimal results. The proposed approach uses the Euclidian distance formula as well melanosis to find the minimum distance between slots as technically we called clusters and the same approach we have also applied to Ant Colony Optimization(ACO) algorithm, which results in the production of two and multi-dimensional matrix.

Keywords: ant colony optimization, data clustering, centroids, data mining, k-means

Procedia PDF Downloads 125

30644 Clustering Locations of Textile and Garment Industries to Compare with the Future Industrial Cluster in Thailand

Authors: Kanogkan Leerojanaprapa

Abstract:

Textile and garment industry is used to a major exporting industry of Thailand. According to lacking of the nation's price-competitiveness by stopping the EU's GSP (Generalised Scheme of Preferences) and ‘Nationwide Minimum Wage Policy’ that Thailand’s employers must pay all employees at least 300 baht (about $10) a day, the supply chains of the Thai textile and garment industry is affected and need to be reformed. Therefore, either Thai textile or garment industry will be existed or not would be concerned. This is also challenged for the government to decide which industries should be promoted the future industries of Thailand. Recently Thai government launch The Cluster-based Special Economic Development Zones Policy for promoting business cluster (effect on September 16, 2015). They define a cluster as the concentration of interconnected businesses and related institutions that operate within the same geographic areas and textiles and garment is one of target industrial clusters and 9 provinces are targeted (Bangkok, Kanchanaburi, Nakhon Pathom, Ratchaburi, Samut Sakhon, Chonburi, Chachoengsao, Prachinburi, and Sa Kaeo). The cluster zone are defined to link west-east corridor connected to manufacturing source in Cambodia and Mynmar to Bangkok where are promoted to be design, sourcing, and trading hub. The Thai government will provide tax and non-tax incentives for targeted industries within the clusters and expects these businesses are scattered to where they can get the most benefit which will identify future industrial cluster. This research will show the difference between the current cluster and future cluster following the target provinces of the textile and garment. The current cluster is analysed from secondary data. The four characteristics of the numbers of plants in Spinning, weaving and finishing of textiles, Manufacture of made-up textile articles, except apparel, Manufacture of knitted and crocheted fabrics, and Manufacture of other textiles, not elsewhere classified in particular 77 provinces (in total) are clustered by K-means cluster analysis and Hierarchical Cluster Analysis. In addition, the cluster can be confirmed and showed which variables contribute the most to defined cluster solution with ANOVA test. The results of analysis can identify 22 provinces (which the textile or garment plants are located) into 3 clusters. Plants in cluster 1 tend to be large numbers of plants which is only Bangkok, Next plants in cluster 2 tend to be moderate numbers of plants which are Samut Prakan, Samut Sakhon and Nakhon Pathom. Finally plants in cluster 3 tend to be little numbers of plants which are other 18 provinces. The same methodology can be implemented in other industries for future study.

Keywords: ANOVA, hierarchical cluster analysis, industrial clusters, K -means cluster analysis, textile and garment industry

Procedia PDF Downloads 210

30643 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score

Authors: Jianfeng Hu

Abstract:

Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.

Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes

Procedia PDF Downloads 278

30642 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 145

30641 A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm

Authors: J. Yang, Y. Ma, X. Zhang, S. Li, Y. Zhang

Abstract:

The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.

Keywords: degree, initial cluster center, k-means, minimum spanning tree

Procedia PDF Downloads 405

30640 Design and Optimization of Open Loop Supply Chain Distribution Network Using Hybrid K-Means Cluster Based Heuristic Algorithm

Authors: P. Suresh, K. Gunasekaran, R. Thanigaivelan

Abstract:

Radio frequency identification (RFID) technology has been attracting considerable attention with the expectation of improved supply chain visibility for consumer goods, apparel, and pharmaceutical manufacturers, as well as retailers and government procurement agencies. It is also expected to improve the consumer shopping experience by making it more likely that the products they want to purchase are available. Recent announcements from some key retailers have brought interest in RFID to the forefront. A modified K- Means Cluster based Heuristic approach, Hybrid Genetic Algorithm (GA) - Simulated Annealing (SA) approach, Hybrid K-Means Cluster based Heuristic-GA and Hybrid K-Means Cluster based Heuristic-GA-SA for Open Loop Supply Chain Network problem are proposed. The study incorporated uniform crossover operator and combined crossover operator in GAs for solving open loop supply chain distribution network problem. The algorithms are tested on 50 randomly generated data set and compared with each other. The results of the numerical experiments show that the Hybrid K-means cluster based heuristic-GA-SA, when tested on 50 randomly generated data set, shows superior performance to the other methods for solving the open loop supply chain distribution network problem.

Keywords: RFID, supply chain distribution network, open loop supply chain, genetic algorithm, simulated annealing

Procedia PDF Downloads 161

30639 Anomaly Detection Based Fuzzy K-Mode Clustering for Categorical Data

Authors: Murat Yazici

Abstract:

Anomalies are irregularities found in data that do not adhere to a well-defined standard of normal behavior. The identification of outliers or anomalies in data has been a subject of study within the statistics field since the 1800s. Over time, a variety of anomaly detection techniques have been developed in several research communities. The cluster analysis can be used to detect anomalies. It is the process of associating data with clusters that are as similar as possible while dissimilar clusters are associated with each other. Many of the traditional cluster algorithms have limitations in dealing with data sets containing categorical properties. To detect anomalies in categorical data, fuzzy clustering approach can be used with its advantages. The fuzzy k-Mode (FKM) clustering algorithm, which is one of the fuzzy clustering approaches, by extension to the k-means algorithm, is reported for clustering datasets with categorical values. It is a form of clustering: each point can be associated with more than one cluster. In this paper, anomaly detection is performed on two simulated data by using the FKM cluster algorithm. As a significance of the study, the FKM cluster algorithm allows to determine anomalies with their abnormality degree in contrast to numerous anomaly detection algorithms. According to the results, the FKM cluster algorithm illustrated good performance in the anomaly detection of data, including both one anomaly and more than one anomaly.

Keywords: fuzzy k-mode clustering, anomaly detection, noise, categorical data

Procedia PDF Downloads 48

30638 An Exploratory Study of Nasik Small and Medium Enterprises Cluster

Authors: Pragya Bhawsar, Utpal Chattopadhyay

Abstract:

Small and Medium Enterprises play crucial role in contributing to economic objectives of an emerging nation. To support SMEs, the idea of creation of clusters has been prevalent since past two decades. In this paper, an attempt has been done to explore the impact of being in the cluster on the competitiveness of SMEs. To meet the objective, Nasik Cluster (India) has been selected. The information was collected by means of two focus group discussions and survey of thirty SMEs. The finding generates interest revealing the fact that under the concept ‘Cluster’ a lot of ambiguity flourish. Besides the problems and opportunities of the firms in the cluster the results bring to notice that the benefits of clusterization can only reach to SMEs when the whole location can be considered/understood as a cluster, rather than many subsets (various forms of clusters) prevailing under it. Fostering such an understanding calls for harmony among the various stakeholders of the clusters. The dynamics of interaction among government, local industry associations, relevant institutions, large firms and finally SMEs which makes the most of the location based cluster, are significant in shaping the host cluster’s competitiveness and vice versa.

Keywords: SMEs, industry clusters, common facility centres, co-creation, policy

Procedia PDF Downloads 289

30637 An Investigative Study on the Use of Online Marketing Methods in Hungary

Authors: E. Happ, Zs. Ivancsone Horvath

Abstract:

With the development of the information technology, IT, sector, all industry of the world has a new path, dealing with digitalisation. Tourism is the most rapidly increasing industry in the world. Without digitalisation, tourism operators would not be competitive enough with foreign destinations or other experience-based service providers. Digitalisation is also necessary to enable organizations, which are interested in tourism to meet the growing expectations of consumers. With the help of digitalisation, tourism providers can also obtain information about tourists, changes in consumer behaviour, and the use of online services. The degree of digitalisation in tourism is different for different services. The research is based on a questionnaire survey conducted in 2018 in Hungary. The sample with more than 500 respondents was processed by the SPSS program, using a variety of analysis methods. The following two variables were observed from more aspects: frequency of travel and the importance of services related to online travel. With the help of these variables, a cluster analysis was performed among the participants. The sample can be divided into two groups using K-mean cluster analysis. Cluster ‘1’ is a positive group; they can be called the “most digital tourists.” They agree in most things, with low standard deviation, and for them, digitalisation is a starting point. To the members of Cluster ‘2’, digitalisation is important, too. The results show what is important (accommodation, information gathering) to them, but also what they are not interested in at all within the digital world (e.g., car rental or online sharing). Interestingly, there is no third negative cluster. This result (that there is no result) proves that tourism uses digitalisation, and the question is only the extent of the use of online tools and methods. With the help of the designed consumer groups, the characteristics of digital tourism segments can be identified. The help of different variables characterised these groups. One of them is the frequency of travel, where there is a significant correlation between travel frequency and cluster membership. The shift is clear towards Cluster ‘1’, which means, those who find services related to online travel more important, are more likely to travel as well. By learning more about digital tourists’ consumer behaviour, the results of this research can help the providers in what kind of marketing tools could be used to influence the consumer choices of the different consumer groups created using digital devices, furthermore how to conduct more detailed and effective marketing activities. The main finding of the research was that most of the people have digital tools which are important to be able to participate in e-tourism. Of these, mobile devices are increasingly preferred. That means the challenge for service providers is no longer the digital presence but having optimised application for different devices.

Keywords: cluster analysis, digital tourism, marketing tool, tourist behaviour

Procedia PDF Downloads 125

30636 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 429

30635 Cluster Analysis of Customer Churn in Telecom Industry

Authors: Abbas Al-Refaie

Abstract:

The research examines the factors that affect customer churn (CC) in the Jordanian telecom industry. A total of 700 surveys were distributed. Cluster analysis revealed three main clusters. Results showed that CC and customer satisfaction (CS) were the key determinants in forming the three clusters. In two clusters, the center values of CC were high, indicating that the customers were loyal and SC was expensive and time- and energy-consuming. Still, the mobile service provider (MSP) should enhance its communication (COM), and value added services (VASs), as well as customer complaint management systems (CCMS). Finally, for the third cluster the center of the CC indicates a poor level of loyalty, which facilitates customers churn to another MSP. The results of this study provide valuable feedback for MSP decision makers regarding approaches to improving their performance and reducing CC.

Keywords: cluster analysis, telecom industry, switching cost, customer churn

Procedia PDF Downloads 322

30634 Multi-Cluster Overlapping K-Means Extension Algorithm (MCOKE)

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper, we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold to be defined as a priority which can be difficult to determine by novice users.

Keywords: data mining, k-means, MCOKE, overlapping

Procedia PDF Downloads 566

30633 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: share of electricity generation, k-means clustering, discriminant, CO2 emission

Procedia PDF Downloads 409

30632 Support Vector Machine Based Retinal Therapeutic for Glaucoma Using Machine Learning Algorithm

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Yang Yung, Tracy Lin Huan

Abstract:

Glaucoma is a group of visual maladies represented by the scheduled optic nerve neuropathy; means to the increasing dwindling in vision ground, resulting in loss of sight. In this paper, a novel support vector machine based retinal therapeutic for glaucoma using machine learning algorithm is conservative. The algorithm has fitting pragmatism; subsequently sustained on correlation clustering mode, it visualizes perfect computations in the multi-dimensional space. Support vector clustering turns out to be comparable to the scale-space advance that investigates the cluster organization by means of a kernel density estimation of the likelihood distribution, where cluster midpoints are idiosyncratic by the neighborhood maxima of the concreteness. The predicted planning has 91% attainment rate on data set deterrent on a consolidation of 500 realistic images of resolute and glaucoma retina; therefore, the computational benefit of depending on the cluster overlapping system pedestal on machine learning algorithm has complete performance in glaucoma therapeutic.

Keywords: machine learning algorithm, correlation clustering mode, cluster overlapping system, glaucoma, kernel density estimation, retinal therapeutic

Procedia PDF Downloads 247

30631 The Use of Ward Linkage in Cluster Integration with a Path Analysis Approach

Authors: Adji Achmad Rinaldo Fernandes

Abstract:

Path analysis is an analytical technique to study the causal relationship between independent and dependent variables. In this study, the integration of Clusters in the Ward Linkage method was used in a variety of clusters with path analysis. The variables used are character (x₁), capacity (x₂), capital (x₃), collateral (x₄), and condition of economy (x₄) to on time pay (y₂) through the variable willingness to pay (y₁). The purpose of this study was to compare the Ward Linkage method cluster integration in various clusters with path analysis to classify willingness to pay (y₁). The data used are primary data from questionnaires filled out by customers of Bank X, using purposive sampling. The measurement method used is the average score method. The results showed that the Ward linkage method cluster integration with path analysis on 2 clusters is the best method, by comparing the coefficient of determination. Variable character (x₁), capacity (x₂), capital (x₃), collateral (x₄), and condition of economy (x₅) to on time pay (y₂) through willingness to pay (y₁) can be explained by 58.3%, while the remaining 41.7% is explained by variables outside the model.

Keywords: cluster integration, linkage, path analysis, compliant paying behavior

Procedia PDF Downloads 178

30630 Unlocking E-commerce: Analyzing User Behavior and Segmenting Customers for Strategic Insights

Authors: Aditya Patil, Arun Patil, Vaishali Patil, Sudhir Chitnis, Anjum Patel

Abstract:

Rapid growth has given e-commerce platforms a lot of client behavior and spending data. To maximize their strategy, businesses must understand how customers utilize online shopping platforms and what influences their purchases. Our research focuses on e-commerce user behavior and purchasing trends. This extensive study examines spending and user behavior. Regression and grouping disclose relevant data from the dataset. We can understand user spending trends via multilevel regression. We can analyze how pricing, user demographics, and product categories affect customer purchase decisions with this technique. Clustering groups consumers by spending. Important information was found. Purchase habits vary by user group. Our analysis illuminates the complex world of e-commerce consumer behavior and purchase trends. Understanding user behavior helps create effective e-commerce marketing strategies. This market can benefit from K-means clustering. This study focuses on tailoring strategies to user groups and improving product and price effectiveness. Customer buying behaviors across categories were shown via K-means clusters. Average spending is highest in Cluster 4 and lowest in Cluster 3. Clothing is less popular than gadgets and appliances around the holidays. Cluster spending distribution is examined using average variables. Our research enhances e-commerce analytics. Companies can improve customer service and decision-making with this data.

Keywords: e-commerce, regression, clustering, k-means

Procedia PDF Downloads 9

30629 A Comparative and Critical Analysis of Some Routing Protocols in Wireless Sensor Networks

Authors: Ishtiaq Wahid, Masood Ahmad, Nighat Ayub, Sajad Ali

Abstract:

Lifetime of a wireless sensor network (WSN) is directly proportional to the energy consumption of its constituent nodes. Routing in wireless sensor network is very challenging due its inherit characteristics. In hierarchal routing the sensor ﬁled is divided into clusters. The cluster-heads are selected from each cluster, which forms a hierarchy of nodes. The cluster-heads are used to transmit the data to the base station while other nodes perform the sensing task. In this way the lifetime of the network is increased. In this paper a comparative study of hierarchal routing protocols are conducted. The simulation is done in NS-2 for validation.

Keywords: WSN, cluster, routing, sensor networks

Procedia PDF Downloads 472

30628 Spatio-Temporal Changes of Rainfall in São Paulo, Brazil (1973-2012): A Gamma Distribution and Cluster Analysis

Authors: Guilherme Henrique Gabriel, Lucí Hidalgo Nunes

Abstract:

An important feature of rainfall regimes is the variability, which is subject to the atmosphere’s general and regional dynamics, geographical position and relief. Despite being inherent to the climate system, it can harshly impact virtually all human activities. In turn, global climate change has the ability to significantly affect smaller-scale rainfall regimes by altering their current variability patterns. In this regard, it is useful to know if regional climates are changing over time and whether it is possible to link these variations to climate change trends observed globally. This study is part of an international project (Metropole-FAPESP, Proc. 2012/51876-0 and Proc. 2015/11035-5) and the objective was to identify and evaluate possible changes in rainfall behavior in the state of São Paulo, southeastern Brazil, using rainfall data from 79 rain gauges for the last forty years. Cluster analysis and gamma distribution parameters were used for evaluating spatial and temporal trends, and the outcomes are presented by means of geographic information systems tools. Results show remarkable changes in rainfall distribution patterns in São Paulo over the years: changes in shape and scale parameters of gamma distribution indicate both an increase in the irregularity of rainfall distribution and the probability of occurrence of extreme events. Additionally, the spatial outcome of cluster analysis along with the gamma distribution parameters suggest that changes occurred simultaneously over the whole area, indicating that they could be related to remote causes beyond the local and regional ones, especially in a current global climate change scenario.

Keywords: climate change, cluster analysis, gamma distribution, rainfall

Procedia PDF Downloads 313

30627 Analysis of Entrepreneurship in Industrial Cluster

Authors: Wen-Hsiang Lai

Abstract:

Except for the internal aspects of entrepreneurship (i.e. motivation, opportunity perspective and alertness), there are external aspects that affecting entrepreneurship (i.e. the industrial cluster). By comparing the machinery companies located inside and outside the industrial district, this study aims to explore the cluster effects on the entrepreneurship of companies in Taiwan machinery clusters (TMC). In this study, three factors affecting the entrepreneurship in TMC are conducted as “competition”, “embedded-ness” and “specialized knowledge”. The “competition” in the industrial cluster is defined as the competitive advantages that companies gain in form of demand effects and diversified strategies; the “embedded-ness” refers to the quality of company relations (relational embedded-ness) and ranges (structural embedded-ness) with the industry components (universities, customers and complementary) that affecting knowledge transfer and knowledge generations; the “specialized knowledge” shares the internal knowledge within industrial clusters. This study finds that when comparing to the companies which are outside the cluster, the industrial cluster has positive influence on the entrepreneurship. Additionally, the factor of “relational embedded-ness” has significant impact on the entrepreneurship and affects the adaptation ability of companies in TMC. Finally, the factor of “competition” reveals partial influence on the entrepreneurship.

Keywords: entrepreneurship, industrial cluster, industrial district, economies of agglomerations, Taiwan Machinery Cluster (TMC)

Procedia PDF Downloads 383

30626 Design of a Graphical User Interface for Data Preprocessing and Image Segmentation Process in 2D MRI Images

Authors: Enver Kucukkulahli, Pakize Erdogmus, Kemal Polat

Abstract:

The 2D image segmentation is a significant process in finding a suitable region in medical images such as MRI, PET, CT etc. In this study, we have focused on 2D MRI images for image segmentation process. We have designed a GUI (graphical user interface) written in MATLABTM for 2D MRI images. In this program, there are two different interfaces including data pre-processing and image clustering or segmentation. In the data pre-processing section, there are median filter, average filter, unsharp mask filter, Wiener filter, and custom filter (a filter that is designed by user in MATLAB). As for the image clustering, there are seven different image segmentations for 2D MR images. These image segmentation algorithms are as follows: PSO (particle swarm optimization), GA (genetic algorithm), Lloyds algorithm, k-means, the combination of Lloyds and k-means, mean shift clustering, and finally BBO (Biogeography Based Optimization). To find the suitable cluster number in 2D MRI, we have designed the histogram based cluster estimation method and then applied to these numbers to image segmentation algorithms to cluster an image automatically. Also, we have selected the best hybrid method for each 2D MR images thanks to this GUI software.

Keywords: image segmentation, clustering, GUI, 2D MRI

Procedia PDF Downloads 371

30625 Scientific Linux Cluster for BIG-DATA Analysis (SLBD): A Case of Fayoum University

Authors: Hassan S. Hussein, Rania A. Abul Seoud, Amr M. Refaat

Abstract:

Scientific researchers face in the analysis of very large data sets that is increasing noticeable rate in today’s and tomorrow’s technologies. Hadoop and Spark are types of software that developed frameworks. Hadoop framework is suitable for many Different hardware platforms. In this research, a scientific Linux cluster for Big Data analysis (SLBD) is presented. SLBD runs open source software with large computational capacity and high performance cluster infrastructure. SLBD composed of one cluster contains identical, commodity-grade computers interconnected via a small LAN. SLBD consists of a fast switch and Gigabit-Ethernet card which connect four (nodes). Cloudera Manager is used to configure and manage an Apache Hadoop stack. Hadoop is a framework allows storing and processing big data across the cluster by using MapReduce algorithm. MapReduce algorithm divides the task into smaller tasks which to be assigned to the network nodes. Algorithm then collects the results and form the final result dataset. SLBD clustering system allows fast and efficient processing of large amount of data resulting from different applications. SLBD also provides high performance, high throughput, high availability, expandability and cluster scalability.

Keywords: big data platforms, cloudera manager, Hadoop, MapReduce

Procedia PDF Downloads 354

30624 An AI-Based Dynamical Resource Allocation Calculation Algorithm for Unmanned Aerial Vehicle

Authors: Zhou Luchen, Wu Yubing, Burra Venkata Durga Kumar

Abstract:

As the scale of the network becomes larger and more complex than before, the density of user devices is also increasing. The development of Unmanned Aerial Vehicle (UAV) networks is able to collect and transform data in an efficient way by using software-defined networks (SDN) technology. This paper proposed a three-layer distributed and dynamic cluster architecture to manage UAVs by using an AI-based resource allocation calculation algorithm to address the overloading network problem. Through separating services of each UAV, the UAV hierarchical cluster system performs the main function of reducing the network load and transferring user requests, with three sub-tasks including data collection, communication channel organization, and data relaying. In this cluster, a head node and a vice head node UAV are selected considering the Central Processing Unit (CPU), operational (RAM), and permanent (ROM) memory of devices, battery charge, and capacity. The vice head node acts as a backup that stores all the data in the head node. The k-means clustering algorithm is used in order to detect high load regions and form the UAV layered clusters. The whole process of detecting high load areas, forming and selecting UAV clusters, and moving the selected UAV cluster to that area is proposed as offloading traffic algorithm.

Keywords: k-means, resource allocation, SDN, UAV network, unmanned aerial vehicles

Procedia PDF Downloads 105

30623 Evaluation of Yield and Yield Components of Malaysian Palm Oil Board-Senegal Oil Palm Germplasm Using Multivariate Tools

Authors: Khin Aye Myint, Mohd Rafii Yusop, Mohd Yusoff Abd Samad, Shairul Izan Ramlee, Mohd Din Amiruddin, Zulkifli Yaakub

Abstract:

The narrow base of genetic is the main obstacle of breeding and genetic improvement in oil palm industry. In order to broaden the genetic bases, the Malaysian Palm Oil Board has been extensively collected wild germplasm from its original area of 11 African countries which are Nigeria, Senegal, Gambia, Guinea, Sierra Leone, Ghana, Cameroon, Zaire, Angola, Madagascar, and Tanzania. The germplasm collections were established and maintained as a field gene bank in Malaysian Palm Oil Board (MPOB) Research Station in Kluang, Johor, Malaysia to conserve a wide range of oil palm genetic resources for genetic improvement of Malaysian oil palm industry. Therefore, assessing the performance and genetic diversity of the wild materials is very important for understanding the genetic structure of natural oil palm population and to explore genetic resources. Principal component analysis (PCA) and Cluster analysis are very efficient multivariate tools in the evaluation of genetic variation of germplasm and have been applied in many crops. In this study, eight populations of MPOB-Senegal oil palm germplasm were studied to explore the genetic variation pattern using PCA and cluster analysis. A total of 20 yield and yield component traits were used to analyze PCA and Ward’s clustering using SAS 9.4 version software. The first four principal components which have eigenvalue >1 accounted for 93% of total variation with the value of 44%, 19%, 18% and 12% respectively for each principal component. PC1 showed highest positive correlation with fresh fruit bunch (0.315), bunch number (0.321), oil yield (0.317), kernel yield (0.326), total economic product (0.324), and total oil (0.324) while PC 2 has the largest positive association with oil to wet mesocarp (0.397) and oil to fruit (0.458). The oil palm population were grouped into four distinct clusters based on 20 evaluated traits, this imply that high genetic variation existed in among the germplasm. Cluster 1 contains two populations which are SEN 12 and SEN 10, while cluster 2 has only one population of SEN 3. Cluster 3 consists of three populations which are SEN 4, SEN 6, and SEN 7 while SEN 2 and SEN 5 were grouped in cluster 4. Cluster 4 showed the highest mean value of fresh fruit bunch, bunch number, oil yield, kernel yield, total economic product, and total oil and Cluster 1 was characterized by high oil to wet mesocarp, and oil to fruit. The desired traits that have the largest positive correlation on extracted PCs could be utilized for the improvement of oil palm breeding program. The populations from different clusters with the highest cluster means could be used for hybridization. The information from this study can be utilized for effective conservation and selection of the MPOB-Senegal oil palm germplasm for the future breeding program.

Keywords: cluster analysis, genetic variability, germplasm, oil palm, principal component analysis

Procedia PDF Downloads 159

30622 A Relative Entropy Regularization Approach for Fuzzy C-Means Clustering Problem

Authors: Ouafa Amira, Jiangshe Zhang

Abstract:

Clustering is an unsupervised machine learning technique; its aim is to extract the data structures, in which similar data objects are grouped in the same cluster, whereas dissimilar objects are grouped in different clusters. Clustering methods are widely utilized in different fields, such as: image processing, computer vision , and pattern recognition, etc. Fuzzy c-means clustering (fcm) is one of the most well known fuzzy clustering methods. It is based on solving an optimization problem, in which a minimization of a given cost function has been studied. This minimization aims to decrease the dissimilarity inside clusters, where the dissimilarity here is measured by the distances between data objects and cluster centers. The degree of belonging of a data point in a cluster is measured by a membership function which is included in the interval [0, 1]. In fcm clustering, the membership degree is constrained with the condition that the sum of a data object’s memberships in all clusters must be equal to one. This constraint can cause several problems, specially when our data objects are included in a noisy space. Regularization approach took a part in fuzzy c-means clustering technique. This process introduces an additional information in order to solve an ill-posed optimization problem. In this study, we focus on regularization by relative entropy approach, where in our optimization problem we aim to minimize the dissimilarity inside clusters. Finding an appropriate membership degree to each data object is our objective, because an appropriate membership degree leads to an accurate clustering result. Our clustering results in synthetic data sets, gaussian based data sets, and real world data sets show that our proposed model achieves a good accuracy.

Keywords: clustering, fuzzy c-means, regularization, relative entropy

Procedia PDF Downloads 257

30621 Analysis Customer Loyalty Characteristic and Segmentation Analysis in Mobile Phone Category in Indonesia

Authors: A. B. Robert, Adam Pramadia, Calvin Andika

Abstract:

The main purpose of this study is to explore consumer loyalty characteristic of mobile phone category in Indonesia. Second, this research attempts to identify consumer segment and to explore their profile in each segment as the basis of marketing strategy formulation. This study used some tools of multivariate analysis such as discriminant analysis and cluster analysis. Discriminate analysis used to discriminate consumer loyal and not loyal by using particular variables. Cluster analysis used to reveal various segment in mobile phone category. In addition to having better customer understanding in each segment, this study used descriptive analysis and cross tab analysis in each segment defined by cluster analysis. This study expected several findings. First, consumer can be divided into two large group of loyal versus not loyal by set of variables. Second, this study identifies customer segment in mobile phone category. Third, exploring customer profile in each segment that has been identified. This study answer a call for additional empirical research into different product categories. Therefore, a replication research is advisable. By knowing the customer loyalty characteristic, and deep analysis of their consumption behavior and profile for each segment, this study is very advisable for high impact marketing strategy development. This study contributes body of knowledge by adding empirical study of consumer loyalty, segmentation analysis in mobile phone category by multiple brand analysis.

Keywords: customer loyalty, segmentation, marketing strategy, discriminant analysis, cluster analysis, mobile phone

Procedia PDF Downloads 593

30620 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia

Authors: Carol Anne Hargreaves

Abstract:

A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.

Keywords: machine learning, stock market trading, logistic regression, cluster analysis, factor analysis, decision trees, neural networks, automated stock investment system

Procedia PDF Downloads 151

30619 Evaluation of Groundwater Quality and Its Suitability for Drinking and Agricultural Purposes Using Self-Organizing Maps

Authors: L. Belkhiri, L. Mouni, A. Tiri, T.S. Narany

Abstract:

In the present study, the self-organizing map (SOM) clustering technique was applied to identify homogeneous clusters of hydrochemical parameters in El Milia plain, Algeria, to assess the quality of groundwater for potable and agricultural purposes. The visualization of SOM-analysis indicated that 35 groundwater samples collected in the study area were classified into three clusters, which showed progressive increase in electrical conductivity from cluster one to cluster three. Samples belonging to cluster one are mostly located in the recharge zone showing hard fresh water type, however, water type gradually changed to hard-brackish type in the discharge zone, including clusters two and three. Ionic ratio studies indicated the role of carbonate rock dissolution in increases on groundwater hardness, especially in cluster one. However, evaporation and evapotranspiration are the main processes increasing salinity in cluster two and three.

Keywords: groundwater quality, self-organizing maps, drinking water, irrigation water

Procedia PDF Downloads 251

30618 Factors Influencing Family Resilience and Quality of Life in Pediatric Cancer Patients and Their Caregivers: A Cluster Analysis

Authors: Li Wang, Dan Shu, Shiguang Pang, Lixiu Wang, Bing Xiang Yang, Qian Liu

Abstract:

Background: Cancer is one of the most severe diseases in childhood; long-term treatment and its side effects significantly impact the patient's physical, psychological, social functioning and quality of life while also placing substantial physical and psychological burdens on caregivers and families. Family resilience is crucial for children with cancer, helping them cope better with the disease and supporting the family in facing challenges together. As a family-level variable, family resilience requires information from multiple family members. However, to our best knowledge, there is currently no research investigating family resilience from both the perspectives of pediatric cancer patients and their caregivers. Therefore, this study aims to investigate the family resilience and quality of life of pediatric cancer patients from a patient–caregiver dyadic perspective. Methods: A total of 149 dyads of patients diagnosed with pediatric cancer patients and their principal caregivers were recruited from oncology departments of 4 tertiary hospitals in Wuhan and Taiyuan, China. All participants completed questionnaires that identified their demographic and clinical characteristics as well as assessed their family resilience and quality of life for both the patients and their caregivers. K-means cluster analysis was used to identify different clusters of family resilience based on the reports from patients and caregivers. Multivariate logistic regression and linear regression are used to analyze the factors influencing family resilience and quality of life, as well as the relationship between the two. Results: Three clusters of family resilience were identified: a cluster of high family resilience (HR), a cluster of low family resilience (LR), and a cluster of discrepant family resilience (DR). Most (67.1%) families fell into the cluster with low resilience. Characteristics such as the types of caregivers perceived social support of the patient were different among the three clusters. Compared to the LR group, families where the mother is the caregiver and where the patient has high social support are more likely to be assigned to the HR. The quality of life for caregivers was consistently highest in the HR cluster and lowest in the LR cluster. The patient's quality of life is not related to family resilience. In the linear regression analysis of the patient's quality of life, patients who are the first-born have higher quality of life, while those living with their parents have lower quality of life. The participants' characteristics were not associated with the quality of life for caregivers. Conclusions: In most families, family resilience was low. Families with maternal caregivers and patients receiving high levels of social support are more inclined to be higher levels of family resilience. Family resilience was linked to the quality of life of caregivers of pediatric cancer patients. The clinical implications of this findings suggest that healthcare and social support organizations should prioritize and support the participation of mothers in caregiving responsibilities. Furthermore, they should assist families in accessing social support to enhance family resilience. This study also emphasizes the importance of promoting family resilience for enhancing family health and happiness, as well as improving the quality of life for caregivers.

Keywords: pediatric cancer, cluster analysis, family resilience, quality of life

Procedia PDF Downloads 28

30617 Liver Lesion Extraction with Fuzzy Thresholding in Contrast Enhanced Ultrasound Images

Authors: Abder-Rahman Ali, Adélaïde Albouy-Kissi, Manuel Grand-Brochier, Viviane Ladan-Marcus, Christine Hoeffl, Claude Marcus, Antoine Vacavant, Jean-Yves Boire

Abstract:

In this paper, we present a new segmentation approach for focal liver lesions in contrast enhanced ultrasound imaging. This approach, based on a two-cluster Fuzzy C-Means methodology, considers type-II fuzzy sets to handle uncertainty due to the image modality (presence of speckle noise, low contrast, etc.), and to calculate the optimum inter-cluster threshold. Fine boundaries are detected by a local recursive merging of ambiguous pixels. The method has been tested on a representative database. Compared to both Otsu and type-I Fuzzy C-Means techniques, the proposed method significantly reduces the segmentation errors.

Keywords: defuzzification, fuzzy clustering, image segmentation, type-II fuzzy sets

Procedia PDF Downloads 478

‹
1
2
3
4
5
6
7
8
9
10
...
1021
1022
›