Search results for: Cluster Interpretation
606 A Text Clustering System based on k-means Type Subspace Clustering and Ontology
Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang
Abstract:
This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.
Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2461605 Determining Cluster Boundaries Using Particle Swarm Optimization
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.
Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1717604 Achieving High Availability by Implementing Beowulf Cluster
Authors: A.F.A. Abidin, N.S.M. Usop
Abstract:
A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. This paper proposed the way to implement the Beowulf Cluster in order to achieve high performance as well as high availability.Keywords: Beowulf Cluster, grid computing, GridMPI, MPICH.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1676603 Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. PSO algorithm utilizes a so-called U-matrix of SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably to boundary detection through traditional algorithms namely k-means and hierarchical based approach which are normally used to interpret the output of SOM.Keywords: cluster boundaries, clustering, code vectors, data mining, particle swarm optimization, self-organizing maps, U-matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1908602 Fast and Accuracy Control Chart Pattern Recognition using a New cluster-k-Nearest Neighbor
Authors: Samir Brahim Belhaouari
Abstract:
By taking advantage of both k-NN which is highly accurate and K-means cluster which is able to reduce the time of classification, we can introduce Cluster-k-Nearest Neighbor as "variable k"-NN dealing with the centroid or mean point of all subclasses generated by clustering algorithm. In general the algorithm of K-means cluster is not stable, in term of accuracy, for that reason we develop another algorithm for clustering our space which gives a higher accuracy than K-means cluster, less subclass number, stability and bounded time of classification with respect to the variable data size. We find between 96% and 99.7 % of accuracy in the lassification of 6 different types of Time series by using K-means cluster algorithm and we find 99.7% by using the new clustering algorithm.Keywords: Pattern recognition, Time series, k-Nearest Neighbor, k-means cluster, Gaussian Mixture Model, Classification
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1964601 LINUX Cluster Possibilities in 3-D PHOTO Quality Imaging and Animation
Authors: Arjun Jain, Himanshu Agrawal, Nalini Vasudevan
Abstract:
In this paper we present the PC cluster built at R.V. College of Engineering (with great help from the Department of Computer Science and Electrical Engineering). The structure of the cluster is described and the performance is evaluated by rendering of complex 3D Persistence of Vision (POV) images by the Ray-Tracing algorithm. Here, we propose an unexampled method to render such images, distributedly on a low cost scalable.Keywords: PC cluster, parallel computations, ray tracing, persistence of vision, rendering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1551600 Analysis of Diverse Cluster Ensemble Techniques
Authors: S. Sarumathi, N. Shanthi, P. Ranjetha
Abstract:
Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1841599 Solving Facility Location Problem on Cluster Computing
Authors: Ei Phyo Wai, Nay Min Tun
Abstract:
Computation of facility location problem for every location in the country is not easy simultaneously. Solving the problem is described by using cluster computing. A technique is to design parallel algorithm by using local search with single swap method in order to solve that problem on clusters. Parallel implementation is done by the use of portable parallel programming, Message Passing Interface (MPI), on Microsoft Windows Compute Cluster. In this paper, it presents the algorithm that used local search with single swap method and implementation of the system of a facility to be opened by using MPI on cluster. If large datasets are considered, the process of calculating a reasonable cost for a facility becomes time consuming. The result shows parallel computation of facility location problem on cluster speedups and scales well as problem size increases.Keywords: cluster, cost, demand, facility location
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1485598 Enabling Automated Deployment for Cluster Computing in Distributed PC Classrooms
Authors: Shuen-Tai Wang, Ying-Chuan Chen, Hsi-Ya Chang
Abstract:
The rapid improvement of the microprocessor and network has made it possible for the PC cluster to compete with conventional supercomputers. Lots of high throughput type of applications can be satisfied by using the current desktop PCs, especially for those in PC classrooms, and leave the supercomputers for the demands from large scale high performance parallel computations. This paper presents our development on enabling an automated deployment mechanism for cluster computing to utilize the computing power of PCs such as reside in PC classroom. After well deployment, these PCs can be transformed into a pre-configured cluster computing resource immediately without touching the existing education/training environment installed on these PCs. Thus, the training activities will not be affected by this additional activity to harvest idle computing cycles. The time and manpower required to build and manage a computing platform in geographically distributed PC classrooms also can be reduced by this development.
Keywords: PC cluster, automated deployment, cluster computing, PC classroom.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1530597 Implementation of Watch Dog Timer for Fault Tolerant Computing on Cluster Server
Authors: Meenakshi Bheevgade, Rajendra M. Patrikar
Abstract:
In today-s new technology era, cluster has become a necessity for the modern computing and data applications since many applications take more time (even days or months) for computation. Although after parallelization, computation speeds up, still time required for much application can be more. Thus, reliability of the cluster becomes very important issue and implementation of fault tolerant mechanism becomes essential. The difficulty in designing a fault tolerant cluster system increases with the difficulties of various failures. The most imperative obsession is that the algorithm, which avoids a simple failure in a system, must tolerate the more severe failures. In this paper, we implemented the theory of watchdog timer in a parallel environment, to take care of failures. Implementation of simple algorithm in our project helps us to take care of different types of failures; consequently, we found that the reliability of this cluster improves.Keywords: Cluster, Fault tolerant, Grid, Grid ComputingSystem, Meta-computing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2213596 Optimization of Fuzzy Cluster Nodes in Cellular Multimedia Networks
Authors: J. D. Mallapur, Supriya H., Santosh B. K., Tej H.
Abstract:
The cellular network is one of the emerging areas of communication, in which the mobile nodes act as member for one base station. The cluster based communication is now an emerging area of wireless cellular multimedia networks. The cluster renders fast communication and also a convenient way to work with connectivity. In our scheme we have proposed an optimization technique for the fuzzy cluster nodes, by categorizing the group members into three categories like long refreshable member, medium refreshable member and short refreshable member. By considering long refreshable nodes as static nodes, we compute the new membership values for the other nodes in the cluster. We compare their previous and present membership value with the threshold value to categorize them into three different members. By which, we optimize the nodes in the fuzzy clusters. The simulation results show that there is reduction in the cluster computational time and iterational time after optimization.Keywords: Clusters, fuzzy and optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1570595 Scalable Deployment and Configuration of High-Performance Virtual Clusters
Authors: Kyrre M Begnum, Matthew Disney
Abstract:
Virtualization and high performance computing have been discussed from a performance perspective in recent publications. We present and discuss a flexible and efficient approach to the management of virtual clusters. A virtual machine management tool is extended to function as a fabric for cluster deployment and management. We show how features such as saving the state of a running cluster can be used to avoid disruption. We also compare our approach to the traditional methods of cluster deployment and present benchmarks which illustrate the efficiency of our approach.
Keywords: Cluster management, clusters, high-performance, virtual machines, Xen
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1403594 A Review and Comparative Analysis on Cluster Ensemble Methods
Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha
Abstract:
Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.
Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 820593 Location Based Clustering in Wireless Sensor Networks
Authors: Ashok Kumar, Narottam Chand, Vinod Kumar
Abstract:
Due to the limited energy resources, energy efficient operation of sensor node is a key issue in wireless sensor networks. Clustering is an effective method to prolong the lifetime of energy constrained wireless sensor network. However, clustering in wireless sensor network faces several challenges such as selection of an optimal group of sensor nodes as cluster, optimum selection of cluster head, energy balanced optimal strategy for rotating the role of cluster head in a cluster, maintaining intra and inter cluster connectivity and optimal data routing in the network. In this paper, we propose a protocol supporting an energy efficient clustering, cluster head selection/rotation and data routing method to prolong the lifetime of sensor network. Simulation results demonstrate that the proposed protocol prolongs network lifetime due to the use of efficient clustering, cluster head selection/rotation and data routing.
Keywords: Wireless sensor networks, clustering, energy efficient, localization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2685592 Dominating Set Algorithm and Trust Evaluation Scheme for Secured Cluster Formation and Data Transferring
Authors: Y. Harold Robinson, M. Rajaram, E. Golden Julie, S. Balaji
Abstract:
This paper describes the proficient way of choosing the cluster head based on dominating set algorithm in a wireless sensor network (WSN). The algorithm overcomes the energy deterioration problems by this selection process of cluster heads. Clustering algorithms such as LEACH, EEHC and HEED enhance scalability in WSNs. Dominating set algorithm keeps the first node alive longer than the other protocols previously used. As the dominating set of cluster heads are directly connected to each node, the energy of the network is saved by eliminating the intermediate nodes in WSN. Security and trust is pivotal in network messaging. Cluster head is secured with a unique key. The member can only connect with the cluster head if and only if they are secured too. The secured trust model provides security for data transmission in the dominated set network with the group key. The concept can be extended to add a mobile sink for each or for no of clusters to transmit data or messages between cluster heads and to base station. Data security id preferably high and data loss can be prevented. The simulation demonstrates the concept of choosing cluster heads by dominating set algorithm and trust evaluation using DSTE. The research done is rationalized.
Keywords: Wireless Sensor Networks, LEECH, EEHC, HEED, DSTE.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1405591 The Effects on Yield and Yield Components of Different Level Cluster Tip Reduction and Foliar Boric Acid Applications on Alphonse Lavallee Grape Cultivar
Abstract:
This study was carried out to determine the effects of Control (C), 1/3 Cluster Tip Reduction (1/3 CTR), 1/6 Cluster Tip Reduction (1/6 CTR), 1/9 Cluster Tip Reduction (1/9 CTR), 1/3 CTR + Boric Acid (BA), 1/6 CTR + BA, 1/9 CTR + BA applications on yield and yield components of four years old Alphonse Lavallee grape variety (Vitis vinifera L.) grown on grafted 110 Paulsen rootstock in Konya province in Turkey in the vegetation period in 2015. According to the results, the highest maturity index 21.46 with 1/9 CTR application; the highest grape juice yields 736.67 ml with 1/3 CTR + BA application; the highest L* color value 32.07 with 1/9 CTR application; the highest a* color value 1.74 with 1/9 CTR application; the highest b* color value 3.72 with 1/9 CTR application were obtained. The effects of applications on grape fresh yield, cluster weight and berry weight were not found statistically significant.
Keywords: Alphonse Lavallee grape cultivar, different cluster tip reduction (1/3, 1/6, 1/9), foliar boric acid application, yield, quality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1850590 Assessment of Energy Consumption in Cluster Redevelopment: A Case Study of Bhendi Bazar in Mumbai
Authors: Insiya Kapasi, Roshni Udyavar Yehuda
Abstract:
Cluster Redevelopment is a new concept in the city of Mumbai. Its regulations were laid down by the government in 2009. The concept of cluster redevelopment encompasses a group of buildings defined by a boundary as specified by the municipal authority (in this case, Mumbai), which may be dilapidated or approved for redevelopment. The study analyses the effect of cluster redevelopment in the form of renewal of old group of buildings as compared to refurbishment or restoration - on energy consumption. The methodology includes methods of assessment to determine increase or decrease in energy consumption in cluster redevelopment based on different criteria such as carpet area of the units, building envelope and its architectural elements. Results show that as the area and number of units increase the Energy consumption increases and the EPI (energy performance index) decreases as compared to the base case. The energy consumption per unit area declines by 29% in the proposed cluster redevelopment as compared to the original settlement. It is recommended that although the development is spacious and provides more light and ventilation, aspects such as glass type, traditional architectural features and consumer behavior are critical in the reduction of energy consumption.
Keywords: Cluster redevelopment, energy consumption, energy efficiency, typologies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 631589 A New Method in Detection of Ceramic Tiles Color Defects Using Genetic C-Means Algorithm
Authors: Mahkameh S. Mostafavi
Abstract:
In this paper an algorithm is used to detect the color defects of ceramic tiles. First the image of a normal tile is clustered using GCMA; Genetic C-means Clustering Algorithm; those results in best cluster centers. C-means is a common clustering algorithm which optimizes an objective function, based on a measure between data points and the cluster centers in the data space. Here the objective function describes the mean square error. After finding the best centers, each pixel of the image is assigned to the cluster with closest cluster center. Then, the maximum errors of clusters are computed. For each cluster, max error is the maximum distance between its center and all the pixels which belong to it. After computing errors all the pixels of defected tile image are clustered based on the centers obtained from normal tile image in previous stage. Pixels which their distance from their cluster center is more than the maximum error of that cluster are considered as defected pixels.
Keywords: C-Means algorithm, color spaces, Genetic Algorithm, image clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1649588 Some Issues with Extension of an HPC Cluster
Authors: Pil Seong Park
Abstract:
Homemade HPC clusters are widely used in many small labs, because they are easy to build and cost-effective. Even though incremental growth is an advantage of clusters, it results in heterogeneous systems anyhow. Instead of adding new nodes to the cluster, we can extend clusters to include some other Internet servers working independently on the same LAN, so that we can make use of their idle times, especially during the night. However extension across a firewall raises some security problems with NFS. In this paper, we propose a method to solve such a problem using SSH tunneling, and suggest a modified structure of the cluster that implements it.
Keywords: Extension of HPC clusters, Security, NFS, SSH tunneling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1898587 Analysis of Permanence and Extinction of Enterprise Cluster Based On Ecology Theory
Authors: Ping Liu, Yongkun Li
Abstract:
This paper is concerned with the permanence and extinction problem of enterprises cluster constituted by m satellite enterprises and a dominant enterprise. We present the model involving impulsive effect based on ecology theory, which effectively describe the competition and cooperation of enterprises cluster in real economic environment. Applying comparison theorem of impulsive differential equation, we establish sufficient conditions which ultimately affect the fate of enterprises: permanence, extinction, and co-existence. Finally, we present numerical examples to explain the economical significance of mathematical results.
Keywords: Enterprise cluster, permanence, extinction, impulsive, comparison theorem.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457586 Collocation Assessment between GEO and GSO Satellites
Authors: A. E. Emam, M. Abd Elghany
Abstract:
The change in orbit evolution between collocated satellites (X, Y) inside +/-0.09° E/W and +/- 0.07° N/S cluster, after one of these satellites is placed in an inclined orbit (satellite X) and the effect of this change in the collocation safety inside the cluster window has been studied and evaluated. Several collocation scenarios had been studied in order to adjust the location of both satellites inside their cluster to maximize the separation between them and safe the mission.Keywords: Satellite, GEO, collocation, risk assessment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2327585 An Energy Efficient Cluster Formation Protocol with Low Latency In Wireless Sensor Networks
Authors: A. Allirani, M. Suganthi
Abstract:
Data gathering is an essential operation in wireless sensor network applications. So it requires energy efficiency techniques to increase the lifetime of the network. Similarly, clustering is also an effective technique to improve the energy efficiency and network lifetime of wireless sensor networks. In this paper, an energy efficient cluster formation protocol is proposed with the objective of achieving low energy dissipation and latency without sacrificing application specific quality. The objective is achieved by applying randomized, adaptive, self-configuring cluster formation and localized control for data transfers. It involves application - specific data processing, such as data aggregation or compression. The cluster formation algorithm allows each node to make independent decisions, so as to generate good clusters as the end. Simulation results show that the proposed protocol utilizes minimum energy and latency for cluster formation, there by reducing the overhead of the protocol.Keywords: Sensor networks, Low latency, Energy sorting protocol, data processing, Cluster formation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2740584 The Effects of Different Level Cluster Tip Reduction and Foliar Boric Acid Applications on Yield and Yield Components of Italia Grape Cultivar
Authors: A. Akin
Abstract:
This study was carried out on Italia grape variety (Vitis vinifera L.) in Konya province, Turkey in 2016. The cultivar is five years old and grown on 1103 Paulsen rootstock. It was determined the effects of applications of the Control (C), 1/3 Cluster Tip Reduction (1/3 CTR), 1/6 Cluster Tip Reduction (1/6 CTR), 1/9 Cluster Tip Reduction (1/9 CTR), 1/3 CTR+Boric Acid (BA), 1/6 CTR+BA, 1/9 CTR+BA, on yield and yield components of the Italia grape variety. The results were obtained as the highest fresh grape yield (4.74 g) with 1/9 CTR+BA application; the highest cluster weight (220.08 g) with 1/3 CTR application; the highest 100 berry weight (565.85 g) with 1/9 CTR+BA application; as the highest maturity index (49.28) with 1/9 CTR+BA application; as the highest must yield (685.33 ml/kg) with 1/3 CTR+BA and (685.33 ml/kg) with 1/9 CTR+BA applications. To increase the fresh grape yield, 100 berry weight and maturity index in the Italia grape variety, the 1/9 CTR+BA application can be recommended.Keywords: Italia grape variety, boric acid, cluster tip reduction, yield, yield components.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 997583 Analysis of Diverse Clustering Tools in Data Mining
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.
Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2201582 Analysis of Entrepreneurship in Industrial Cluster
Authors: Wen-Hsiang Lai
Abstract:
Except for the internal aspects of entrepreneurship (i.e.motivation, opportunity perspective and alertness), there are external aspects that affecting entrepreneurship (i.e. the industrial cluster). By comparing the machinery companies located inside and outside the industrial district, this study aims to explore the cluster effects on the entrepreneurship of companies in Taiwan machinery clusters (TMC). In this study, three factors affecting the entrepreneurship in TMC are conducted as “competition”, “embedded-ness” and “specialized knowledge”. The “competition” in the industrial cluster is defined as the competitive advantages that companies gain in form of demand effects and diversified strategies; the “embedded-ness” refers to the quality of company relations (relational embedded-ness) and ranges (structural embedded-ness) with the industry components (universities, customers and complementary) that affecting knowledge transfer and knowledge generations; the “specialized knowledge” shares theinternal knowledge within industrial clusters. This study finds that when comparing to the companieswhich are outside the cluster, the industrial cluster has positive influence on the entrepreneurship. Additionally, the factor of “relational embedded-ness” has significant impact on the entrepreneurship and affects the adaptation ability of companies in TMC. Finally, the factor of “competition” reveals partial influence on the entrepreneurship.
Keywords: Entrepreneurship, Industrial Cluster, Industrial District, Economies of Agglomerations, Taiwan Machinery Cluster (TMC).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2261581 A Functional Interpretation of Quantum Theory
Authors: Hans H. Diel
Abstract:
In this paper a functional interpretation of quantum theory (QT) with emphasis on quantum field theory (QFT) is proposed. Besides the usual statements on relations between a functions initial state and final state, a functional interpretation also contains a description of the dynamic evolution of the function. That is, it describes how things function. The proposed functional interpretation of QT/QFT has been developed in the context of the author-s work towards a computer model of QT with the goal of supporting the largest possible scope of QT concepts. In the course of this work, the author encountered a number of problems inherent in the translation of quantum physics into a computer program. He came to the conclusion that the goal of supporting the major QT concepts can only be satisfied, if the present model of QT is supplemented by a "functional interpretation" of QT/QFT. The paper describes a proposal for thatKeywords: Computability, Foundation of Quantum Mechanics, Measurement Problem, Models of Physics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2040580 Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance
Authors: S. Deelers, S. Auwatanamongkol
Abstract:
In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides cell in two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum squared errors of the two cells as much as possible, while at the same time keep the two cells far apart as possible. Cells are partitioned one at a time until the number of cells equals to the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective, converge to better clustering results than those of the random initialization method. The research also indicated the proposed algorithm would greatly improve the likelihood of every cluster containing some data in it.Keywords: Clustering algorithm, K-means algorithm, Datapartitioning, Initial cluster centers.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2865579 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.
Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2603578 Evaluation of Groundwater Quality and Its Suitability for Drinking and Agricultural Purposes Using Self-Organizing Maps
Authors: L. Belkhiri, L. Mouni, A. Tiri, T.S. Narany
Abstract:
In the present study, the self-organizing map (SOM) clustering technique was applied to identify homogeneous clusters of hydrochemical parameters in El Milia plain, Algeria, to assess the quality of groundwater for potable and agricultural purposes. The visualization of SOM-analysis indicated that 35 groundwater samples collected in the study area were classified into three clusters, which showed progressive increase in electrical conductivity from cluster one to cluster three. Samples belonging to cluster one are mostly located in the recharge zone showing hard fresh water type, however, water type gradually changed to hard-brackish type in the discharge zone, including clusters two and three. Ionic ratio studies indicated the role of carbonate rock dissolution in increases on groundwater hardness, especially in cluster one. However, evaporation and evapotranspiration are the main processes increasing salinity in cluster two and three.
Keywords: Drinking water, groundwater quality, irrigation water, self-organizing maps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1244577 Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution
Authors: Nchimbi Edward Pius, Liu Qin, Fion Yang, Zhu Hong Ming
Abstract:
The current Hadoop block placement policy do not fairly and evenly distributes replicas of blocks written to datanodes in a Hadoop cluster.
This paper presents a new solution that helps to keep the cluster in a balanced state while an HDFS client is writing data to a file in Hadoop cluster. The solution had been implemented, and test had been conducted to evaluate its contribution to Hadoop distributed file system.
It has been found that, the solution has lowered global execution time taken by Hadoop balancer to 22 percent. It also has been found that, Hadoop balancer respectively over replicate 1.75 and 3.3 percent of all re-distributed blocks in the modified and original Hadoop clusters.
The feature that keeps the cluster in a balanced state works as a core part to Hadoop system and not just as a utility like traditional balancer. This is one of the significant achievements and uniqueness of the solution developed during the course of this research work.
Keywords: Balancer, Datanode, Distributed file system, Hadoop, Replicas.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4959