Search results for: multistage cluster sampling.

872 Analysis of Diverse Clustering Tools in Data Mining

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.

Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2148

871 Analysis of Entrepreneurship in Industrial Cluster

Authors: Wen-Hsiang Lai

Abstract:

Except for the internal aspects of entrepreneurship (i.e.motivation, opportunity perspective and alertness), there are external aspects that affecting entrepreneurship (i.e. the industrial cluster). By comparing the machinery companies located inside and outside the industrial district, this study aims to explore the cluster effects on the entrepreneurship of companies in Taiwan machinery clusters (TMC). In this study, three factors affecting the entrepreneurship in TMC are conducted as “competition”, “embedded-ness” and “specialized knowledge”. The “competition” in the industrial cluster is defined as the competitive advantages that companies gain in form of demand effects and diversified strategies; the “embedded-ness” refers to the quality of company relations (relational embedded-ness) and ranges (structural embedded-ness) with the industry components (universities, customers and complementary) that affecting knowledge transfer and knowledge generations; the “specialized knowledge” shares theinternal knowledge within industrial clusters. This study finds that when comparing to the companieswhich are outside the cluster, the industrial cluster has positive influence on the entrepreneurship. Additionally, the factor of “relational embedded-ness” has significant impact on the entrepreneurship and affects the adaptation ability of companies in TMC. Finally, the factor of “competition” reveals partial influence on the entrepreneurship.

Keywords: Entrepreneurship, Industrial Cluster, Industrial District, Economies of Agglomerations, Taiwan Machinery Cluster (TMC).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2212

870 Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance

Authors: S. Deelers, S. Auwatanamongkol

Abstract:

In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides cell in two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum squared errors of the two cells as much as possible, while at the same time keep the two cells far apart as possible. Cells are partitioned one at a time until the number of cells equals to the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective, converge to better clustering results than those of the random initialization method. The research also indicated the proposed algorithm would greatly improve the likelihood of every cluster containing some data in it.

Keywords: Clustering algorithm, K-means algorithm, Datapartitioning, Initial cluster centers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2813

869 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2553

868 Evaluation of Groundwater Quality and Its Suitability for Drinking and Agricultural Purposes Using Self-Organizing Maps

Authors: L. Belkhiri, L. Mouni, A. Tiri, T.S. Narany

Abstract:

In the present study, the self-organizing map (SOM) clustering technique was applied to identify homogeneous clusters of hydrochemical parameters in El Milia plain, Algeria, to assess the quality of groundwater for potable and agricultural purposes. The visualization of SOM-analysis indicated that 35 groundwater samples collected in the study area were classified into three clusters, which showed progressive increase in electrical conductivity from cluster one to cluster three. Samples belonging to cluster one are mostly located in the recharge zone showing hard fresh water type, however, water type gradually changed to hard-brackish type in the discharge zone, including clusters two and three. Ionic ratio studies indicated the role of carbonate rock dissolution in increases on groundwater hardness, especially in cluster one. However, evaporation and evapotranspiration are the main processes increasing salinity in cluster two and three.

Keywords: Drinking water, groundwater quality, irrigation water, self-organizing maps.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1180

867 Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution

Authors: Nchimbi Edward Pius, Liu Qin, Fion Yang, Zhu Hong Ming

Abstract:

The current Hadoop block placement policy do not fairly and evenly distributes replicas of blocks written to datanodes in a Hadoop cluster.

This paper presents a new solution that helps to keep the cluster in a balanced state while an HDFS client is writing data to a file in Hadoop cluster. The solution had been implemented, and test had been conducted to evaluate its contribution to Hadoop distributed file system.

It has been found that, the solution has lowered global execution time taken by Hadoop balancer to 22 percent. It also has been found that, Hadoop balancer respectively over replicate 1.75 and 3.3 percent of all re-distributed blocks in the modified and original Hadoop clusters.

The feature that keeps the cluster in a balanced state works as a core part to Hadoop system and not just as a utility like traditional balancer. This is one of the significant achievements and uniqueness of the solution developed during the course of this research work.

Keywords: Balancer, Datanode, Distributed file system, Hadoop, Replicas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4892

866 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important issue in data mining community and has a number of applications such as document archive filtering, document organization and topic detection and subject tracing. In the real world, some of the already clustered documents may not be of importance while new documents of more significance may evolve. Most of the work done so far in clustering unstructured text documents overlooks this aspect of clustering. This paper, addresses this issue by using the Fading Function. The unstructured text documents are clustered. And for each cluster a statistics structure called Cluster Profile (CP) is implemented. The cluster profile incorporates the Fading Function. This Fading Function keeps an account of the time-dependent importance of the cluster. The work proposes a novel algorithm Clustering n-ary Merge Algorithm (CnMA) for unstructured text documents, that uses Cluster Profile and Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Keywords: Clustering, Text Mining, Unstructured TextDocuments, Fading Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1939

865 Modeling Parametric Vibration of Multistage Gear Systems as a Tool for Design Optimization

Authors: James Kuria, John Kihiu

Abstract:

This work presents a numerical model developed to simulate the dynamics and vibrations of a multistage tractor gearbox. The effect of time varying mesh stiffness, time varying frictional torque on the gear teeth, lateral and torsional flexibility of the shafts and flexibility of the bearings were included in the model. The model was developed by using the Lagrangian method, and it was applied to study the effect of three design variables on the vibration and stress levels on the gears. The first design variable, module, had little effect on the vibration levels but a higher module resulted to higher bending stress levels. The second design variable, pressure angle, had little effect on the vibration levels, but had a strong effect on the stress levels on the pinion of a high reduction ratio gear pair. A pressure angle of 25o resulted to lower stress levels for a pinion with 14 teeth than a pressure angle of 20o. The third design variable, contact ratio, had a very strong effect on both the vibration levels and bending stress levels. Increasing the contact ratio to 2.0 reduced both the vibration levels and bending stress levels significantly. For the gear train design used in this study, a module of 2.5 and contact ratio of 2.0 for the various meshes was found to yield the best combination of low vibration levels and low bending stresses. The model can therefore be used as a tool for obtaining the optimum gear design parameters for a given multistage spur gear train.

Keywords: bending stress levels, frictional torque, gear designparameters, mesh stiffness, multistage gear train, vibration levels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2528

864 Efficient Alias-free Level Crossing Sampling

Authors: Negar Riazifar, Nigel G. Stocks

Abstract:

This paper proposes strategies in level crossing (LC) sampling and reconstruction that provide alias-free high-fidelity signal reconstruction for speech signals without exponentially increasing sample number with increasing bit-depth. We introduce methods in LC sampling that reduce the sampling rate close to the Nyquist frequency even for large bit-depth. The results indicate that larger variation in the sampling intervals leads to alias-free sampling scheme; this is achieved by either reducing the bit-depth or adding a jitter to the system for high bit-depths. In conjunction with windowing, the signal is reconstructed from the LC samples using an efficient Toeplitz reconstruction algorithm.

Keywords: Alias-free, level crossing sampling, spectrum, trigonometric polynomial.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 244

863 A Multistage Sulphidisation Flotation Procedure for a Low Grade Malachite Copper Ore

Authors: Tebogo P. Phetla, Edison Muzenda

Abstract:

This study was carried out to develop a flotation procedure for an oxide copper ore from a Region in Central Africa for producing an 18% copper concentrate for downstream processing at maximum recovery from a 4% copper feed grade. The copper recoveries achieved from the test work were less than 50% despite changes in reagent conditions (multistage sulphidisation, use of RCA emulsion and mixture, use of AM 2, etc). The poor recoveries were attributed to the mineralogy of the ore from which copper silicates accounted for approximately 70% (mass) of the copper minerals in the ore. These can be complex and difficult to float using conventional flotation methods. Best results were obtained using basic sulphidisation procedures, a high flotation temperature and extended flotation residence time.

Keywords: Froth flotation, Sulphidisation, Copper oxide ore, Mineralogy, Recovery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5769

862 Cluster Analysis of Customer Churn in Telecom Industry

Authors: Abbas Al-Refaie

Abstract:

The research examines the factors that affect customer churn (CC) in the Jordanian telecom industry. A total of 700 surveys were distributed. Cluster analysis revealed three main clusters. Results showed that CC and customer satisfaction (CS) were the key determinants in forming the three clusters. In two clusters, the center values of CC were high, indicating that the customers were loyal and SC was expensive and time- and energy-consuming. Still, the mobile service provider (MSP) should enhance its communication (COM), and value added services (VASs), as well as customer complaint management systems (CCMS). Finally, for the third cluster the center of the CC indicates a poor level of loyalty, which facilitates customers churn to another MSP. The results of this study provide valuable feedback for MSP decision makers regarding approaches to improving their performance and reducing CC.

Keywords: Cluster analysis, telecom industry, switching cost, customer churn.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2476

861 Game Theory Based Diligent Energy Utilization Algorithm for Routing in Wireless Sensor Network

Authors: X. Mercilin Raajini, R. Raja Kumar, P. Indumathi, V. Praveen

Abstract:

Many cluster based routing protocols have been proposed in the field of wireless sensor networks, in which a group of nodes are formed as clusters. A cluster head is selected from one among those nodes based on residual energy, coverage area, number of hops and that cluster-head will perform data gathering from various sensor nodes and forwards aggregated data to the base station or to a relay node (another cluster-head), which will forward the packet along with its own data packet to the base station. Here a Game Theory based Diligent Energy Utilization Algorithm (GTDEA) for routing is proposed. In GTDEA, the cluster head selection is done with the help of game theory, a decision making process, that selects a cluster-head based on three parameters such as residual energy (RE), Received Signal Strength Index (RSSI) and Packet Reception Rate (PRR). Finding a feasible path to the destination with minimum utilization of available energy improves the network lifetime and is achieved by the proposed approach. In GTDEA, the packets are forwarded to the base station using inter-cluster routing technique, which will further forward it to the base station. Simulation results reveal that GTDEA improves the network performance in terms of throughput, lifetime, and power consumption.

Keywords: Cluster head, Energy utilization, Game Theory, LEACH, Sensor network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1853

860 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.

Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2068

859 K-Means for Spherical Clusters with Large Variance in Sizes

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.

Keywords: K-Means, Data Clustering, Cluster Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3240

858 Two-Photon Ionization of Silver Clusters

Authors: V. Paployan, K. Madoyan, A. Melikyan, H. Minassian

Abstract:

In this paper, we calculate the two-photon ionization (TPI) cross-section for pump-probe scheme in Ag neutral cluster. The pump photon energy is assumed to be close to the surface plasmon (SP) energy of cluster in dielectric media. Due to this choice, the pump wave excites collective oscillations of electrons-SP and the probe wave causes ionization of the cluster. Since the interband transition energy in Ag exceeds the SP resonance energy, the main contribution into the TPI comes from the latter. The advantage of Ag clusters as compared to the other noble metals is that the SP resonance in silver cluster is much sharper because of peculiarities of its dielectric function. The calculations are performed by separating the coordinates of electrons corresponding to the collective oscillations and the individual motion that allows taking into account the resonance contribution of excited SP oscillations. It is shown that the ionization cross section increases by two orders of magnitude if the energy of the pump photon matches the surface plasmon energy in the cluster.

Keywords: Resonance enhancement, silver clusters, surface plasmon, two-photon ionization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1432

857 Application of MADM in Identifying the Transmission Rate of Dengue fever: A Case Study of Shah Alam, Malaysia

Authors: Nuraini Yusoff, Harun Budin, Salemah Ismail

Abstract:

Identifying parameters in an epidemic model is one of the important aspect of modeling. In this paper, we suggest a method to identify the transmission rate by using the multistage Adomian decomposition method. As a case study, we use the data of the reported dengue fever cases in the city of Shah Alam, Malaysia. The result obtained fairly represents the actual situation. However, in the SIR model, this method serves as an alternative in parameter identification and enables us to make necessary analysis for a smaller interval.

Keywords: dengue fever, multistage Adomian decomposition method, Shah Alam, SIR model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2153

856 Design of Bayesian MDS Sampling Plan Based on the Process Capability Index

Authors: Davood Shishebori, Mohammad Saber Fallah Nezhad, Sina Seifi

Abstract:

In this paper, a variable multiple dependent state (MDS) sampling plan is developed based on the process capability index using Bayesian approach. The optimal parameters of the developed sampling plan with respect to constraints related to the risk of consumer and producer are presented. Two comparison studies have been done. First, the methods of double sampling model, sampling plan for resubmitted lots and repetitive group sampling (RGS) plan are elaborated and average sample numbers of the developed MDS plan and other classical methods are compared. A comparison study between the developed MDS plan based on Bayesian approach and the exact probability distribution is carried out.

Keywords: MDS sampling plan, RGS plan, sampling plan for resubmitted lots, process capability index, average sample number, Bayesian approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 964

855 Network of Coupled Stochastic Oscillators and One-way Quantum Computations

Authors: Eugene Grichuk, Margarita Kuzmina, Eduard Manykin

Abstract:

A network of coupled stochastic oscillators is proposed for modeling of a cluster of entangled qubits that is exploited as a computation resource in one-way quantum computation schemes. A qubit model has been designed as a stochastic oscillator formed by a pair of coupled limit cycle oscillators with chaotically modulated limit cycle radii and frequencies. The qubit simulates the behavior of electric field of polarized light beam and adequately imitates the states of two-level quantum system. A cluster of entangled qubits can be associated with a beam of polarized light, light polarization degree being directly related to cluster entanglement degree. Oscillatory network, imitating qubit cluster, is designed, and system of equations for network dynamics has been written. The constructions of one-qubit gates are suggested. Changing of cluster entanglement degree caused by measurements can be exactly calculated.

Keywords: network of stochastic oscillators, one-way quantumcomputations, a beam of polarized light.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1358

854 Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations

Authors: Lai Lai Win Kyi, Nay Min Tun

Abstract:

Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.

Keywords: Cluster of Workstations, Parallel sorting algorithms, performance analysis, parallel computing and MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1435

853 Investigating the Efficiency of Stratified Double Median Ranked Set Sample for Estimating the Population Mean

Authors: Mahmoud I. Syam

Abstract:

Stratified double median ranked set sampling (SDMRSS) method is suggested for estimating the population mean. The SDMRSS is compared with the simple random sampling (SRS), stratified simple random sampling (SSRS), and stratified ranked set sampling (SRSS). It is shown that SDMRSS estimator is an unbiased of the population mean and more efficient than SRS, SSRS, and SRSS. Also, by SDMRSS, we can increase the efficiency of mean estimator for specific value of the sample size. SDMRSS is applied on real life examples, and the results of the example agreed the theoretical results.

Keywords: Efficiency, double ranked set sampling, median ranked set sampling, ranked set sampling, stratified.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 911

852 Parallelization and Optimization of SIFT Feature Extraction on Cluster System

Authors: Mingling Zheng, Zhenlong Song, Ke Xu, Hengzhu Liu

Abstract:

Scale Invariant Feature Transform (SIFT) has been widely applied, but extracting SIFT feature is complicated and time-consuming. In this paper, to meet the demand of the real-time applications, SIFT is parallelized and optimized on cluster system, which is named pSIFT. Redundancy storage and communication are used for boundary data to improve the performance, and before representation of feature descriptor, data reallocation is adopted to keep load balance in pSIFT. Experimental results show that pSIFT achieves good speedup and scalability.

Keywords: cluster, image matching, parallelization and optimization, SIFT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1821

851 Color Image Segmentation Using Competitive and Cooperative Learning Approach

Authors: Yinggan Tang, Xinping Guan

Abstract:

Color image segmentation can be considered as a cluster procedure in feature space. k-means and its adaptive version, i.e. competitive learning approach are powerful tools for data clustering. But k-means and competitive learning suffer from several drawbacks such as dead-unit problem and need to pre-specify number of cluster. In this paper, we will explore to use competitive and cooperative learning approach to perform color image segmentation. In competitive and cooperative learning approach, seed points not only compete each other, but also the winner will dynamically select several nearest competitors to form a cooperative team to adapt to the input together, finally it can automatically select the correct number of cluster and avoid the dead-units problem. Experimental results show that CCL can obtain better segmentation result.

Keywords: Color image segmentation, competitive learning, cluster, k-means algorithm, competitive and cooperative learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1570

850 Routing Algorithm for a Clustered Network

Authors: Hemanth KumarA.R, Sudhakara G., Satyanarayana B.S.

Abstract:

The Cluster Dimension of a network is defined as, which is the minimum cardinality of a subset S of the set of nodes having the property that for any two distinct nodes x and y, there exist the node Si, s2 (need not be distinct) in S such that ld(x,s1) — d(y, s1)1 > 1 and d(x,s2) < d(x,$) for all s E S — {s2}. In this paper, strictly non overlap¬ping clusters are constructed. The concept of LandMarks for Unique Addressing and Clustering (LMUAC) routing scheme is developed. With the help of LMUAC routing scheme, It is shown that path length (upper bound)PLN,d < PLD, Maximum memory space requirement for the networkMSLmuAc(Az) < MSEmuAc < MSH3L < MSric and Maximum Link utilization factor MLLMUAC(i=3) < MLLMUAC(z03) < M Lc

Keywords: Metric dimension, Cluster dimension, Cluster.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1188

849 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2406

848 Off-Line Detection of “Pannon Wheat” Milling Fractions by Near-Infrared Spectroscopic Methods

Authors: E. Izsó, M. Bartalné-Berceli, Sz. Gergely, A. Salgó

Abstract:

The aim of this investigation is to elaborate nearinfrared methods for testing and recognition of chemical components and quality in “Pannon wheat” allied (i.e. true to variety or variety identified) milling fractions as well as to develop spectroscopic methods following the milling processes and evaluate the stability of the milling technology by different types of milling products and according to sampling times, respectively. These wheat categories produced under industrial conditions where samples were collected versus sampling time and maximum or minimum yields. The changes of the main chemical components (such as starch, protein, lipid) and physical properties of fractions (particle size) were analysed by dispersive spectrophotometers using visible (VIS) and near-infrared (NIR) regions of the electromagnetic radiation. Close correlation were obtained between the data of spectroscopic measurement techniques processed by various chemometric methods (e.g. principal component analysis [PCA], cluster analysis [CA]) and operation condition of milling technology. It is obvious that NIR methods are able to detect the deviation of the yield parameters and differences of the sampling times by a wide variety of fractions, respectively. NIR technology can be used in the sensitive monitoring of milling technology.

Keywords: Allied wheat fractions, CA, milling process, nearinfrared spectroscopy, PCA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1652

847 A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India

Authors: Nida Rizvi, Deeksha Katyal, Varun Joshi

Abstract:

River Hindon is an important river catering the demand of highly populated rural and industrial cluster of western Uttar Pradesh, India. Water quality of river Hindon is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed at identifying the pollution sources and quantifying the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters, like pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand, and total alkalinity were assessed. Water quality data obtained from eight study sites for one year has been subjected to the two multivariate techniques, namely, principal component analysis and cluster analysis. Principal component analysis was applied with the aim to find out spatial variability and to identify the sources responsible for the water quality of the river. Three Varifactors were obtained after varimax rotation of initial principal components using principal component analysis. Cluster analysis was carried out to classify sampling stations of certain similarity, which grouped eight different sites into two clusters. The study reveals that the anthropogenic influence (municipal, industrial, waste water and agricultural runoff) was the major source of river water pollution. Thus, this study illustrates the utility of multivariate statistical techniques for analysis and elucidation of multifaceted data sets, recognition of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.

Keywords: Cluster analysis, multivariate statistical technique, river Hindon, water Quality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3753

846 Marketing Segmentation of Students Willing to Study Abroad based on Cluster Analysis

Authors: Kamila Tislerova, Marta Zambochova

Abstract:

Market segmentation is one of the most fundamental strategic marketing concepts. The better the segment which is chosen for targeting by a particular organisation, the more successful the organisation is assumed to be in the marketplace. Also higher education institutions have to improve their marketing tools for attracting foreign students, particularly when demanding tuition fees. This contribution aims at demonstrating the proper usage of the cluster analysis for segmentation (represented by students' willingness to study abroad) and also, based on large international survey, offers some practical marketing implications.

Keywords: Market Segmentation, Students' Preferences, Study Abroad, Cluster Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2174

845 Investigation of the Effects of Sampling Frequency on the THD of 3-Phase Inverters Using Space Vector Modulation

Authors: Khattab Ibrahim Al Qaisi, Nicholas Bowring

Abstract:

This paper presents the simulation results of the effects of sampling frequency on the total harmonic distortion (THD) of three-phase inverters using the space vector pulse width modulation (SVPWM) and space vector control (SVC) algorithms. The relationship between the variables was studied using curve fitting techniques, and it has been shown that, for 50 Hz inverters, there is an exponential relation between the sampling frequency and THD up to around 8500 Hz, beyond which the performance of the model becomes irregular, and there is an negative exponential relation between the sampling frequency and the marginal improvement to the THD. It has also been found that the performance of SVPWM is better than that of SVC with the same sampling frequency in most frequency range, including the range where the performance of the former is irregular.

Keywords: SVPWM, THD, DC-AC Inverter, Sampling Frequency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2938

844 Formosa3: A Cloud-Enabled HPC Cluster in NCHC

Authors: Chin-Hung Li, Te-Ming Chen, Ying-Chuan Chen, Shuen-Tai Wang

Abstract:

This paper proposes a new approach to offer a private cloud service in HPC clusters. In particular, our approach relies on automatically scheduling users- customized environment request as a normal job in batch system. After finishing virtualization request jobs, those guest operating systems will dismiss so that compute nodes will be released again for computing. We present initial work on the innovative integration of HPC batch system and virtualization tools that aims at coexistence such that they suffice for meeting the minimizing interference required by a traditional HPC cluster. Given the design of initial infrastructure, the proposed effort has the potential to positively impact on synergy model. The results from the experiment concluded that goal for provisioning customized cluster environment indeed can be fulfilled by using virtual machines, and efficiency can be improved with proper setup and arrangements.

Keywords: Cloud Computing, HPC Cluster, Private Cloud, Virtualization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1990

843 Image Compression Using Hybrid Vector Quantization

Authors: S.Esakkirajan, T. Veerakumar, V. Senthil Murugan, P.Navaneethan

Abstract:

In this paper, image compression using hybrid vector quantization scheme such as Multistage Vector Quantization (MSVQ) and Pyramid Vector Quantization (PVQ) are introduced. A combined MSVQ and PVQ are utilized to take advantages provided by both of them. In the wavelet decomposition of the image, most of the information often resides in the lowest frequency subband. MSVQ is applied to significant low frequency coefficients. PVQ is utilized to quantize the coefficients of other high frequency subbands. The wavelet coefficients are derived using lifting scheme. The main aim of the proposed scheme is to achieve high compression ratio without much compromise in the image quality. The results are compared with the existing image compression scheme using MSVQ.

Keywords: Lifting Scheme, Multistage Vector Quantization and Pyramid Vector Quantization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1891