Search results for: multistage cluster sampling.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 902

Search results for: multistage cluster sampling.

842 Multistage Data Envelopment Analysis Model for Malmquist Productivity Index Using Grey's System Theory to Evaluate Performance of Electric Power Supply Chain in Iran

Authors: Mesbaholdin Salami, Farzad Movahedi Sobhani, Mohammad Sadegh Ghazizadeh

Abstract:

Evaluation of organizational performance is among the most important measures that help organizations and entities continuously improve their efficiency. Organizations can use the existing data and results from the comparison of units under investigation to obtain an estimation of their performance. The Malmquist Productivity Index (MPI) is an important index in the evaluation of overall productivity, which considers technological developments and technical efficiency at the same time. This article proposed a model based on the multistage MPI, considering limited data (Grey’s theory). This model can evaluate the performance of units using limited and uncertain data in a multistage process. It was applied by the electricity market manager to Iran’s electric power supply chain (EPSC), which contains uncertain data, to evaluate the performance of its actors. Results from solving the model showed an improvement in the accuracy of future performance of the units under investigation, using the Grey’s system theory. This model can be used in all case studies, in which MPI is used and there are limited or uncertain data.

Keywords: Malmquist Index, Grey's Theory, Charnes Cooper & Rhodes (CCR) Model, network data envelopment analysis, Iran electricity power chain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 509
841 Analysis of Long-Term File System Activities on Cluster Systems

Authors: Hyeyoung Cho, Sungho Kim, Sik Lee

Abstract:

I/O workload is a critical and important factor to analyze I/O pattern and to maximize file system performance. However to measure I/O workload on running distributed parallel file system is non-trivial due to collection overhead and large volume of data. In this paper, we measured and analyzed file system activities on two large-scale cluster systems which had TFlops level high performance computation resources. By comparing file system activities of 2009 with those of 2006, we analyzed the change of I/O workloads by the development of system performance and high-speed network technology.

Keywords: I/O workload, Lustre, GPFS, Cluster File System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1411
840 Development of a Clustered Network based on Unique Hop ID

Authors: Hemanth Kumar, A. R., Sudhakar G, Satyanarayana B. S.

Abstract:

In this paper, Land Marks for Unique Addressing( LMUA) algorithm is develped to generate unique ID for each and every node which leads to the formation of overlapping/Non overlapping clusters based on unique ID. To overcome the draw back of the developed LMUA algorithm, the concept of clustering is introduced. Based on the clustering concept a Land Marks for Unique Addressing and Clustering(LMUAC) Algorithm is developed to construct strictly non-overlapping clusters and classify those nodes in to Cluster Heads, Member Nodes, Gate way nodes and generating the Hierarchical code for the cluster heads to operate in the level one hierarchy for wireless communication switching. The expansion of the existing network can be performed or not without modifying the cost of adding the clusterhead is shown. The developed algorithm shows one way of efficiently constructing the

Keywords: Cluster Dimension, Cluster Basis, Metric Dimension, Metric Basis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1262
839 Multiuser Detection in CDMA Fast Fading Multipath Channel using Heuristic Genetic Algorithms

Authors: Muhammad Naeem, Syed Ismail Shah, Habibullah Jamal

Abstract:

In this paper, a simple heuristic genetic algorithm is used for Multistage Multiuser detection in fast fading environments. Multipath channels, multiple access interference (MAI) and near far effect cause the performance of the conventional detector to degrade. Heuristic Genetic algorithms, a rapidly growing area of artificial intelligence, uses evolutionary programming for initial search, which not only helps to converge the solution towards near optimal performance efficiently but also at a very low complexity as compared with optimal detector. This holds true for Additive White Gaussian Noise (AWGN) and multipath fading channels. Experimental results are presented to show the superior performance of the proposed techque over the existing methods.

Keywords: Genetic Algorithm (GA), Multiple AccessInterference (MAI), Multistage Detectors (MSD), SuccessiveInterference Cancellation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2004
838 Digital Forensics Compute Cluster: A High Speed Distributed Computing Capability for Digital Forensics

Authors: Daniel Gonzales, Zev Winkelman, Trung Tran, Ricardo Sanchez, Dulani Woods, John Hollywood

Abstract:

We have developed a distributed computing capability, Digital Forensics Compute Cluster (DFORC2) to speed up the ingestion and processing of digital evidence that is resident on computer hard drives. DFORC2 parallelizes evidence ingestion and file processing steps. It can be run on a standalone computer cluster or in the Amazon Web Services (AWS) cloud. When running in a virtualized computing environment, its cluster resources can be dynamically scaled up or down using Kubernetes. DFORC2 is an open source project that uses Autopsy, Apache Spark and Kafka, and other open source software packages. It extends the proven open source digital forensics capabilities of Autopsy to compute clusters and cloud architectures, so digital forensics tasks can be accomplished efficiently by a scalable array of cluster compute nodes. In this paper, we describe DFORC2 and compare it with a standalone version of Autopsy when both are used to process evidence from hard drives of different sizes.

Keywords: Cloud computing, cybersecurity, digital forensics, Kafka, Kubernetes, Spark.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1592
837 A New Evolutionary Algorithm for Cluster Analysis

Authors: B.Bahmani Firouzi, T. Niknam, M. Nayeripour

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.

Keywords: Data clustering, Hybrid evolutionary optimization algorithm, K-means algorithm, Simulated Annealing (SA), Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2230
836 Fuzzy Based Particle Swarm Optimization Routing Technique for Load Balancing in Wireless Sensor Networks

Authors: S. Balaji, E. Golden Julie, M. Rajaram, Y. Harold Robinson

Abstract:

Network lifetime improvement and uncertainty in multiple systems are the issues of wireless sensor network routing. This paper presents fuzzy based particle swarm optimization routing technique to improve the network scalability. Significantly, in the cluster formation procedure, fuzzy based system is used to solve the uncertainty and network balancing. Cluster heads play an important role to reduce the energy consumption using particle swarm optimization algorithm, the cluster head sends its information along data packets to the heads with link. The simulation results show that the presented routing protocol can perform load balancing effectively and reduce the energy consumption of cluster heads.

Keywords: Wireless sensor networks, fuzzy logic, PSO, LEACH.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1233
835 Critical Psychosocial Risk Treatment for Engineers and Technicians

Authors: R. Berglund, T. Backström, M. Bellgran

Abstract:

This study explores how management addresses psychosocial risks in seven teams of engineers and technicians in the midst of the fourth industrial revolution. The sample is from an ongoing quasi-experiment about psychosocial risk management in a manufacturing company in Sweden. Each of the seven teams belongs to one of two clusters: a positive cluster or a negative cluster. The positive cluster reports a significantly positive change in psychosocial risk levels between two time-points and the negative cluster reports a significantly negative change. The data are collected using semi-structured interviews. The results of the computer aided thematic analysis show that there are more differences than similarities when comparing the risk treatment actions taken between the two clusters. Findings show that the managers in the positive cluster use more enabling actions that foster and support formal and informal relationship building. In contrast, managers that use less enabling actions hinder the development of positive group processes and contribute negative changes in psychosocial risk levels. This exploratory study sheds some light on how management can influence significant positive and negative changes in psychosocial risk levels during a risk management process.

Keywords: Group process model, risk treatment, risk management, psychosocial.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 952
834 Influence of Ambiguity Cluster on Quality Improvement in Image Compression

Authors: Safaa Al-Ali, Ahmad Shahin, Fadi Chakik

Abstract:

Image coding based on clustering provides immediate access to targeted features of interest in a high quality decoded image. This approach is useful for intelligent devices, as well as for multimedia content-based description standards. The result of image clustering cannot be precise in some positions especially on pixels with edge information which produce ambiguity among the clusters. Even with a good enhancement operator based on PDE, the quality of the decoded image will highly depend on the clustering process. In this paper, we introduce an ambiguity cluster in image coding to represent pixels with vagueness properties. The presence of such cluster allows preserving some details inherent to edges as well for uncertain pixels. It will also be very useful during the decoding phase in which an anisotropic diffusion operator, such as Perona-Malik, enhances the quality of the restored image. This work also offers a comparative study to demonstrate the effectiveness of a fuzzy clustering technique in detecting the ambiguity cluster without losing lot of the essential image information. Several experiments have been carried out to demonstrate the usefulness of ambiguity concept in image compression. The coding results and the performance of the proposed algorithms are discussed in terms of the peak signal-tonoise ratio and the quantity of ambiguous pixels.

Keywords: Ambiguity Cluster, Anisotropic Diffusion, Fuzzy Clustering, Image Compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
833 A Balanced Cost Cluster-Heads Selection Algorithm for Wireless Sensor Networks

Authors: Ouadoudi Zytoune, Youssef Fakhri, Driss Aboutajdine

Abstract:

This paper focuses on reducing the power consumption of wireless sensor networks. Therefore, a communication protocol named LEACH (Low-Energy Adaptive Clustering Hierarchy) is modified. We extend LEACHs stochastic cluster-head selection algorithm by a modifying the probability of each node to become cluster-head based on its required energy to transmit to the sink. We present an efficient energy aware routing algorithm for the wireless sensor networks. Our contribution consists in rotation selection of clusterheads considering the remoteness of the nodes to the sink, and then, the network nodes residual energy. This choice allows a best distribution of the transmission energy in the network. The cluster-heads selection algorithm is completely decentralized. Simulation results show that the energy is significantly reduced compared with the previous clustering based routing algorithm for the sensor networks.

Keywords: Wireless Sensor Networks, Energy efficiency, WirelessCommunications, Clustering-based algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2598
832 Adaptive Sampling Algorithm for ANN-based Performance Modeling of Nano-scale CMOS Inverter

Authors: Dipankar Dhabak, Soumya Pandit

Abstract:

This paper presents an adaptive technique for generation of data required for construction of artificial neural network-based performance model of nano-scale CMOS inverter circuit. The training data are generated from the samples through SPICE simulation. The proposed algorithm has been compared to standard progressive sampling algorithms like arithmetic sampling and geometric sampling. The advantages of the present approach over the others have been demonstrated. The ANN predicted results have been compared with actual SPICE results. A very good accuracy has been obtained.

Keywords: CMOS Inverter, Nano-scale, Adaptive Sampling, ArtificialNeural Network

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1569
831 Skyline Extraction using a Multistage Edge Filtering

Authors: Byung-Ju Kim, Jong-Jin Shin, Hwa-Jin Nam, Jin-Soo Kim

Abstract:

Skyline extraction in mountainous images can be used for navigation of vehicles or UAV(unmanned air vehicles), but it is very hard to extract skyline shape because of clutters like clouds, sea lines and field borders in images. We developed the edge-based skyline extraction algorithm using a proposed multistage edge filtering (MEF) technique. In this method, characteristics of clutters in the image are first defined and then the lines classified as clutters are eliminated by stages using the proposed MEF technique. After this processing, we select the last line using skyline measures among the remained lines. This proposed algorithm is robust under severe environments with clutters and has even good performance for infrared sensor images with a low resolution. We tested this proposed algorithm for images obtained in the field by an infrared camera and confirmed that the proposed algorithm produced a better performance and faster processing time than conventional algorithms.

Keywords: MEF, mountainous image, navigation, skyline

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1827
830 Integrating Decision Tree and Spatial Cluster Analysis for Landslide Susceptibility Zonation

Authors: Chien-Min Chu, Bor-Wen Tsai, Kang-Tsung Chang

Abstract:

Landslide susceptibility map delineates the potential zones for landslide occurrence. Previous works have applied multivariate methods and neural networks for mapping landslide susceptibility. This study proposed a new approach to integrate decision tree model and spatial cluster statistic for assessing landslide susceptibility spatially. A total of 2057 landslide cells were digitized for developing the landslide decision tree model. The relationships of landslides and instability factors were explicitly represented by using tree graphs in the model. The local Getis-Ord statistics were used to cluster cells with high landslide probability. The analytic result from the local Getis-Ord statistics was classed to create a map of landslide susceptibility zones. The map was validated using new landslide data with 482 cells. Results of validation show an accuracy rate of 86.1% in predicting new landslide occurrence. This indicates that the proposed approach is useful for improving landslide susceptibility mapping.

Keywords: Landslide susceptibility Zonation, Decision treemodel, Spatial cluster, Local Getis-Ord statistics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
829 Creation of Greater Mekong Subregion Regional Competitiveness through Cluster Mapping

Authors: Danuvasin Charoen

Abstract:

This research investigates cluster development in the area called the Greater Mekong Subregion (GMS), which consists of Thailand, the People’s Republic of China (PRC), the Yunnan Province and Guangxi Zhuang Autonomous Region, Myanmar, the Lao People’s Democratic Republic (Lao PDR), Cambodia, and Vietnam. The study utilized Porter’s competitiveness theory and the cluster mapping approach to analyze the competitiveness of the region. The data collection consists of interviews, focus groups, and the analysis of secondary data. The findings identify some evidence of cluster development in the GMS; however, there is no clear indication of collaboration among the components in the clusters. GMS clusters tend to be stand-alone. The clusters in Vietnam, Lao PDR, Myanmar, and Cambodia tend to be labor intensive, whereas the clusters in Thailand and the PRC (Yunnan) have the potential to successfully develop into innovative clusters. The collaboration and integration among the clusters in the GMS area are promising, though it could take a long time. The most likely relationship between the GMS countries could be, for example, suppliers of the low-end, labor-intensive products will be located in the low income countries such as Myanmar, Lao PDR, and Cambodia, and these countries will be providing input materials for innovative clusters in the middle income countries such as Thailand and the PRC.

Keywords: Greater Mekong Subregion, competitiveness, cluster, development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1014
828 Idiopathic Constipation can be Subdivided in Clinical Subtypes: Data Mining by Cluster Analysis on a Population based Study

Authors: Mauro Giacomini, Stefania Bertone, Carlo Mansi, Pietro Dulbecco, Vincenzo Savarino

Abstract:

The prevalence of non organic constipation differs from country to country and the reliability of the estimate rates is uncertain. Moreover, the clinical relevance of subdividing the heterogeneous functional constipation disorders into pre-defined subgroups is largely unknown.. Aim: to estimate the prevalence of constipation in a population-based sample and determine whether clinical subgroups can be identified. An age and gender stratified sample population from 5 Italian cities was evaluated using a previously validated questionnaire. Data mining by cluster analysis was used to determine constipation subgroups. Results: 1,500 complete interviews were obtained from 2,083 contacted households (72%). Self-reported constipation correlated poorly with symptombased constipation found in 496 subjects (33.1%). Cluster analysis identified four constipation subgroups which correlated to subgroups identified according to pre-defined symptom criteria. Significant differences in socio-demographics and lifestyle were observed among subgroups.

Keywords: Cluster analysis, constipation, data mining, statistical analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1252
827 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.

Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
826 Innovation Development of Food Market of Kazakhstan

Authors: G.B. Nurlikhina, G.N. Azhimetova, R. M. Ashimova

Abstract:

Currently, one of the main directions is developing of development based on the clustering of economic operations of Kazakhstan, providing for the organization and concentration of production capacity in one region or the most optimal system. In the modern economic literature clustering is regarded as one of the most effective tools to ensure competitive businesses, and improve their business itself.

Keywords: A cluster, food market, innovation cluster.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2204
825 UDCA: An Energy Efficient Clustering Algorithm for Wireless Sensor Network

Authors: Boregowda S.B., Hemanth Kumar A.R. Babu N.V, Puttamadappa C., And H.S Mruthyunjaya

Abstract:

In the past few years, the use of wireless sensor networks (WSNs) potentially increased in applications such as intrusion detection, forest fire detection, disaster management and battle field. Sensor nodes are generally battery operated low cost devices. The key challenge in the design and operation of WSNs is to prolong the network life time by reducing the energy consumption among sensor nodes. Node clustering is one of the most promising techniques for energy conservation. This paper presents a novel clustering algorithm which maximizes the network lifetime by reducing the number of communication among sensor nodes. This approach also includes new distributed cluster formation technique that enables self-organization of large number of nodes, algorithm for maintaining constant number of clusters by prior selection of cluster head and rotating the role of cluster head to evenly distribute the energy load among all sensor nodes.

Keywords: Clustering algorithms, Cluster head, Energy consumption, Sensor nodes, and Wireless sensor networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2331
824 Specific Frequency of Globular Clusters in Different Galaxy Types

Authors: Ahmed H. Abdullah, Pavel Kroupa

Abstract:

Globular clusters (GC) are important objects for tracing the early evolution of a galaxy. We study the correlation between the cluster population and the global properties of the host galaxy. We found that the correlation between cluster population (NGC) and the baryonic mass (Mb) of the host galaxy are best described as 10 −5.6038Mb. In order to understand the origin of the U -shape relation between the GC specific frequency (SN) and Mb (caused by the high value of SN for dwarfs galaxies and giant ellipticals and a minimum SN for intermediate mass galaxies≈ 1010M), we derive a theoretical model for the specific frequency (SNth). The theoretical model for SNth is based on the slope of the power-law embedded cluster mass function (β) and different time scale (Δt) of the forming galaxy. Our results show a good agreement between the observation and the model at a certain β and Δt. The model seems able to reproduce higher value of SNth of β = 1.5 at the midst formation time scale.

Keywords: Galaxies, dwarf, globular cluster, specific frequency, formation time scale.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 754
823 Assessment of EU Competitiveness Factors by Multivariate Methods

Authors: L. Melecký

Abstract:

Measurement of competitiveness between countries or regions is an important topic of many economic analysis and scientific papers. In European Union (EU), there is no mainstream approach of competitiveness evaluation and measuring. There are many opinions and methods of measurement and evaluation of competitiveness between states or regions at national and European level. The methods differ in structure of using the indicators of competitiveness and ways of their processing. The aim of the paper is to analyze main sources of competitive potential of the EU Member States with the help of Factor analysis (FA) and to classify the EU Member States to homogeneous units (clusters) according to the similarity of selected indicators of competitiveness factors by Cluster analysis (CA) in reference years 2000 and 2011. The theoretical part of the paper is devoted to the fundamental bases of competitiveness and the methodology of FA and CA methods. The empirical part of the paper deals with the evaluation of competitiveness factors in the EU Member States and cluster comparison of evaluated countries by cluster analysis. 

Keywords: Competitiveness, cluster analysis, EU, factor analysis, multivariate methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1993
822 Sampling Effects on Secondary Voltage Control of Microgrids Based on Network of Multiagent

Authors: M. J. Park, S. H. Lee, C. H. Lee, O. M. Kwon

Abstract:

This paper studies a secondary voltage control framework of the microgrids based on the consensus for a communication network of multiagent. The proposed control is designed by the communication network with one-way links. The communication network is modeled by a directed graph. At this time, the concept of sampling is considered as the communication constraint among each distributed generator in the microgrids. To analyze the sampling effects on the secondary voltage control of the microgrids, by using Lyapunov theory and some mathematical techniques, the sufficient condition for such problem will be established regarding linear matrix inequality (LMI). Finally, some simulation results are given to illustrate the necessity of the consideration of the sampling effects on the secondary voltage control of the microgrids.

Keywords: Microgrids, secondary control, multiagent, sampling, LMI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1413
821 Fuzzy Clustering of Categorical Attributes and its Use in Analyzing Cultural Data

Authors: George E. Tsekouras, Dimitris Papageorgiou, Sotiris Kotsiantis, Christos Kalloniatis, Panagiotis Pintelas

Abstract:

We develop a three-step fuzzy logic-based algorithm for clustering categorical attributes, and we apply it to analyze cultural data. In the first step the algorithm employs an entropy-based clustering scheme, which initializes the cluster centers. In the second step we apply the fuzzy c-modes algorithm to obtain a fuzzy partition of the data set, and the third step introduces a novel cluster validity index, which decides the final number of clusters.

Keywords: Categorical data, cultural data, fuzzy logic clustering, fuzzy c-modes, cluster validity index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1662
820 Ratio Type Estimators of the Population Mean Based on Ranked Set Sampling

Authors: Said Ali Al-Hadhrami

Abstract:

Ranked set sampling (RSS) was first suggested to increase the efficiency of the population mean. It has been shown that this method is highly beneficial to the estimation based on simple random sampling (SRS). There has been considerable development and many modifications were done on this method. When a concomitant variable is available, ratio estimation based on ranked set sampling was proposed. This ratio estimator is more efficient than that based on SRS. In this paper some ratio type estimators of the population mean based on RSS are suggested. These estimators are found to be more efficient than the estimators of similar form using simple random sample.

Keywords: Bias, Efficiency, Ranked Set Sampling, Ratio Type Estimator

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1336
819 Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

This paper introduces new algorithms (Fuzzy relative of the CLARANS algorithm FCLARANS and Fuzzy c Medoids based on randomized search FCMRANS) for fuzzy clustering of relational data. Unlike existing fuzzy c-medoids algorithm (FCMdd) in which the within cluster dissimilarity of each cluster is minimized in each iteration by recomputing new medoids given current memberships, FCLARANS minimizes the same objective function minimized by FCMdd by changing current medoids in such away that that the sum of the within cluster dissimilarities is minimized. Computing new medoids may be effected by noise because outliers may join the computation of medoids while the choice of medoids in FCLARANS is dictated by the location of a predominant fraction of points inside a cluster and, therefore, it is less sensitive to the presence of outliers. In FCMRANS the step of computing new medoids in FCMdd is modified to be based on randomized search. Furthermore, a new initialization procedure is developed that add randomness to the initialization procedure used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and linearized version of fuzzy c-medoids (RFCMdd). Experimental results with different samples of the Reuter-21578, Newsgroups (20NG) and generated datasets with noise show that FCLARANS is more robust than both RFCMdd and FCMRANS. Finally, both FCMRANS and FCLARANS are more efficient and their outputs are almost the same as that of RFCMdd in terms of classification rate.

Keywords: Data Mining, Fuzzy Clustering, Relational Clustering, Medoid-Based Clustering, Cluster Analysis, Unsupervised Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2356
818 Proposal to Increase the Efficiency, Reliability and Safety of the Centre of Data Collection Management and Their Evaluation Using Cluster Solutions

Authors: Martin Juhas, Bohuslava Juhasova, Igor Halenar, Andrej Elias

Abstract:

This article deals with the possibility of increasing efficiency, reliability and safety of the system for teledosimetric data collection management and their evaluation as a part of complex study for activity “Research of data collection, their measurement and evaluation with mobile and autonomous units” within project “Research of monitoring and evaluation of non-standard conditions in the area of nuclear power plants”. Possible weaknesses in existing system are identified. A study of available cluster solutions with possibility of their deploying to analysed system is presented

Keywords: Teledosimetric data, efficiency, reliability, safety, cluster solution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516
817 Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. PSO algorithm utilizes a so-called U-matrix of SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably to boundary detection through traditional algorithms namely k-means and hierarchical based approach which are normally used to interpret the output of SOM.

Keywords: cluster boundaries, clustering, code vectors, data mining, particle swarm optimization, self-organizing maps, U-matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1869
816 On the Comparison of Several Goodness of Fit tests under Simple Random Sampling and Ranked Set Sampling

Authors: F. Azna A. Shahabuddin, Kamarulzaman Ibrahim, Abdul Aziz Jemain

Abstract:

Many works have been carried out to compare the efficiency of several goodness of fit procedures for identifying whether or not a particular distribution could adequately explain a data set. In this paper a study is conducted to investigate the power of several goodness of fit tests such as Kolmogorov Smirnov (KS), Anderson-Darling(AD), Cramer- von- Mises (CV) and a proposed modification of Kolmogorov-Smirnov goodness of fit test which incorporates a variance stabilizing transformation (FKS). The performances of these selected tests are studied under simple random sampling (SRS) and Ranked Set Sampling (RSS). This study shows that, in general, the Anderson-Darling (AD) test performs better than other GOF tests. However, there are some cases where the proposed test can perform as equally good as the AD test.

Keywords: Empirical distribution function, goodness-of-fit, order statistics, ranked set sampling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1672
815 Communities of Ammonia-oxidizing Archaea and Bacteria in Enriched Nitrifying Activated Sludge

Authors: Puntipar Sonthiphand, Tawan Limpiyakorn

Abstract:

In this study, communities of ammonia-oxidizing archaea (AOA) and ammonia-oxidizing bacteria (AOB) in nitrifying activated sludge (NAS) prepared by enriching sludge from a municipal wastewater treatment plant in three continuous-flow reactors receiving an inorganic medium containing different ammonium concentrations of 2, 10, and 30 mM NH4 +-N (NAS2, NAS10, and NAS30, respectively) were investigated using molecular analysis. Results suggested that almost all AOA clones from NAS2, NAS10, and NAS30 fell into the same AOA cluster and AOA communities in NAS2 and NAS10 were more diverse than those of NAS30. In contrast to AOA, AOB communities obviously shifted from the seed sludge to enriched NASs and in each enriched NAS, communities of AOB varied particularly. The seed sludge contained members of N. communis cluster and N. oligotropha cluster. After it was enriched under various ammonium loads, members of N. communis cluster disappeared from all enriched NASs. AOB with high affinity to ammonia presented in NAS 2, AOB with low affinity to ammonia presented in NAS 30, and both types of AOB survived in NAS 10. These demonstrated that ammonium load significantly influenced AOB communities, but not AOA communities in enriched NASs.

Keywords: ammonia-oxidizing bacteria, ammonia-oxidizingarchaea, nitrifying activated sludge.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1543
814 New Product-Type Estimators for the Population Mean Using Quartiles of the Auxiliary Variable

Authors: Amer Ibrahim Falah Al-Omari

Abstract:

In this paper, we suggest new product-type estimators for the population mean of the variable of interest exploiting the first or the third quartile of the auxiliary variable. We obtain mean square error equations and the bias for the estimators. We study the properties of these estimators using simple random sampling (SRS) and ranked set sampling (RSS) methods. It is found that, SRS and RSS produce approximately unbiased estimators of the population mean. However, the RSS estimators are more efficient than those obtained using SRS based on the same number of measured units for all values of the correlation coefficient.

Keywords: Product estimator, auxiliary variable, simple random sampling, extreme ranked set sampling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1486
813 Sampling of Variables in Discrete-Event Simulation using the Example of Inventory Evolutions in Job-Shop-Systems Based on Deterministic and Non-Deterministic Data

Authors: Bernd Scholz-Reiter, Christian Toonen, Jan Topi Tervo, Dennis Lappe

Abstract:

Time series analysis often requires data that represents the evolution of an observed variable in equidistant time steps. In order to collect this data sampling is applied. While continuous signals may be sampled, analyzed and reconstructed applying Shannon-s sampling theorem, time-discrete signals have to be dealt with differently. In this article we consider the discrete-event simulation (DES) of job-shop-systems and study the effects of different sampling rates on data quality regarding completeness and accuracy of reconstructed inventory evolutions. At this we discuss deterministic as well as non-deterministic behavior of system variables. Error curves are deployed to illustrate and discuss the sampling rate-s impact and to derive recommendations for its wellfounded choice.

Keywords: discrete-event simulation, job-shop-system, sampling rate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777