Search results for: cluster analysis
8863 Digital Forensics Compute Cluster: A High Speed Distributed Computing Capability for Digital Forensics
Authors: Daniel Gonzales, Zev Winkelman, Trung Tran, Ricardo Sanchez, Dulani Woods, John Hollywood
Abstract:
We have developed a distributed computing capability, Digital Forensics Compute Cluster (DFORC2) to speed up the ingestion and processing of digital evidence that is resident on computer hard drives. DFORC2 parallelizes evidence ingestion and file processing steps. It can be run on a standalone computer cluster or in the Amazon Web Services (AWS) cloud. When running in a virtualized computing environment, its cluster resources can be dynamically scaled up or down using Kubernetes. DFORC2 is an open source project that uses Autopsy, Apache Spark and Kafka, and other open source software packages. It extends the proven open source digital forensics capabilities of Autopsy to compute clusters and cloud architectures, so digital forensics tasks can be accomplished efficiently by a scalable array of cluster compute nodes. In this paper, we describe DFORC2 and compare it with a standalone version of Autopsy when both are used to process evidence from hard drives of different sizes.Keywords: Cloud computing, cybersecurity, digital forensics, Kafka, Kubernetes, Spark.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16588862 Fuzzy Based Particle Swarm Optimization Routing Technique for Load Balancing in Wireless Sensor Networks
Authors: S. Balaji, E. Golden Julie, M. Rajaram, Y. Harold Robinson
Abstract:
Network lifetime improvement and uncertainty in multiple systems are the issues of wireless sensor network routing. This paper presents fuzzy based particle swarm optimization routing technique to improve the network scalability. Significantly, in the cluster formation procedure, fuzzy based system is used to solve the uncertainty and network balancing. Cluster heads play an important role to reduce the energy consumption using particle swarm optimization algorithm, the cluster head sends its information along data packets to the heads with link. The simulation results show that the presented routing protocol can perform load balancing effectively and reduce the energy consumption of cluster heads.
Keywords: Wireless sensor networks, fuzzy logic, PSO, LEACH.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12838861 Influence of Ambiguity Cluster on Quality Improvement in Image Compression
Authors: Safaa Al-Ali, Ahmad Shahin, Fadi Chakik
Abstract:
Image coding based on clustering provides immediate access to targeted features of interest in a high quality decoded image. This approach is useful for intelligent devices, as well as for multimedia content-based description standards. The result of image clustering cannot be precise in some positions especially on pixels with edge information which produce ambiguity among the clusters. Even with a good enhancement operator based on PDE, the quality of the decoded image will highly depend on the clustering process. In this paper, we introduce an ambiguity cluster in image coding to represent pixels with vagueness properties. The presence of such cluster allows preserving some details inherent to edges as well for uncertain pixels. It will also be very useful during the decoding phase in which an anisotropic diffusion operator, such as Perona-Malik, enhances the quality of the restored image. This work also offers a comparative study to demonstrate the effectiveness of a fuzzy clustering technique in detecting the ambiguity cluster without losing lot of the essential image information. Several experiments have been carried out to demonstrate the usefulness of ambiguity concept in image compression. The coding results and the performance of the proposed algorithms are discussed in terms of the peak signal-tonoise ratio and the quantity of ambiguous pixels.Keywords: Ambiguity Cluster, Anisotropic Diffusion, Fuzzy Clustering, Image Compression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15698860 A Balanced Cost Cluster-Heads Selection Algorithm for Wireless Sensor Networks
Authors: Ouadoudi Zytoune, Youssef Fakhri, Driss Aboutajdine
Abstract:
This paper focuses on reducing the power consumption of wireless sensor networks. Therefore, a communication protocol named LEACH (Low-Energy Adaptive Clustering Hierarchy) is modified. We extend LEACHs stochastic cluster-head selection algorithm by a modifying the probability of each node to become cluster-head based on its required energy to transmit to the sink. We present an efficient energy aware routing algorithm for the wireless sensor networks. Our contribution consists in rotation selection of clusterheads considering the remoteness of the nodes to the sink, and then, the network nodes residual energy. This choice allows a best distribution of the transmission energy in the network. The cluster-heads selection algorithm is completely decentralized. Simulation results show that the energy is significantly reduced compared with the previous clustering based routing algorithm for the sensor networks.Keywords: Wireless Sensor Networks, Energy efficiency, WirelessCommunications, Clustering-based algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26468859 Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco
Authors: Rachid Ait daoud, Abdellah Amine, Belaid Bouikhalene, Rachid Lbibb
Abstract:
Given the increase in the number of e-commerce sites, the number of competitors has become very important. This means that companies have to take appropriate decisions in order to meet the expectations of their customers and satisfy their needs. In this paper, we present a case study of applying LRFM (length, recency, frequency and monetary) model and clustering techniques in the sector of electronic commerce with a view to evaluating customers’ values of the Moroccan e-commerce websites and then developing effective marketing strategies. To achieve these objectives, we adopt LRFM model by applying a two-stage clustering method. In the first stage, the self-organizing maps method is used to determine the best number of clusters and the initial centroid. In the second stage, kmeans method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that the cluster 6 is the most important cluster because the average values of L, R, F and M are higher than the overall average value. In addition, this study has considered another variable that describes the mode of payment used by customers to improve and strengthen clusters’ analysis. The clusters’ analysis demonstrates that the payment method is one of the key indicators of a new index which allows to assess the level of customers’ confidence in the company's Website.Keywords: Customer value, LRFM model, Cluster analysis, Self-Organizing Maps method (SOM), K-means algorithm, loyalty.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 62538858 Using Pattern Search Methods for Minimizing Clustering Problems
Authors: Parvaneh Shabanzadeh, Malik Hj Abu Hassan, Leong Wah June, Maryam Mohagheghtabar
Abstract:
Clustering is one of an interesting data mining topics that can be applied in many fields. Recently, the problem of cluster analysis is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the cluster analysis problem based on nonsmooth optimization techniques is developed. This optimization problem has a number of characteristics that make it challenging: it has many local minimum, the optimization variables can be either continuous or categorical, and there are no exact analytical derivatives. In this study we show how to apply a particular class of optimization methods known as pattern search methods to address these challenges. These methods do not explicitly use derivatives, an important feature that has not been addressed in previous studies. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed method.Keywords: Clustering functions, Non-smooth Optimization, Nonconvex Optimization, Pattern Search Method.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16408857 Determining Cluster Boundaries Using Particle Swarm Optimization
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.
Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17188856 Innovation Development of Food Market of Kazakhstan
Authors: G.B. Nurlikhina, G.N. Azhimetova, R. M. Ashimova
Abstract:
Currently, one of the main directions is developing of development based on the clustering of economic operations of Kazakhstan, providing for the organization and concentration of production capacity in one region or the most optimal system. In the modern economic literature clustering is regarded as one of the most effective tools to ensure competitive businesses, and improve their business itself.Keywords: A cluster, food market, innovation cluster.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22438855 UDCA: An Energy Efficient Clustering Algorithm for Wireless Sensor Network
Authors: Boregowda S.B., Hemanth Kumar A.R. Babu N.V, Puttamadappa C., And H.S Mruthyunjaya
Abstract:
In the past few years, the use of wireless sensor networks (WSNs) potentially increased in applications such as intrusion detection, forest fire detection, disaster management and battle field. Sensor nodes are generally battery operated low cost devices. The key challenge in the design and operation of WSNs is to prolong the network life time by reducing the energy consumption among sensor nodes. Node clustering is one of the most promising techniques for energy conservation. This paper presents a novel clustering algorithm which maximizes the network lifetime by reducing the number of communication among sensor nodes. This approach also includes new distributed cluster formation technique that enables self-organization of large number of nodes, algorithm for maintaining constant number of clusters by prior selection of cluster head and rotating the role of cluster head to evenly distribute the energy load among all sensor nodes.
Keywords: Clustering algorithms, Cluster head, Energy consumption, Sensor nodes, and Wireless sensor networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23908854 Specific Frequency of Globular Clusters in Different Galaxy Types
Authors: Ahmed H. Abdullah, Pavel Kroupa
Abstract:
Globular clusters (GC) are important objects for tracing the early evolution of a galaxy. We study the correlation between the cluster population and the global properties of the host galaxy. We found that the correlation between cluster population (NGC) and the baryonic mass (Mb) of the host galaxy are best described as 10 −5.6038Mb. In order to understand the origin of the U -shape relation between the GC specific frequency (SN) and Mb (caused by the high value of SN for dwarfs galaxies and giant ellipticals and a minimum SN for intermediate mass galaxies≈ 1010M), we derive a theoretical model for the specific frequency (SNth). The theoretical model for SNth is based on the slope of the power-law embedded cluster mass function (β) and different time scale (Δt) of the forming galaxy. Our results show a good agreement between the observation and the model at a certain β and Δt. The model seems able to reproduce higher value of SNth of β = 1.5 at the midst formation time scale.Keywords: Galaxies, dwarf, globular cluster, specific frequency, formation time scale.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8008853 A Literature Review on the Effect of Industrial Clusters and the Absorptive Capacity on Innovation
Authors: Enrique Claver Cortés, Bartolomé Marco Lajara, Eduardo Sánchez García, Pedro Seva Larrosa, Encarnación Manresa Marhuenda, Lorena Ruiz Fernández, Esther Poveda Pareja
Abstract:
In recent decades, the analysis of the effects of clustering as an essential factor for the development of innovations and the competitiveness of enterprises has raised great interest in different areas. Nowadays, companies have access to almost all tangible and intangible resources located and/or developed in any country in the world. However, despite the obvious advantages that this situation entails for companies, their geographical location has shown itself, increasingly clearly, to be a fundamental factor that positively influences their innovative performance and competitiveness. Industrial clusters could represent a unique level of analysis, positioned between the individual company and the industry, which makes them an ideal unit of analysis to determine the effects derived from company membership of a cluster. Also, the absorptive capacity (hereinafter 'AC') can mediate the process of innovation development by companies located in a cluster. The transformation and exploitation of knowledge could have a mediating effect between knowledge acquisition and innovative performance. The main objective of this work is to determine the key factors that affect the degree of generation and use of knowledge from the environment by companies and, consequently, their innovative performance and competitiveness. The elements analyzed are the companies' membership of a cluster and the AC. To this end, 30 most relevant papers published on this subject in the "Web of Science" database have been reviewed. Our findings show that, within a cluster, the knowledge coming from the companies' environment can significantly influence their innovative performance and competitiveness, although in this relationship, the degree of access and exploitation of the companies to this knowledge plays a fundamental role, which depends on a series of elements both internal and external to the company.
Keywords: Absorptive capacity, clusters, innovation, knowledge.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8968852 Growing Self Organising Map Based Exploratory Analysis of Text Data
Authors: Sumith Matharage, Damminda Alahakoon
Abstract:
Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.
Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19968851 Fuzzy Clustering of Categorical Attributes and its Use in Analyzing Cultural Data
Authors: George E. Tsekouras, Dimitris Papageorgiou, Sotiris Kotsiantis, Christos Kalloniatis, Panagiotis Pintelas
Abstract:
We develop a three-step fuzzy logic-based algorithm for clustering categorical attributes, and we apply it to analyze cultural data. In the first step the algorithm employs an entropy-based clustering scheme, which initializes the cluster centers. In the second step we apply the fuzzy c-modes algorithm to obtain a fuzzy partition of the data set, and the third step introduces a novel cluster validity index, which decides the final number of clusters.
Keywords: Categorical data, cultural data, fuzzy logic clustering, fuzzy c-modes, cluster validity index.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17088850 Proposal to Increase the Efficiency, Reliability and Safety of the Centre of Data Collection Management and Their Evaluation Using Cluster Solutions
Authors: Martin Juhas, Bohuslava Juhasova, Igor Halenar, Andrej Elias
Abstract:
This article deals with the possibility of increasing efficiency, reliability and safety of the system for teledosimetric data collection management and their evaluation as a part of complex study for activity “Research of data collection, their measurement and evaluation with mobile and autonomous units” within project “Research of monitoring and evaluation of non-standard conditions in the area of nuclear power plants”. Possible weaknesses in existing system are identified. A study of available cluster solutions with possibility of their deploying to analysed system is presented
Keywords: Teledosimetric data, efficiency, reliability, safety, cluster solution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15588849 Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map
Authors: Anurag Sharma, Christian W. Omlin
Abstract:
Self-organizing map (SOM) is a well known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. PSO algorithm utilizes a so-called U-matrix of SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably to boundary detection through traditional algorithms namely k-means and hierarchical based approach which are normally used to interpret the output of SOM.Keywords: cluster boundaries, clustering, code vectors, data mining, particle swarm optimization, self-organizing maps, U-matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19098848 Diversity Analysis of a Quinoa (Chenopodium quinoa Willd.) Germplasm during Two Seasons
Authors: M. Mhada, E. N. Jellen, S. E. Jacobsen, O. Benlhabib
Abstract:
The present work has been carried out to evaluate the diversity of a collection of 78 quinoa accessions developed through recurrent selection from Andean germplasm introduced to Morocco in the winter of 2000. Twenty-three quantitative and qualitative characters were used for the evaluation of genetic diversity and the relationship between the accessions, and also for the establishment of a core collection in Morocco. Important variation was found among the accessions in terms of plant morphology and growth behavior. Data analysis showed positive correlation of the plant height, the plant fresh and the dry weight with the grain yield, while days to flowering was found to be negatively correlated with grain yield. The first four PCs contributed 74.76% of the variability; the first PC showed significant variation with 42.86% of the total variation, PC2 with 15.37%, PC3 with 9.05% and PC4 contributed 7.49% of the total variation. Plant size, days to grain filling and days to maturity are correlated to the PC1; and seed size, inflorescence density and mildew resistance are correlated to the PC2. Hierarchical cluster analysis rearranged the 78 quinoa accessions into four main groups and ten sub-clusters. Clustering was found in associations with days to maturity and also with plant size and seed-size traits.
Keywords: Character association, Chenopodium quinoa, Diversity analysis, Morphotypic cluster, Multivariate analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25868847 The Effects of Yield and Yield Components of Some Quality Increase Applications on Razakı Grape Variety
Authors: Şehri Çınar, Aydın Akın
Abstract:
This study was conducted Razakı grape variety (Vitis vinifera L.) and its vine which was aged 19 was grown on 5 BB rootstock in a vegetation period of 2014 in Afyon province in Turkey. In this research, it was investigated whether the applications of Control (C), 1/3 Cluster Tip Reduction (1/3 CTR), Shoot Tip Reduction (STR), 1/3 CTR + STR, Boric Acid (BA), 1/3 CTR + BA, STR + BA, 1/3 CTR + STR + BA on yield and yield components of Razakı grape variety. The results were obtained as the highest fresh grape yield (7.74 kg/vine) with C application; as the highest cluster weight (244.62 g) with STR application; as the highest 100 berry weight (504.08 g) with C application; as the highest maturity index (36.89) with BA application; as the highest must yield (695.00 ml) with BA and (695.00 ml) with 1/3 CTR + STR + BA applications; as the highest intensity of L* color (46.93) with STR and (46.10) with 1/3 CTR + STR + BA applications; as the highest intensity of a* color (-5.37) with 1/3 CTR + STR and (-5.01) with STR, as the highest intensity of b* color (12.59) with STR application. The shoot tip reduction to increase cluster weight and boric acid application to increase maturity index of Razakı grape variety can be recommended.
Keywords: Razakı, 1/3 cluster tip reduction, shoot tip reduction, boric acid, yield and yield components.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35558846 The Role of Knowledge Management in Innovation: Spanish Evidence
Authors: María Jesús Luengo-Valderrey, Mónica Moso-Díez
Abstract:
In the knowledge-based economy, innovation is considered essential in order to achieve survival and growth in organizations. On the other hand, knowledge management is currently understood as one of the keys to innovation process. Both factors are generally admitted as generators of competitive advantage in organizations. Specifically, activities on R&D&I and those that generate internal knowledge have a positive influence in innovation results. This paper examines this effect and if it is similar or not is what we aimed to quantify in this paper. We focus on the impact that proportion of knowledge workers, the R&D&I investment, the amounts destined for ICTs and training for innovation have on the variation of tangible and intangibles returns for the sector of high and medium technology in Spain. To do this, we have performed an empirical analysis on the results of questionnaires about innovation in enterprises in Spain, collected by the National Statistics Institute. First, using clusters methodology, the behavior of these enterprises regarding knowledge management is identified. Then, using SEM methodology, we performed, for each cluster, the study about cause-effect relationships among constructs defined through variables, setting its type and quantification. The cluster analysis results in four groups in which cluster number 1 and 3 presents the best performance in innovation with differentiating nuances among them, while clusters 2 and 4 obtained divergent results to a similar innovative effort. However, the results of SEM analysis for each cluster show that, in all cases, knowledge workers are those that affect innovation performance most, regardless of the level of investment, and that there is a strong correlation between knowledge workers and investment in knowledge generation. The main findings reached is that Spanish high and medium technology companies improve their innovation performance investing in internal knowledge generation measures, specially, in terms of R&D activities, and underinvest in external ones. This, and the strong correlation between knowledge workers and the set of activities that promote the knowledge generation, should be taken into account by managers of companies, when making decisions about their investments for innovation, since they are key for improving their opportunities in the global market.
Keywords: High and medium technology sector, innovation, knowledge management, Spanish companies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21968845 Simultaneous Clustering and Feature Selection Method for Gene Expression Data
Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar
Abstract:
Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.
Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19678844 Clustering Protein Sequences with Tailored General Regression Model Technique
Authors: G. Lavanya Devi, Allam Appa Rao, A. Damodaram, GR Sridhar, G. Jaya Suma
Abstract:
Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relation is valuable in rule discovery for a given data, such as if value X goes up 1, value Y will go down 3", etc. The classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequences clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.Keywords: Clustering, General Regression Model, Protein Sequences, Similarity Measure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15678843 A Spanning Tree for Enhanced Cluster Based Routing in Wireless Sensor Network
Authors: M. Saravanan, M. Madheswaran
Abstract:
Wireless Sensor Network (WSN) clustering architecture enables features like network scalability, communication overhead reduction, and fault tolerance. After clustering, aggregated data is transferred to data sink and reducing unnecessary, redundant data transfer. It reduces nodes transmitting, and so saves energy consumption. Also, it allows scalability for many nodes, reduces communication overhead, and allows efficient use of WSN resources. Clustering based routing methods manage network energy consumption efficiently. Building spanning trees for data collection rooted at a sink node is a fundamental data aggregation method in sensor networks. The problem of determining Cluster Head (CH) optimal number is an NP-Hard problem. In this paper, we combine cluster based routing features for cluster formation and CH selection and use Minimum Spanning Tree (MST) for intra-cluster communication. The proposed method is based on optimizing MST using Simulated Annealing (SA). In this work, normalized values of mobility, delay, and remaining energy are considered for finding optimal MST. Simulation results demonstrate the effectiveness of the proposed method in improving the packet delivery ratio and reducing the end to end delay.
Keywords: Wireless sensor network, clustering, minimum spanning tree, genetic algorithm, low energy adaptive clustering hierarchy, simulated annealing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17868842 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia
Authors: Carol Anne Hargreaves
Abstract:
A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.
Keywords: Machine learning, stock market trading, logistic principal component analysis, automated stock investment system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10988841 Energy and Distance Based Clustering: An Energy Efficient Clustering Method for Wireless Sensor Networks
Authors: Mehdi Saeidmanesh, Mojtaba Hajimohammadi, Ali Movaghar
Abstract:
In this paper, we propose an energy efficient cluster based communication protocol for wireless sensor network. Our protocol considers both the residual energy of sensor nodes and the distance of each node from the BS when selecting cluster-head. This protocol can successfully prolong the network-s lifetime by 1) reducing the total energy dissipation on the network and 2) evenly distributing energy consumption over all sensor nodes. In this protocol, the nodes with more energy and less distance from the BS are probable to be selected as cluster-head. Simulation results with MATLAB show that proposed protocol could increase the lifetime of network more than 94% for first node die (FND), and more than 6% for the half of the nodes alive (HNA) factor as compared with conventional protocols.Keywords: Clustering methods, energy efficiency, routing protocol, wireless sensor networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21578840 A New Algorithm for Cluster Initialization
Authors: Moth'd Belal. Al-Daoud
Abstract:
Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.
Keywords: clustering, k-means, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21038839 Cluster Analysis of Retailers’ Benefits from Their Cooperation with Manufacturers: Business Models Perspective
Authors: M. K. Witek-Hajduk, T. M. Napiórkowski
Abstract:
A number of studies discussed the topic of benefits of retailers-manufacturers cooperation and coopetition. However, there are only few publications focused on the benefits of cooperation and coopetition between retailers and their suppliers of durable consumer goods; especially in the context of business model of cooperating partners. This paper aims to provide a clustering approach to segment retailers selling consumer durables according to the benefits they obtain from their cooperation with key manufacturers and differentiate the said retailers’ in term of the business models of cooperating partners. For the purpose of the study, a survey (with a CATI method) collected data on 603 consumer durables retailers present on the Polish market. Retailers are clustered both, with hierarchical and non-hierarchical methods. Five distinctive groups of consumer durables’ retailers are (based on the studied benefits) identified using the two-stage clustering approach. The clusters are then characterized with a set of exogenous variables, key of which are business models employed by the retailer and its partnering key manufacturer. The paper finds that the a combination of a medium sized retailer classified as an Integrator with a chiefly domestic capital and a manufacturer categorized as a Market Player will yield the highest benefits. On the other side of the spectrum is medium sized Distributor retailer with solely domestic capital – in this case, the business model of the cooperating manufactrer appears to be irreleveant. This paper is the one of the first empirical study using cluster analysis on primary data that defines the types of cooperation between consumer durables’ retailers and manufacturers – their key suppliers. The analysis integrates a perspective of both retailers’ and manufacturers’ business models and matches them with individual and joint benefits.
Keywords: Business model, cooperation, cluster analysis, retailer-manufacturer relationships.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11218838 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform
Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu
Abstract:
Given a fixed fund, purchasing fewer hosts of higher capability or inversely more of lower capability is a must-be-made trade-off in practices for building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing is with SQL queries of aggregate, join, and space-time condition selections executed upon massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of host clusters potential for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem HDFS+Hive+Spark. A proper metric was raised to measure the performance of Hadoop clusters in HBDP, which was tested and compared with its predicted counterpart, on executing three kinds of typical SQL query tasks. Tests were conducted with respect to factors of CPU benchmark, memory size, virtual host division, and the number of element physical host in cluster. The research has been applied to practical cluster procurement for housing big data computing.
Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance empirical formula, typical SQL query tasks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8378837 Forest Growth Simulation: Tropical Rain Forest Stand Table Projection
Authors: Yasmin Yahya, Roslan Ismail, Samreth Vanna, Khorn Saret
Abstract:
The study on the tree growth for four species groups of commercial timber in Koh Kong province, Cambodia-s tropical rainforest is described. The simulation for these four groups had been successfully developed in the 5-year interval through year-60. Data were obtained from twenty permanent sample plots in the duration of thirteen years. The aim for this study was to develop stand table simulation system of tree growth by the species group. There were five steps involved in the development of the tree growth simulation: aggregate the tree species into meaningful groups by using cluster analysis; allocate the trees in the diameter classes by the species group; observe the diameter movement of the species group. The diameter growth rate, mortality rate and recruitment rate were calculated by using some mathematical formula. Simulation equation had been created by combining those parameters. Result showed the dissimilarity of the diameter growth among species groups.
Keywords: cluster analysis, diameter growth, simulation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22138836 A Construction Management Tool: Determining a Project Schedule Typical Behaviors Using Cluster Analysis
Authors: Natalia Rudeli, Elisabeth Viles, Adrian Santilli
Abstract:
Delays in the construction industry are a global phenomenon. Many construction projects experience extensive delays exceeding the initially estimated completion time. The main purpose of this study is to identify construction projects typical behaviors in order to develop a prognosis and management tool. Being able to know a construction projects schedule tendency will enable evidence-based decision-making to allow resolutions to be made before delays occur. This study presents an innovative approach that uses Cluster Analysis Method to support predictions during Earned Value Analyses. A clustering analysis was used to predict future scheduling, Earned Value Management (EVM), and Earned Schedule (ES) principal Indexes behaviors in construction projects. The analysis was made using a database with 90 different construction projects. It was validated with additional data extracted from literature and with another 15 contrasting projects. For all projects, planned and executed schedules were collected and the EVM and ES principal indexes were calculated. A complete linkage classification method was used. In this way, the cluster analysis made considers that the distance (or similarity) between two clusters must be measured by its most disparate elements, i.e. that the distance is given by the maximum span among its components. Finally, through the use of EVM and ES Indexes and Tukey and Fisher Pairwise Comparisons, the statistical dissimilarity was verified and four clusters were obtained. It can be said that construction projects show an average delay of 35% of its planned completion time. Furthermore, four typical behaviors were found and for each of the obtained clusters, the interim milestones and the necessary rhythms of construction were identified. In general, detected typical behaviors are: (1) Projects that perform a 5% of work advance in the first two tenths and maintain a constant rhythm until completion (greater than 10% for each remaining tenth), being able to finish on the initially estimated time. (2) Projects that start with an adequate construction rate but suffer minor delays culminating with a total delay of almost 27% of the planned time. (3) Projects which start with a performance below the planned rate and end up with an average delay of 64%, and (4) projects that begin with a poor performance, suffer great delays and end up with an average delay of a 120% of the planned completion time. The obtained clusters compose a tool to identify the behavior of new construction projects by comparing their current work performance to the validated database, thus allowing the correction of initial estimations towards more accurate completion schedules.
Keywords: Cluster analysis, construction management, earned value, schedule.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12008835 Networks in the Tourism Sector in Brazil: Proposal of a Management Model Applied to Tourism Clusters
Authors: Gysele Lima Ricci, Jose Miguel Rodriguez Anton
Abstract:
Companies in the tourism sector need to achieve competitive advantages for their survival in the market. In this way, the models based on association, cooperation, complementarity, distribution, exchange and mutual assistance arise as a possibility of organizational development, taking as reference the concept of networks. Many companies seek to partner in local networks as clusters to act together and associate. The main objective of the present research is to identify the specificities of management and the practices of cooperation in the tourist destination of São Paulo - Brazil, and to propose a new management model with possible cluster of tourism. The empirical analysis was carried out in three phases. As a first phase, a research was made by the companies, associations and tourism organizations existing in São Paulo, analyzing the characteristics of their business. In the second phase, the management specificities and cooperation practice used in the tourist destination. And in the third phase, identifying the possible strengths and weaknesses that potential or potential tourist cluster could have, proposing the development of the management model of the same adapted to the needs of the companies, associations and organizations. As a main result, it has been identified that companies, associations and organizations could be looking for synergies with each other and collaborate through a Hiperred organizational structure, in which they share their knowledge, try to make the most of the collaboration and to benefit from three concepts: flexibility, learning and collaboration. Finally, it is concluded that, the proposed tourism cluster management model is viable for the development of tourism destinations because it makes it possible to strategically address agents which are responsible for public policies, as well as public and private companies and organizations in their strategies competitiveness and cooperation.Keywords: Cluster, management model, networks, tourism sector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10108834 A Design for Customer Preferences Model by Cluster Analysis of Geometric Features and Customer Preferences
Authors: Yuan-Jye Tseng, Ching-Yen Chen
Abstract:
In the design cycle, a main design task is to determine the external shape of the product. The external shape of a product is one of the key factors that can affect the customers’ preferences linking to the motivation to buy the product, especially in the case of a consumer electronic product such as a mobile phone. The relationship between the external shape and the customer preferences needs to be studied to enhance the customer’s purchase desire and action. In this research, a design for customer preferences model is developed for investigating the relationships between the external shape and the customer preferences of a product. In the first stage, the names of the geometric features are collected and evaluated from the data of the specified internet web pages using the developed text miner. The key geometric features can be determined if the number of occurrence on the web pages is relatively high. For each key geometric feature, the numerical values are explored using the text miner to collect the internet data from the web pages. In the second stage, a cluster analysis model is developed to evaluate the numerical values of the key geometric features to divide the external shapes into several groups. Several design suggestion cases can be proposed, for example, large model, mid-size model, and mini model, for designing a mobile phone. A customer preference index is developed by evaluating the numerical data of each of the key geometric features of the design suggestion cases. The design suggestion case with the top ranking of the customer preference index can be selected as the final design of the product. In this paper, an example product of a notebook computer is illustrated. It shows that the external shape of a product can be used to drive customer preferences. The presented design for customer preferences model is useful for determining a suitable external shape of the product to increase customer preferences.
Keywords: Cluster analysis, customer preferences, design evaluation, design for customer preferences, product design.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 776