Search results for: data selection
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8166

Search results for: data selection

8136 Ensembling Adaptively Constructed Polynomial Regression Models

Authors: Gints Jekabsons

Abstract:

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1675
8135 Gene Selection Guided by Feature Interdependence

Authors: Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel

Abstract:

Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.

Keywords: Colon cancer, feature interdependence, feature subset selection, gene selection, microarray data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2147
8134 Correlation-based Feature Selection using Ant Colony Optimization

Authors: M. Sadeghzadeh, M. Teshnehlab

Abstract:

Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.

Keywords: Ant colony optimization, Classification, Datamining, Feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2422
8133 Simultaneous Clustering and Feature Selection Method for Gene Expression Data

Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar

Abstract:

Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.

Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1969
8132 A Comparison of SVM-based Criteria in Evolutionary Method for Gene Selection and Classification of Microarray Data

Authors: Rameswar Debnath, Haruhisa Takahashi

Abstract:

An evolutionary method whose selection and recombination operations are based on generalization error-bounds of support vector machine (SVM) can select a subset of potentially informative genes for SVM classifier very efficiently [7]. In this paper, we will use the derivative of error-bound (first-order criteria) to select and recombine gene features in the evolutionary process, and compare the performance of the derivative of error-bound with the error-bound itself (zero-order) in the evolutionary process. We also investigate several error-bounds and their derivatives to compare the performance, and find the best criteria for gene selection and classification. We use 7 cancer-related human gene expression datasets to evaluate the performance of the zero-order and first-order criteria of error-bounds. Though both criteria have the same strategy in theoretically, experimental results demonstrate the best criterion for microarray gene expression data.

Keywords: support vector machine, generalization error-bound, feature selection, evolutionary algorithm, microarray data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1539
8131 Location Based Clustering in Wireless Sensor Networks

Authors: Ashok Kumar, Narottam Chand, Vinod Kumar

Abstract:

Due to the limited energy resources, energy efficient operation of sensor node is a key issue in wireless sensor networks. Clustering is an effective method to prolong the lifetime of energy constrained wireless sensor network. However, clustering in wireless sensor network faces several challenges such as selection of an optimal group of sensor nodes as cluster, optimum selection of cluster head, energy balanced optimal strategy for rotating the role of cluster head in a cluster, maintaining intra and inter cluster connectivity and optimal data routing in the network. In this paper, we propose a protocol supporting an energy efficient clustering, cluster head selection/rotation and data routing method to prolong the lifetime of sensor network. Simulation results demonstrate that the proposed protocol prolongs network lifetime due to the use of efficient clustering, cluster head selection/rotation and data routing.

Keywords: Wireless sensor networks, clustering, energy efficient, localization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2686
8130 Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with SVM_FS method for a relatively small dimension of the feature vector. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).

Keywords: Feature Selection, Learning with Kernels, SupportVector Machine, and Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1832
8129 Selection and Design of an Axial Flow Fan

Authors: D. Almazo, C. Rodríguez, M. Toledo

Abstract:

This work presents a methodology for the selection and design of propeller oriented to the experimental verification of theoretical results. The problem of propeller selection and design usually present itself in the following manner: a certain air volume and static pressure are required for a certain system. Once the necessity of fan design on a theoretical basis has been recognized, it is possible to determinate the dimensions for a fan unit so that it will perform in accordance with a certain set of specifications. The same procedures in this work then can be applied in other propeller selection.

Keywords: airfoil, axial flow, blade, fan, hub, mathematical algorithm, propeller design, simulation, wheel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13603
8128 A Relational Case-Based Reasoning Framework for Project Delivery System Selection

Authors: Yang Cui, Yong Qiang Chen

Abstract:

An appropriate project delivery system (PDS) is crucial to the success of a construction projects. Case-based Reasoning (CBR) is a useful support for PDS selection. However, the traditional CBR approach represents cases as attribute-value vectors without taking relations among attributes into consideration, and could not calculate the similarity when the structures of cases are not strictly same. Therefore, this paper solves this problem by adopting the Relational Case-based Reasoning (RCBR) approach for PDS selection, considering both the structural similarity and feature similarity. To develop the feature terms of the construction projects, the criteria and factors governing PDS selection process are first identified. Then feature terms for the construction projects are developed. Finally, the mechanism of similarity calculation and a case study indicate how RCBR works for PDS selection. The adoption of RCBR in PDS selection expands the scope of application of traditional CBR method and improves the accuracy of the PDS selection system.

Keywords: Relational Cased-based Reasoning, Case-based Reasoning, Project delivery system, Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1997
8127 Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers

Authors: Emel Kızılkaya Aydogan, Mihrimah Ozmen, Yılmaz Delice

Abstract:

In recent days, there is a change and the ongoing development of the telecommunications sector in the global market. In this sector, churn analysis techniques are commonly used for analysing why some customers terminate their service subscriptions prematurely. In addition, customer churn is utmost significant in this sector since it causes to important business loss. Many companies make various researches in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, their usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed aiming to obtain the feature reducts to predict customer churn. The framework is a cost based optional pre-processing stage to remove redundant features for churn management. In addition, this cost-based feature selection algorithm is applied in a telecommunication company in Turkey and the results obtained with this algorithm.

Keywords: Churn prediction, data mining, decision-theoretic rough set, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1764
8126 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data increase the amount of time and resources required to mining the data increases at an exponential rate. In this investigation a new algorithm is proposed which can be uses as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and roughsets to carry out record reduction and feature (attribute) selection respectively. FASTER for frequent itemset mining can produce a speed up of 3.1 times when compared to original algorithm while maintaining an accuracy of 71%.

Keywords: Rough-sets, Classification, Feature Selection, Entropy, Outliers, Frequent itemset mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2437
8125 Reducing SAGE Data Using Genetic Algorithms

Authors: Cheng-Hong Yang, Tsung-Mu Shih, Li-Yeh Chuang

Abstract:

Serial Analysis of Gene Expression is a powerful quantification technique for generating cell or tissue gene expression data. The profile of the gene expression of cell or tissue in several different states is difficult for biologists to analyze because of the large number of genes typically involved. However, feature selection in machine learning can successfully reduce this problem. The method allows reducing the features (genes) in specific SAGE data, and determines only relevant genes. In this study, we used a genetic algorithm to implement feature selection, and evaluate the classification accuracy of the selected features with the K-nearest neighbor method. In order to validate the proposed method, we used two SAGE data sets for testing. The results of this study conclusively prove that the number of features of the original SAGE data set can be significantly reduced and higher classification accuracy can be achieved.

Keywords: Serial Analysis of Gene Expression, Feature selection, Genetic Algorithm, K-nearest neighbor method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1613
8124 Welding Process Selection for Storage Tank by Integrated Data Envelopment Analysis and Fuzzy Credibility Constrained Programming Approach

Authors: Rahmad Wisnu Wardana, Eakachai Warinsiriruk, Sutep Joy-A-Ka

Abstract:

Selecting the most suitable welding process usually depends on experiences or common application in similar companies. However, this approach generally ignores many criteria that can be affecting the suitable welding process selection. Therefore, knowledge automation through knowledge-based systems will significantly improve the decision-making process. The aims of this research propose integrated data envelopment analysis (DEA) and fuzzy credibility constrained programming approach for identifying the best welding process for stainless steel storage tank in the food and beverage industry. The proposed approach uses fuzzy concept and credibility measure to deal with uncertain data from experts' judgment. Furthermore, 12 parameters are used to determine the most appropriate welding processes among six competitive welding processes.

Keywords: Welding process selection, data envelopment analysis, fuzzy credibility constrained programming, storage tank.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 802
8123 A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves large amount of data or features. The high dimensionality of features is a troublesome and can affect the performance of the classification. Therefore, feature selection is strongly considered as one of the crucial part in text document categorization. Selecting the best features to represent documents can reduce the dimensionality of feature space hence increase the performance. There were many approaches has been implemented by various researchers to overcome this problem. This paper proposed a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also presented state-of-the-art algorithms by several other researchers.

Keywords: Ant colony optimization, feature selection, information gain, text categorization, text representation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2071
8122 Novel Hybrid Method for Gene Selection and Cancer Prediction

Authors: Liping Jing, Michael K. Ng, Tieyong Zeng

Abstract:

Microarray data profiles gene expression on a whole genome scale, therefore, it provides a good way to study associations between gene expression and occurrence or progression of cancer. More and more researchers realized that microarray data is helpful to predict cancer sample. However, the high dimension of gene expressions is much larger than the sample size, which makes this task very difficult. Therefore, how to identify the significant genes causing cancer becomes emergency and also a hot and hard research topic. Many feature selection algorithms have been proposed in the past focusing on improving cancer predictive accuracy at the expense of ignoring the correlations between the features. In this work, a novel framework (named by SGS) is presented for stable gene selection and efficient cancer prediction . The proposed framework first performs clustering algorithm to find the gene groups where genes in each group have higher correlation coefficient, and then selects the significant genes in each group with Bayesian Lasso and important gene groups with group Lasso, and finally builds prediction model based on the shrinkage gene space with efficient classification algorithm (such as, SVM, 1NN, Regression and etc.). Experiment results on real world data show that the proposed framework often outperforms the existing feature selection and prediction methods, say SAM, IG and Lasso-type prediction model.

Keywords: Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2045
8121 A Fuzzy Decision Making Approach for Supplier Selection in Healthcare Industry

Authors: Zeynep Sener, Mehtap Dursun

Abstract:

Supplier evaluation and selection is one of the most important components of an effective supply chain management system. Due to the expanding competition in healthcare, selecting the right medical device suppliers offers great potential for increasing quality while decreasing costs. This paper proposes a fuzzy decision making approach for medical supplier selection. A real-world medical device supplier selection problem is presented to illustrate the application of the proposed decision methodology.

Keywords: Fuzzy decision making, fuzzy multiple objective programming, medical supply chain, supplier selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2674
8120 Binary Programming for Manufacturing Material and Manufacturing Process Selection Using Genetic Algorithms

Authors: Saleem Z. Ramadan

Abstract:

The material selection problem is concerned with the determination of the right material for a certain product to optimize certain performance indices in that product such as mass, energy density, and power-to-weight ratio. This paper is concerned about optimizing the selection of the manufacturing process along with the material used in the product under performance indices and availability constraints. In this paper, the material selection problem is formulated using binary programming and solved by genetic algorithm. The objective function of the model is to minimize the total manufacturing cost under performance indices and material and manufacturing process availability constraints.

Keywords: Optimization, Material selection, Process selection, Genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1600
8119 Aircraft Selection Using Multiple Criteria Decision Making Analysis Method with Different Data Normalization Techniques

Authors: C. Ardil

Abstract:

This paper presents an original application of multiple criteria decision making analysis theory to the evaluation of aircraft selection problem. The selection of an optimal, efficient and reliable fleet, network and operations planning policy is one of the most important factors in aircraft selection problem. Given that decision making in aircraft selection involves the consideration of a number of opposite criteria and possible solutions, such a selection can be considered as a multiple criteria decision making analysis problem. This study presents a new integrated approach to decision making by considering the multiple criteria utility theory and the maximal regret minimization theory methods as well as aircraft technical, economical, and environmental aspects. Multiple criteria decision making analysis method uses different normalization techniques to allow criteria to be aggregated with qualitative and quantitative data of the decision problem. Therefore, selecting a suitable normalization technique for the model is also a challenge to provide data aggregation for the aircraft selection problem. To compare the impact of different normalization techniques on the decision problem, the vector, linear (sum), linear (max), and linear (max-min) data normalization techniques were identified to evaluate aircraft selection problem. As a logical implication of the proposed approach, it enhances the decision making process through enabling the decision maker to: (i) use higher level knowledge regarding the selection of criteria weights and the proposed technique, (ii) estimate the ranking of an alternative, under different data normalization techniques and integrated criteria weights after a posteriori analysis of the final rankings of alternatives. A set of commercial passenger aircraft were considered in order to illustrate the proposed approach. The obtained results of the proposed approach were compared using Spearman's rho tests. An analysis of the final rank stability with respect to the changes in criteria weights was also performed so as to assess the sensitivity of the alternative rankings obtained by the application of different data normalization techniques and the proposed approach.

Keywords: Normalization Techniques, Aircraft Selection, Multiple Criteria Decision Making, Multiple Criteria Decision Making Analysis, MCDMA

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 595
8118 Self Watermarking based on Visual Cryptography

Authors: Mahmoud A. Hassan, Mohammed A. Khalili

Abstract:

We are proposing a simple watermarking method based on visual cryptography. The method is based on selection of specific pixels from the original image instead of random selection of pixels as per Hwang [1] paper. Verification information is generated which will be used to verify the ownership of the image without the need to embed the watermark pattern into the original digital data. Experimental results show the proposed method can recover the watermark pattern from the marked data even if some changes are made to the original digital data.

Keywords: Watermarking, visual cryptography, visualthreshold.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1743
8117 Feature Subset Selection approach based on Maximizing Margin of Support Vector Classifier

Authors: Khin May Win, Nan Sai Moon Kham

Abstract:

Identification of cancer genes that might anticipate the clinical behaviors from different types of cancer disease is challenging due to the huge number of genes and small number of patients samples. The new method is being proposed based on supervised learning of classification like support vector machines (SVMs).A new solution is described by the introduction of the Maximized Margin (MM) in the subset criterion, which permits to get near the least generalization error rate. In class prediction problem, gene selection is essential to improve the accuracy and to identify genes for cancer disease. The performance of the new method was evaluated with real-world data experiment. It can give the better accuracy for classification.

Keywords: Microarray data, feature selection, recursive featureelimination, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1545
8116 Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Authors: Diego Saqui, José H. Saito, José R. Campos, Lúcio A. de C. Jorge

Abstract:

Hyperspectral images and remote sensing are important for many applications. A problem in the use of these images is the high volume of data to be processed, stored and transferred. Dimensionality reduction techniques can be used to reduce the volume of data. In this paper, an approach to band selection based on clustering algorithms is presented. This approach allows to reduce the volume of data. The proposed structure is based on Fuzzy C-Means (or K-Means) and NWHFC algorithms. New attributes in relation to other studies in the literature, such as kurtosis and low correlation, are also considered. A comparison of the results of the approach using the Fuzzy C-Means and K-Means with different attributes is performed. The use of both algorithms show similar good results but, particularly when used attributes variance and kurtosis in the clustering process, however applicable in hyperspectral images.

Keywords: Band selection, fuzzy C-means, K-means, hyperspectral image.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1818
8115 Selection of Relevant Servers in Distributed Information Retrieval System

Authors: Benhamouda Sara, Guezouli Larbi

Abstract:

Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.

Keywords: Distributed information retrieval, relevance, server selection, collection selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1381
8114 A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection

Authors: E. N. Sathishkumar, K. Thangavel, T. Chandrasekhar

Abstract:

Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.

Keywords: Clustering, Gene Selection, K-Mean-Quick Reduct, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2299
8113 Optimizing Spatial Trend Detection By Artificial Immune Systems

Authors: M. Derakhshanfar, B. Minaei-Bidgoli

Abstract:

Spatial trends are one of the valuable patterns in geo databases. They play an important role in data analysis and knowledge discovery from spatial data. A spatial trend is a regular change of one or more non spatial attributes when spatially moving away from a start object. Spatial trend detection is a graph search problem therefore heuristic methods can be good solution. Artificial immune system (AIS) is a special method for searching and optimizing. AIS is a novel evolutionary paradigm inspired by the biological immune system. The models based on immune system principles, such as the clonal selection theory, the immune network model or the negative selection algorithm, have been finding increasing applications in fields of science and engineering. In this paper, we develop a novel immunological algorithm based on clonal selection algorithm (CSA) for spatial trend detection. We are created neighborhood graph and neighborhood path, then select spatial trends that their affinity is high for antibody. In an evolutionary process with artificial immune algorithm, affinity of low trends is increased with mutation until stop condition is satisfied.

Keywords: Spatial Data Mining, Spatial Trend Detection, Heuristic Methods, Artificial Immune System, Clonal Selection Algorithm (CSA)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2050
8112 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: Classification algorithms; data mining; tourism; knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2550
8111 Using Degree of Adaptive (DOA) Model for Partner Selection in Supply Chain

Authors: Habibollah Javanmard

Abstract:

In order to reduce cost, increase quality, and for timely supplying production systems has considerably taken the advantages of supply chain management and these advantages are also competitive. Selection of appropriate supplier has an important role in improvement and efficiency of systems. The models of supplier selection which have already been used by researchers have considered selection one or more suppliers from potential suppliers but in this paper selecting one supplier as partner from one supplier that have minimum one period supplying to buyer is considered. This paper presents a conceptual model for partner selection and application of Degree of Adoptive (DOA) model for final selection. The attributes weight in this model is prepared through AHP model. After making the descriptive model, determining the attributes and measuring the parameters of the adaptive is examined in an auto industry of Iran(Zagross Khodro co.) and results are presented.

Keywords: Partnership, Degree of Adaptive, AHP, SupplyChain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1677
8110 Generational PipeLined Genetic Algorithm (PLGA)using Stochastic Selection

Authors: Malay K. Pakhira, Rajat K. De

Abstract:

In this paper, a pipelined version of genetic algorithm, called PLGA, and a corresponding hardware platform are described. The basic operations of conventional GA (CGA) are made pipelined using an appropriate selection scheme. The selection operator, used here, is stochastic in nature and is called SA-selection. This helps maintaining the basic generational nature of the proposed pipelined GA (PLGA). A number of benchmark problems are used to compare the performances of conventional roulette-wheel selection and the SA-selection. These include unimodal and multimodal functions with dimensionality varying from very small to very large. It is seen that the SA-selection scheme is giving comparable performances with respect to the classical roulette-wheel selection scheme, for all the instances, when quality of solutions and rate of convergence are considered. The speedups obtained by PLGA for different benchmarks are found to be significant. It is shown that a complete hardware pipeline can be developed using the proposed scheme, if parallel evaluation of the fitness expression is possible. In this connection a low-cost but very fast hardware evaluation unit is described. Results of simulation experiments show that in a pipelined hardware environment, PLGA will be much faster than CGA. In terms of efficiency, PLGA is found to outperform parallel GA (PGA) also.

Keywords: Hardware evaluation, Hardware pipeline, Optimization, Pipelined genetic algorithm, SA-selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1446
8109 Project Selection by Using a Fuzzy TOPSIS Technique

Authors: M. Salehi, R. Tavakkoli-Moghaddam

Abstract:

Selection of a project among a set of possible alternatives is a difficult task that the decision maker (DM) has to face. In this paper, by using a fuzzy TOPSIS technique we propose a new method for a project selection problem. After reviewing four common methods of comparing investment alternatives (net present value, rate of return, benefit cost analysis and payback period) we use them as criteria in a TOPSIS technique. First we calculate the weight of each criterion by a pairwise comparison and then we utilize the improved TOPSIS assessment for the project selection.

Keywords: Fuzzy Theory, Pairwise Comparison, ProjectSelection, TOPSIS Technique.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2642
8108 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

Keywords: Feature selection, multi-objective evolutionary computation, unsupervised classification, behavior assessment system for children.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1450
8107 An Optimal Feature Subset Selection for Leaf Analysis

Authors: N. Valliammal, S.N. Geethalakshmi

Abstract:

This paper describes an optimal approach for feature subset selection to classify the leaves based on Genetic Algorithm (GA) and Kernel Based Principle Component Analysis (KPCA). Due to high complexity in the selection of the optimal features, the classification has become a critical task to analyse the leaf image data. Initially the shape, texture and colour features are extracted from the leaf images. These extracted features are optimized through the separate functioning of GA and KPCA. This approach performs an intersection operation over the subsets obtained from the optimization process. Finally, the most common matching subset is forwarded to train the Support Vector Machine (SVM). Our experimental results successfully prove that the application of GA and KPCA for feature subset selection using SVM as a classifier is computationally effective and improves the accuracy of the classifier.

Keywords: Optimization, Feature extraction, Feature subset, Classification, GA, KPCA, SVM and Computation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2246