Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4

Publications

4 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

Authors: K. Thangavel, R. Rathipriya

Abstract:

For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.

Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile Web usage mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1748
3 Simultaneous Clustering and Feature Selection Method for Gene Expression Data

Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar

Abstract:

Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.

Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1538
2 A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection

Authors: E. N. Sathishkumar, K. Thangavel, T. Chandrasekhar

Abstract:

Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.

Keywords: Clustering, Gene Selection, K-Mean-Quick Reduct, Rough Sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1907
1 Pectoral Muscles Suppression in Digital Mammograms Using Hybridization of Soft Computing Methods

Authors: I. Laurence Aroquiaraj, K. Thangavel

Abstract:

Breast region segmentation is an essential prerequisite in computerized analysis of mammograms. It aims at separating the breast tissue from the background of the mammogram and it includes two independent segmentations. The first segments the background region which usually contains annotations, labels and frames from the whole breast region, while the second removes the pectoral muscle portion (present in Medio Lateral Oblique (MLO) views) from the rest of the breast tissue. In this paper we propose hybridization of Connected Component Labeling (CCL), Fuzzy, and Straight line methods. Our proposed methods worked good for separating pectoral region. After removal pectoral muscle from the mammogram, further processing is confined to the breast region alone. To demonstrate the validity of our segmentation algorithm, it is extensively tested using over 322 mammographic images from the Mammographic Image Analysis Society (MIAS) database. The segmentation results were evaluated using a Mean Absolute Error (MAE), Hausdroff Distance (HD), Probabilistic Rand Index (PRI), Local Consistency Error (LCE) and Tanimoto Coefficient (TC). The hybridization of fuzzy with straight line method is given more than 96% of the curve segmentations to be adequate or better. In addition a comparison with similar approaches from the state of the art has been given, obtaining slightly improved results. Experimental results demonstrate the effectiveness of the proposed approach.

Keywords: X-ray Mammography, CCL, Fuzzy, Straight line.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1385