A Cuckoo Search with Differential Evolution for Clustering Microarray Gene Expression Data
Authors: M. Pandi, K. Premalatha
Abstract:
A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks make it much harder to comprehend and interpret the resulting mass of data, which often consists of millions of measurements. Clustering addresses this problem by revealing the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene-based clustering of gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experimental results are analyzed on gene expression benchmark datasets and show that CS-DE outperforms CS on these datasets. To validate the clustering results, the method is evaluated with one internal and one external cluster validation index.
Keywords: DNA, Microarray, Genomics, Cuckoo Search, Differential Evolution, Gene expression data, Clustering.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1112057