BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis

Mohamed A. Mahfouz; M. A. Ismail

Abstract:

Biclustering is a useful data mining technique in gene expression analysis for identifying patterns in which different genes are correlated over a subset of conditions. Association rule mining is an efficient approach to biclustering, as in the BIMODULE algorithm, but it is sensitive to the values of its input parameters and to the discretization procedure used in the preprocessing step. Moreover, when noise is present, classical association rule miners discover multiple small fragments of the true bicluster but miss the true bicluster itself. This paper formally presents a generalized noise-tolerant bicluster model, termed μBicluster. An iterative algorithm based on the proposed model, termed BIDENS, is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions in order to preserve meaningful and significant biclusters. The proposed algorithm can discover biclusters that are hard for BIMODULE to discover. An experimental study on yeast and human gene expression data and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.

Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1081433
