MIBiClus: Mutual Information based Biclustering Algorithm

Neelima Gupta; Seema Aggarwal

Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 32799

MIBiClus: Mutual Information based Biclustering Algorithm

Authors: Neelima Gupta, Seema Aggarwal

Abstract:

Most of the biclustering/projected clustering algorithms are based either on the Euclidean distance or correlation coefficient which capture only linear relationships. However, in many applications, like gene expression data and word-document data, non linear relationships may exist between the objects. Mutual Information between two variables provides a more general criterion to investigate dependencies amongst variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing algorithms for biclustering have used mutual information as a similarity measure. We present the experimental results on synthetic data as well as on the yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.

Keywords: Biclustering, mutual information.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1078851

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1571

References:

[1] Kevin Y. Yip, David W. Cheung, and Micheal K. Ng:HARP: A Practical Projected Clustering Algorithm. IEEE Transactions On Knowledge And Data Engineering, volume 16, IEEE Computer Society, 2004.
[2] Charu C. Aggarwal, Cecilia Procopiuc, Joel L. Wolf, Philip S. Yu, and Jong Soo Park: Fast Algorithms For Projected Clustering. ACM SIGMOD Int-l Conf. Management of Data, pages 61-72, (1999).
[3] Charu C. Aggarwal and Philip S. Yu:Finding generalized projected clusters in high dimensional spaces. ACM SIGMOD Int-l Conf. Management of Data, pages 70-81, 2000.
[4] Y. Cheng and G. M. Church: Biclustering Of Gene Expression Data. System Molecular Biology, volume 8, pages 1-93. System Molecular Biology, 2000.
[5] H. Wang, W. Wang, J. Yang, and P. S. Yu:Clustering by Pattern Similarity in Large Data Sets. Bull, Math. Biol. 46, pages 515 -527. ACM Press, 1984.
[6] Gad Getz, Erel Levine, and Eytan Domany: Coupled Two-Way Clustering Analysis Of Gene Microarray Data. Cell Biology, volume 97, pages 12079-12084. PNAS, 2000.
[7] Yuval Kluger, Rones Basri, Joseph T. Chang, and Mark Gerstein: Spectral Biclustering Of Microarray Data: Coclustering Genes And Conditions. Genome Research, Cold Spring Harbor Laboratory Press, 2003.
[8] J. Ihmels, G. Friedlander, S. Bergmann, Y. Ziv O. Sarig, and N. Barkai: Revealing Modular Organization In The Yeast Transcription Network. Nature Genetics, volume 31, pages 1-370. Nature Publishing Group, 2002.
[9] S. Bergmann, J. Ihmels, and N. Barkai:Iterative signature algorithm for the analysis of large-scale gene expression data. Physical Review, volume 67, pages 1-18. American Physical Society, 2003.
[10] M. Kloster, C. Tang, and N.S. Wingreen:Finding regulatory modules through large-scale gene-expression data analysis. Bioinformatics, volume 21, pages 1172-1179, 2005.
[11] Xiaowen Liu and Lusheng Wang: Computing the maximum similarity bi-clusters of gene expression data. bioinformatics, volume 23, pages 50- 56, 2007.
[12] R.Steuer and J.Kurths and C.O.Daub and J.Weiseand and J.Selbig:The mutual information:Detecting and evaluating depencies between variables. Bioinformatics,volume 18 Suppl 2, pages S231-S240, 2002.
[13] O.Maimon I. Priness and I.Ben-Gal: Evaluation of gene expression clustering via mutual information distance measure. BMC Bioinformatics, 2007.
[14] A.J. Butte and I.S. Kohane: Mutual Information relevance networks, Functional Genomic clustering using pairwise entropy measurements. PSB,volume 5, pages 415-426, 2000.
[15] X.Zhou, X. Wang, E.R. Dougherty, D.Russ and E. Suh: Gene Clustering Based on Clusterwide Mutual Information. Jouranl of Computational Biology, volume 11, Number 1, pages 147-161, 2004.
[16] N.SLonim, G.S.Atwal,G.Tkacik and W.Bialek: Information based clustering. PNAS, volume 102, pages 18297-18302, 2005.
[17] MacQueen: Some methods for classification and analysis of multivariate observations. Proc.of the 5th Berkeley Symposium om Mathematical Statistics and Probability, 1965.
[18] T. Kohonen: Self Organizing Maps, Springer, 1997.
[19] R. Sharan and R. Shamir: CLICK: a clustering algorithm with applications to gene expression analysis Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, 2000.
[20] N. Slonim: The information Bottleneck: Theory and applications. In Ph.D. Thesis Tel Aviv University, Computer Science Department, 2002.
[21] Neelima Gupta and Seema Aggarwal: MIB: Using Mutual Information for Biclustering .
[22] G.S.Michaels,D.B.Carr,M.Askenazi,S.Fuhrman,X.Wen and R.Somogyi: Cluster Analysis and Data Visualization of Large Scale Gene Expression Data. PSB, volume 3, pages 42-53, 1998.
[23] S.Mallela I.S.Dhillon and D.S.Modha: Information Theoritic COClustering. Proceedings of SIGKDD-03, 2003.
[24] N.Slonim and N.Tishby: Document clustering using word clusters via the information bottleneck method. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, p.208-215, July 24-28, 2000, Athens, Greece
[25] Saccharomyces Genome Database, http://www.yeastgenome.org/.
[26] Funcassociate: The Gene Set Functionator, http://llama.med.harvard.edu/cgi/func1/funcassociate.
[27] Simon Haykin.: Neural Networks-A comprehensive Foundation 2nd Ed., Prentice Hall of India, 2007.