Automatic Clustering of Gene Ontology by Genetic Algorithm
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 33122
Automatic Clustering of Gene Ontology by Genetic Algorithm

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Zalmiyah Zakaria, Saberi M. Mohamad

Abstract:

Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.

Keywords: Automatic clustering, cohesion-and-coupling metric, gene ontology; genetic algorithm, split-and-merge algorithm.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333230

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1958

References:


[1] The Gene Ontology Consortium, "Gene ontology: tool for the unification of biology," Nat. Genet., vol. 25, no. 1, pp. 25-29, May 2000.
[2] M. Kaneko, M. Miki, and T. Hiroyasu, "A parallel genetic algorithm with distributed environment scheme," in Proc. 4th. Int. Conf. Parallel & Distributed Processing Techniques & Applications, Las Vegas, NV, 2000, pp. 619-625.
[3] S. Dutt and W. Deng, "Probability-based approaches to VLSI circuit partitioning," IEEE Trans. Computer-Aided Design of Integrated Circuits & Systems, vol. 19, no. 5, pp. 534-549, May 2000.
[4] C. Walshaw and M. Cross, "Parallel optimisation algorithms for multilevel mesh partitioning," Parallel Computing, vol. 26, no. 12, pp. 1635-1660, Nov. 2000.
[5] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 22, no. 8, pp. 888- 905, Aug. 2000.
[6] G. Getz, H. Gal, I. Kela, D.A. Notterman, and E. Domany, "Coupled two-way clustering analysis of breast cancer and colon cancer gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1079-1089, Jun. 2003.
[7] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 24, no. 7, pp. 881-892, Jul. 2002.
[8] C.H. Cheng, W.K. Lee, and K.F. Wong, "A genetic algorithm-based clustering approach for database partitioning," IEEE Trans. Systems, Man, & Cybernetics, vol. 32, no. 3, pp. 215-230, Aug. 2002.
[9] S. G├╝nter and H. Bunke, "Self-organizing map for clustering in the graph domain," Pattern Recognition Letters, vol. 23, no. 4, pp. 405-417, Feb. 2002.
[10] R.J. Hathaway and J.C. Bezdek, "Fuzzy c-means clustering of incomplete data," IEEE Trans. Systems, Man, & Cybernetics, vol. 31, no. 5, pp. 735-744, Oct. 2001.
[11] C.Y. Chen and F. Ye, "Particle swarm optimization algorithm and its application to clustering analysis," in Proc. 1st. Int. Conf. Networking, Sensing, & Control, Taipei, Taiwan, 2004, pp. 789-794.
[12] A.K. Jain, M.N. Murty, and P.J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, Sep. 1999.
[13] P. Berkhin, Survey of Clustering Data Mining Techniques, Accrue Software Inc., San Jose, CA, 2002.
[14] S. Kotsiantis and P. Pintelas, "Recent advances in clustering: a brief survey," WSEAS Trans. Information Science & Applications, vol. 1, no. 1, pp. 73-81, Jul. 2004.
[15] D. Pelleg and A. Moore, "X-means: extending k-means with efficient estimation of the number of clusters," in Proc. 17th. Int. Conf. Machine Learning, Stanford, CA, 2000, pp. 727-734.
[16] G. Hamerly and C. Elkan. (2003, Dec.). Learning the k in k-means. in Proc. 17th. Conf. Neural Information Processing Systems. Vancouver, Canada. Available: http://books.nips.cc/nips16.html.
[17] L.Y. Tseng and S.B. Yang, "A genetic approach to the automatic clustering problem," Pattern Recognition, vol. 34, no. 2, pp. 415-424, Feb. 2001.
[18] G. Garai and B.B. Chaudhuri, "A novel genetic algorithm for automatic clustering," Pattern Recognition Letters, vol. 25, no. 2, pp. 173-187, Jan. 2004.
[19] R.J. Kuo, K. Chang, and S.Y. Chien, "Integration of self-organizing feature maps and genetic-algorithm-based clustering method for market segmentation," J. Organizational Computing & Electronic Commerce, vol. 14, no. 1, pp. 43-60, Jan. 2004.
[20] C. Walshaw and M. Cross, "Mesh partitioning: a multilevel balancing and refinement algorithm," SIAM J. Scientific Computing, vol. 22, no. 1, pp. 63-80, Jan. 2000.
[21] S.J. D-Amico, S.J. Wang, R. Batta, and C.M. Rump, "A simulated annealing approach to police district design," Computers & Operations Research, vol. 29, no. 6, pp. 667-684, May 2002.
[22] Y.G. Saab, "An effective multilevel algorithm for bisecting graphs and hypergraphs," IEEE Trans. Computers, vol. 53, no. 6, pp. 641-652, Jun. 2004.
[23] G. Wolfe, J.L. Wong, and M. Potkonjak, "Watermarking graph partitioning solutions," IEEE Trans. Computer-Aided Design of Integrated Circuits & Systems, vol. 21, no. 10, pp. 1196-1204, Oct. 2002.
[24] U. Elsner, "Graph partitioning: a survey," Technische Universitat Chemnitz, Chemnitz, Germany, Tech. Rep. 393, Dec. 1997.
[25] P.O. Fj├ñllström. (1998, Sep.). Algorithms for graph partitioning: a survey. Linköping Electronic Articles Computer & Information Science. 3(10). Available: http://www. ep.liu.se/ea/cis/1998/010.
[26] T.N. Bui and B.R. Moon, "Genetic algorithm and graph partitioning," IEEE Trans. Computers, vol. 45, no. 7, pp. 841-855, Jul. 1996.
[27] A. Kaveh and H.A.R. Bondarabady, "A hybrid graph-genetic method for domain decomposition," Finite Elements in Analysis & Design, vol. 39, no. 13, pp. 1237-1247, Oct. 2003.
[28] K. Kohmoto, K. Katayama, and H. Narihisa, "Performance of a genetic algorithm for the graph partitioning problem," Mathematical & Computer Modelling, vol. 38, no. 11-13, pp. 1325-1332, Dec. 2003.
[29] H. Stuckenschmidt and M. Klein, "Structure-based partitioning of large concept hierarchies," in Proc. 3rd. Int. Conf. Semantic Web, Hiroshima, Japan, 2004, pp. 289-303.
[30] M. Wall. (1996, Aug.). GAlib: a C++ library of genetic algorithm components. Available: http://lancet.mit.edu/ga/dist/galibdoc.pdf.
[31] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "A high-performance, portable implementation of the MPI message-passing interface standard," Parallel Computing, vol. 22, no. 6, pp. 789-828, Sep. 1996.